THPLM: a sequence-based deep learning framework for protein stability changes prediction upon point variations using pretrained protein language model

Gong, Jianting
Jiang, Lili
Chen, Yongbing
Zhang, Yixiang
Li, Xue
Ma, Zhiqiang
Fu, Zhiguo
He, Fei
Sun, Pingping
Ren, Zilin
Tian, Mingyao

¹School of Information Science and Technology, Institution of Computational Biology, Northeast Normal University, Changchun 130117, China
²Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Changchun 130122, China
³Department of Computer Science, College of Humanities and Sciences of Northeast Normal University, Changchun 130117, China

Associate Editor: Xin Gao

*Corresponding author. School of Information Science and Technology, Institution of Computational Biology, Northeast Normal University, Changchun 130117, China. E-mails: [email protected] (P.S.); [email protected] (Z.R.); [email protected] (M.T.)

Received: April 29, 2023; Revised: September 25, 2023; Editorial Decision: October 9, 2023; Accepted: October 22, 2023; Advance Access Publication Date: November 1, 2023

Bioinformatics 39(11):p btad646, November 2023. | DOI: 10.1093/bioinformatics/btad646

Abstract

Motivation:

Quantitative determination of protein thermodynamic stability is a critical step in protein and drug design. Reliable prediction of protein stability changes caused by point variations contributes to the development of related fields. Over the past decades, dozens of structure-based and sequence-based methods have been proposed, showing good prediction performance. Despite this progress, it remains necessary to explore representations of wild-type and variant proteins that capture the stability change from the perspective of the global sequence. With the development of learning-based structure prediction, protein language models (PLMs) have enabled accurate, high-quality predictions of protein structure. Because PLMs capture atomic-level structural information, they can help to explain how single-point variations cause functional changes.

Results:

Here, we propose THPLM, a sequence-based deep learning model for stability change prediction using Meta's ESM-2. With ESM-2 and a simple convolutional neural network, THPLM achieves performance comparable to or better than most existing methods, both sequence-based and structure-based. Furthermore, the experimental results indicate that the sequence representations generated by the PLM can effectively improve protein function prediction.
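
A sequence-based approach of this kind needs both the wild-type and the variant sequence as input. The following is a minimal sketch of building the variant sequence from a single point variation. It assumes the common notation of wild-type residue, 1-based position, variant residue (for example `T3S`). The function names `parse_variation` and `apply_point_variation` are hypothetical and are not taken from the THPLM repository.

```python
from typing import Tuple

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_variation(variation: str) -> Tuple[str, int, str]:
    """Split a point variation such as 'T3S' into ('T', 3, 'S').

    Assumed notation: wild-type residue, 1-based position, variant residue.
    """
    if len(variation) < 3:
        raise ValueError(f"Variation too short: {variation!r}")
    wild, position, variant = variation[0], variation[1:-1], variation[-1]
    if wild not in AMINO_ACIDS or variant not in AMINO_ACIDS:
        raise ValueError(f"Unknown residue in variation: {variation!r}")
    if not position.isdigit():
        raise ValueError(f"Position is not a number: {variation!r}")
    return wild, int(position), variant


def apply_point_variation(wild_type_seq: str, variation: str) -> str:
    """Return the variant sequence produced by one point variation."""
    wild, position, variant = parse_variation(variation)
    index = position - 1  # convert the 1-based position to a 0-based index
    if not 0 <= index < len(wild_type_seq):
        raise ValueError(f"Position {position} is outside the sequence")
    if wild_type_seq[index] != wild:
        raise ValueError(
            f"Expected {wild} at position {position}, "
            f"found {wild_type_seq[index]}"
        )
    return wild_type_seq[:index] + variant + wild_type_seq[index + 1:]


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real protein.
    print(apply_point_variation("MKTAYIAK", "T3S"))
```

The resulting wild-type and variant sequences could then each be passed to a PLM such as ESM-2 to obtain representations for a downstream predictor. How THPLM combines the two representations is not specified here, so that step is left out of the sketch.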

Availability and implementation:

The source code of THPLM and the testing data are available at https://github.com/FPPGroup/THPLM.