THPLM: a sequence-based deep learning framework for protein stability changes prediction upon point variations using pretrained protein language model
- Gong, Jianting
- Jiang, Lili
- Chen, Yongbing
- Zhang, Yixiang
- Li, Xue
- Ma, Zhiqiang
- Fu, Zhiguo
- He, Fei
- Sun, Pingping
- Ren, Zilin
- Tian, Mingyao
Abstract
Motivation:
Quantitative determination of protein thermodynamic stability is a critical step in protein and drug design. Reliable prediction of protein stability changes caused by point variations contributes to the development of related fields. Over the past decades, dozens of structure-based and sequence-based methods have been proposed, showing good prediction performance. Despite this impressive progress, wild-type and variant protein representations still need to be explored to address how protein stability change can be represented from the perspective of the global sequence. With the development of learning-based structure prediction, protein language models (PLMs) have delivered accurate, high-quality predictions of protein structure. Because PLMs capture atomic-level structural information, they can help explain how single-point variations cause functional changes.
Results:
Here, we propose THPLM, a sequence-based deep learning model for stability change prediction using Meta's ESM-2. With ESM-2 and a simple convolutional neural network, THPLM achieves performance comparable to or better than most existing methods, both sequence-based and structure-based. Furthermore, the experimental results indicate that the sequence representations generated by the PLM can effectively improve protein function prediction.
Availability and implementation:
The source code of THPLM and the testing data are available at https://github.com/FPPGroup/THPLM.