Classifying epitopes is essential because they have applications in many fields, including therapeutics, diagnostics and peptide-based vaccines. To identify the epitope or peptide recognized by an antibody, peptide-based epitope mapping is the most widely used method. However, it is time-consuming and less efficient than current methods. The availability of protein sequence data from laboratory procedures has led to computational models that predict epitope binding using machine learning and deep learning (DL). Epitope binding prediction has also become a crucial part of developing effective cancer immunotherapies. Because many studies still struggle with low classification performance, this paper proposes an architecture intended to generalize across this prediction task. The proposed DL models, called MITNet and MITNet-Fusion, build on a fusion architecture that combines two architectures: a Transformer and a convolutional neural network (CNN). Combining the two enriches the feature space for predicting epitope labels through binary classification. The selected epitope–T-cell receptor (TCR) interactions are GILG, GLCT and NLVP, acquired from three databases: IEDB, VDJdb and McPAS-TCR. The input data were encoded for the DL architecture using amino acid composition, dipeptide composition, a spectrum descriptor, and the combination of all these features, called AADIP composition. To ensure consistency, fivefold cross-validation was performed with the area under the curve (AUC) as the metric. The models achieved AUC scores of 0.85, 0.87 and 0.86 for GILG, GLCT and NLVP, respectively. Compared with prior architectures, these results outperformed other similar deep learning models.
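The composition-based encodings named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: amino acid composition as the relative frequency of each of the 20 standard residues, and dipeptide composition as the relative frequency of each of the 400 ordered residue pairs. The function names, the concatenation in `encode`, and the example sequence are assumptions made for illustration; the spectrum descriptor is omitted because the abstract does not define it, so this is not the paper's exact AADIP encoding.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 values: the fraction of the sequence made up by each residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Non-standard letters are not counted, so the values may sum to less than 1.
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 values: the fraction of adjacent residue pairs of each type."""
    seq = seq.upper()
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
    return [counts[dp] / n_pairs for dp in DIPEPTIDES]


def encode(seq):
    """Hypothetical combined encoding: 20 + 400 = 420 features per sequence."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


# Illustrative CDR3-like string, not taken from the paper's data.
features = encode("CASSIRSSYEQYF")
print(len(features))
```

Both encodings produce a fixed-length vector regardless of sequence length, which is what allows sequences of different lengths to be passed to the same network input.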