基于深度学习的潜在抗HIV活性分子生成新方法研究
Alternative Title
Research on a New Method of Generating Potential Anti-HIV Active Molecules Based on Deep Learning
Degree Type
Master's
Supervisor
许存禄
2020-07-19
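Degree-Granting Institution
Lanzhou University
Place of Degree Conferral
Lanzhou
Degree Name
Master of Engineering
Degree Discipline
Circuits and Systems
Keywords
Deep learning
Abstract (Chinese)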
艾滋病是对人类危害最大的疾病之一,由感染HIV引起。现阶段在全球范围内仍然缺乏有效治愈艾滋病的方法,抗HIV药物是防治艾滋病最有效的手段之一。HIV具有耐药性,因此需要不断发现新的抗HIV活性分子,以研制更多的抗HIV药物。本文对现有的新型药物设计方法进行改进,并采用两种不同的方法生成潜在抗HIV活性分子,以扩增潜在抗HIV活性分子库。本文为发现新的抗HIV活性分子提供了新思路,主要创新及工作内容包含以下几个方面:
(1)搭建深度分子生成模型DGMM,旨在生成结构有效、新颖且性质无偏的分子。DGMM基于MLSTM、SRU、QRNN三种循环单元进行构造,采用源自ChEMBL的大型分子数据集进行训练。经过训练,基于MLSTM搭建的DGMM取得最优效果,其生成分子的平均有效性为98.31%,唯一性为99.93%,新颖性为89.33%,综合优于现有的化学语言模型。随后将最优DGMM生成的分子与训练集分子进行性质对比,实验结果表明DGMM生成的分子能够还原训练集分子的性质分布,验证了DGMM生成分子的性质无偏性。
(2)搭建深度迁移分子生成模型T-DGMM,旨在生成潜在抗HIV活性分子,扩增潜在抗HIV活性分子库;搭建抗HIV活性预测模型AAPM,验证T-DGMM生成分子的潜在抗HIV活性。为了验证迁移学习方法的有效性,T-DGMM基于两种不同规模的抗HIV活性数据集进行训练,最终在基于极小规模数据集训练的T-DGMM生成的分子中检验到已知抗HIV活性的分子。AAPM采用不同深度学习架构进行搭建,训练集规模为正负样本各一万,最终基于DNN的AAPM外部验证集准确率达88.90%。最后基于AAPM预测T-DGMM生成分子的抗HIV活性,其中最高68.29%被判别为抗HIV活性,验证了T-DGMM的有效性。
(3)搭建深度强化分子生成模型R-DGMM,分别进行两个不同的任务。任务一是生成利匹韦林的相似物,最终R-DGMM生成了包含达匹韦林在内的9种抗HIV活性分子。任务二设计了组合评分函数,旨在生成同时具有潜在抗HIV活性、期望合成可及性及类药性的分子,最终R-DGMM生成了2种已知抗HIV活性的分子。两个任务均表明R-DGMM适用于生成潜在抗HIV活性分子。
Abstract (English)
AIDS, caused by HIV infection, is one of the diseases most harmful to humans. There is still no effective cure for AIDS anywhere in the world, and anti-HIV drugs remain one of the most effective means of preventing and treating it. Because HIV develops drug resistance, new anti-HIV active molecules must be discovered continually so that more anti-HIV drugs can be developed. This thesis improves on existing approaches to new drug design and uses two different methods to generate potential anti-HIV active molecules, expanding the library of such molecules. It offers new ideas for discovering anti-HIV active molecules. The main innovations and contributions are as follows:
(1) A deep molecule generation model, DGMM, is built to generate molecules that are structurally valid, novel, and unbiased in their properties. DGMM variants are constructed from three recurrent units (MLSTM, SRU, and QRNN) and trained on a large molecular data set derived from ChEMBL. After training, the MLSTM-based DGMM performs best: its generated molecules have an average validity of 98.31%, a uniqueness of 99.93%, and a novelty of 89.33%, which is better overall than existing chemical language models. The properties of molecules generated by the best DGMM are then compared with those of the training set. The results show that the generated molecules reproduce the property distributions of the training set, which confirms that their properties are unbiased.
(2) A deep transfer molecule generation model, T-DGMM, is built to generate potential anti-HIV active molecules and expand the library of such molecules, and an anti-HIV activity prediction model, AAPM, is built to verify the potential anti-HIV activity of the molecules that T-DGMM generates. To verify the effectiveness of transfer learning, T-DGMM is trained on anti-HIV activity data sets of two different sizes; molecules with known anti-HIV activity are found among those generated by the T-DGMM trained on the extremely small data set. AAPM is built with several deep learning architectures and trained on 10,000 positive and 10,000 negative samples; the DNN-based AAPM reaches an accuracy of 88.90% on the external validation set. Finally, AAPM is used to predict the anti-HIV activity of the T-DGMM-generated molecules, and up to 68.29% of them are classified as anti-HIV active, which verifies the effectiveness of T-DGMM.
(3) A deep reinforcement molecule generation model, R-DGMM, is built and applied to two tasks. The first task is to generate analogues of rilpivirine; R-DGMM generates nine anti-HIV active molecules, including dapivirine. The second task uses a combined scoring function to generate molecules that simultaneously have potential anti-HIV activity, the desired synthetic accessibility, and the desired drug-likeness; here R-DGMM generates two molecules with known anti-HIV activity. Both tasks indicate that R-DGMM is suitable for generating potential anti-HIV active molecules.
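The validity, uniqueness, and novelty figures quoted in item (1) can be illustrated with a minimal sketch. The definitions below follow one common convention for molecule generators (validity over all generated strings, uniqueness over valid ones, novelty over unique valid ones not seen in training); the abstract does not state the exact definitions used in the thesis, so they are an assumption. The function name `generation_metrics`, the `is_valid` callback, and the toy validator are hypothetical, and the sketch assumes molecules are given as canonical SMILES strings so that string equality means molecular identity.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(
    generated: Iterable[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Return validity, uniqueness and novelty as fractions in [0, 1].

    Assumed definitions (not taken from the thesis):
      validity   = valid strings / all generated strings
      uniqueness = distinct valid strings / valid strings
      novelty    = distinct valid strings absent from training / distinct valid strings
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in validator: non-empty with matching parenthesis counts.

    A real pipeline would parse the string with a cheminformatics toolkit instead.
    """
    return bool(smiles) and smiles.count("(") == smiles.count(")")


# Toy data: one duplicate ("CCO"), one malformed string ("C(C").
generated = ["CCO", "CCO", "c1ccccc1", "C(C", "CCN"]
training = ["CCO"]
metrics = generation_metrics(generated, training, toy_is_valid)
```

In practice the `is_valid` callback would be backed by a chemistry toolkit's SMILES parser, and both the generated and training molecules would be canonicalized before comparison.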
Pages
88
Language
Chinese
Document Type
Thesis
Item Identifier
https://ir.lzu.edu.cn/handle/262010/466993
Collection
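School of Information Science and Engineering
Author Affiliation
School of Information Science and Engineering
First Author Affiliation
School of Information Science and Engineering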
Recommended Citation:
GB/T 7714
曲晋慷. 基于深度学习的潜在抗HIV活性分子生成新方法研究[D]. 兰州: 兰州大学, 2020.