Molecular fingerprints are essential cheminformatics tools for virtual screening and mapping chemical space. Among the different types of fingerprints, substructure fingerprints perform best for small molecules such as drugs, while atom-pair fingerprints are preferable for large molecules such as peptides. However, no available fingerprint achieves good performance on both classes of molecules. Here we set out to design a new fingerprint suitable for both small and large molecules by combining substructure and atom-pair concepts. Our quest resulted in a new fingerprint called MinHashed atom-pair fingerprint up to a diameter of four bonds (MAP4). In this fingerprint the circular substructures with radii of r = 1 and r = 2 bonds around each atom in an atom-pair are written as two pairs of SMILES, each pair being combined with the topological distance separating the two central atoms. These so-called atom-pair molecular shingles are hashed, and the resulting set of hashes is MinHashed to form the MAP4 fingerprint. MAP4 significantly outperforms all other fingerprints on an extended benchmark that combines the Riniker and Landrum small molecule benchmark with a peptide benchmark recovering BLAST analogs from either scrambled or point mutation analogs. 
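The shingling and MinHash steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the reference implementation at https://github.com/reymond-group/map4. It assumes the circular-substructure SMILES for each atom and the topological distance matrix have already been computed (for example with a cheminformatics toolkit), orders each SMILES pair lexicographically so that a shingle does not depend on atom numbering, hashes shingles with SHA-1 truncated to 32 bits, and uses a seeded family of random linear permutations for MinHash. The function names, the 128-permutation size, the seed and the toy input are all hypothetical choices made for illustration.

```python
import hashlib
import random
from itertools import combinations

MERSENNE_PRIME = (1 << 61) - 1  # large prime for the (a*h + b) mod p permutation family
MAX_HASH = (1 << 32) - 1        # keep MinHash values in 32 bits

def atom_pair_shingles(circular_smiles, distances, radii=(1, 2)):
    """Build atom-pair shingles "SMILES_i|d|SMILES_j" for every atom pair and radius.

    circular_smiles[r][i]: SMILES of the radius-r substructure around atom i (assumed precomputed)
    distances[i][j]: topological distance in bonds between atoms i and j
    """
    shingles = set()
    for i, j in combinations(range(len(distances)), 2):
        d = distances[i][j]
        for r in radii:
            # Sort the two SMILES so the shingle is independent of atom order.
            a, b = sorted((circular_smiles[r][i], circular_smiles[r][j]))
            shingles.add(f"{a}|{d}|{b}")
    return shingles

def hash_shingle(shingle):
    """Map a shingle string to a 32-bit integer (first 4 bytes of its SHA-1 digest)."""
    return int.from_bytes(hashlib.sha1(shingle.encode("utf-8")).digest()[:4], "little")

def minhash(hashes, n_perm=128, seed=42):
    """Return the minimum permuted hash for each of n_perm seeded random permutations."""
    if not hashes:
        raise ValueError("cannot MinHash an empty set of shingles")
    rng = random.Random(seed)  # same seed -> same permutations -> comparable fingerprints
    params = [(rng.randint(1, MERSENNE_PRIME - 1), rng.randint(0, MERSENNE_PRIME - 1))
              for _ in range(n_perm)]
    return [min(((a * h + b) % MERSENNE_PRIME) & MAX_HASH for h in hashes)
            for a, b in params]

def estimated_jaccard(fp1, fp2):
    """Fraction of positions where two MinHash vectors agree."""
    return sum(x == y for x, y in zip(fp1, fp2)) / len(fp1)

# Toy input for ethanol (heavy atoms C0-C1-O2). The "SMILES" strings are
# hand-written stand-ins, not canonical output from a cheminformatics toolkit.
ethanol_smiles = {
    1: ["CC", "CCO", "CO"],
    2: ["CCO", "CCO", "CCO"],
}
ethanol_dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]

ethanol_shingles = atom_pair_shingles(ethanol_smiles, ethanol_dist)
ethanol_fp = minhash({hash_shingle(s) for s in ethanol_shingles})
print(sorted(ethanol_shingles))
```

Because two MinHash vectors built with the same permutations agree at each position with probability equal to the Jaccard similarity of the underlying shingle sets, `estimated_jaccard` approximates that similarity, and one minus this value can serve as a distance for nearest-neighbor search.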
MAP4 furthermore produces well-organized chemical space tree-maps (TMAPs) for databases as diverse as DrugBank, ChEMBL, SwissProt and the Human Metabolome Database (HMDB), and differentiates between all metabolites in HMDB, over 70% of which are indistinguishable from their nearest neighbor using substructure fingerprints. MAP4 is a new molecular fingerprint suitable for drugs, biomolecules, and the metabolome and can be adopted as a universal fingerprint to describe and search chemical space. The source code is available at https://github.com/reymond-group/map4 and interactive MAP4 similarity search tools and TMAPs for various databases are accessible at http://map-search.gdb.tools/ and http://tm.gdb.tools/map4/.