
Figure 1. Overview of the binding site annotation procedure in IBIS.

A binding site cluster represents a collection of structures that are related to the query and whose binding sites, when mapped onto the query, are similar and overlap. Similarity between binding sites is measured in terms of sequence similarity, with additional weight assigned to positions that overlap structurally. Binding sites are clustered by hierarchical complete-linkage clustering. To choose the clustering cutoff, we use a recently described energy function that maximizes the mean similarity of members within a cluster while minimizing the complexity of the description provided by cluster membership (the number of bits required to describe the data) (31). Clusters that contain an interaction actually observed in the query structure are marked with the letter ‘O’. Expanding a cluster shows additional information about its members.
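The clustering step can be illustrated with a minimal Python sketch of agglomerative complete-linkage clustering. It assumes the pairwise binding site similarities (sequence similarity with extra weight on structurally overlapping positions) have already been computed, and it stops merging at a fixed, hypothetical `cutoff` rather than choosing the cutoff with the information-based energy function (31) used by IBIS. The similarity values are invented.

```python
from itertools import combinations


def complete_linkage(sim, cutoff):
    """Agglomerative complete-linkage clustering of binding sites.

    sim    -- symmetric matrix (list of lists); sim[i][j] is the similarity
              between binding sites i and j
    cutoff -- hypothetical similarity threshold at which merging stops
    Returns the clusters as sorted lists of site indices.
    """
    clusters = [[i] for i in range(len(sim))]

    def linkage(a, b):
        # Complete linkage: two clusters are only as similar as their
        # least similar cross-cluster pair of members.
        return min(sim[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest complete-linkage similarity.
        (left, right), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < cutoff:
            break  # no remaining pair is similar enough to merge
        merged = sorted(clusters[left] + clusters[right])
        clusters = [c for k, c in enumerate(clusters) if k not in (left, right)]
        clusters.append(merged)
    return sorted(clusters)


# Toy similarity matrix for four binding sites (invented values).
sim = [
    [1.00, 0.90, 0.80, 0.10],
    [0.90, 1.00, 0.70, 0.20],
    [0.80, 0.70, 1.00, 0.15],
    [0.10, 0.20, 0.15, 1.00],
]
print(complete_linkage(sim, cutoff=0.60))  # [[0, 1, 2], [3]]
print(complete_linkage(sim, cutoff=0.75))  # [[0, 1], [2], [3]]
```

With the higher cutoff, site 2 stays separate even though it is 0.80 similar to site 0, because complete linkage requires every cross-cluster pair, including its 0.70 similarity to site 1, to clear the threshold.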

All binding site clusters are ranked by their predicted biological relevance and similarity to the query. The ranking score has four components: the sequence-PSSM score; the average sequence identity between the query and the cluster members, calculated over the whole structure–structure alignment; the number of interfacial contacts; and the average sequence conservation of the binding site alignment columns. Each component is normalized, and clusters are ranked with respect to the resulting Z-scores.
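One way to combine the four components is sketched below. It assumes each component is converted to a Z-score across all clusters of a query and that the Z-scores are summed with equal weights; the weighting, the field names and the example values are assumptions, not the published IBIS formula.

```python
from statistics import mean, pstdev

# Hypothetical names for the four ranking-score components.
COMPONENTS = ("pssm_score", "avg_identity", "n_contacts", "avg_conservation")


def zscores(values):
    """Standardize a list of numbers; a constant list maps to all zeros."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def rank_clusters(clusters):
    """Return cluster ids ordered by the sum of their per-component Z-scores.

    clusters -- dict mapping a cluster id to a dict of component values
    """
    ids = list(clusters)
    totals = dict.fromkeys(ids, 0.0)
    for comp in COMPONENTS:
        for cid, z in zip(ids, zscores([clusters[c][comp] for c in ids])):
            totals[cid] += z
    return sorted(ids, key=lambda cid: totals[cid], reverse=True)


clusters = {
    "A": {"pssm_score": 120, "avg_identity": 45, "n_contacts": 30, "avg_conservation": 0.8},
    "B": {"pssm_score": 60, "avg_identity": 30, "n_contacts": 12, "avg_conservation": 0.5},
    "C": {"pssm_score": 90, "avg_identity": 80, "n_contacts": 20, "avg_conservation": 0.6},
}
print(rank_clusters(clusters))  # ['A', 'C', 'B']
```

Standardizing each component first keeps a large-valued component, such as the number of interfacial contacts, from dominating the combined score.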

Evaluating biological relevance of binding sites

To emphasize biologically relevant binding sites, we validate sites according to several criteria. First, we assess the evolutionary conservation of binding site clusters: sites that recur in sufficiently diverse protein complexes are ranked higher, an idea previously implemented in the Conserved Binding Modes (CBM) database (32). Clusters that have only one non-redundant member (after members with >90% identity are purged) are considered ‘singletons’ and are displayed with a low rank at the bottom of the interaction summary table. Second, binding sites are compared with manually curated site annotations from the Conserved Domain Database (CDD), which have been extracted from the published literature or derived from manual interpretation of individual three-dimensional structures (23). Binding site clusters that overlap by >50% with a CDD annotation are ranked first. Third, for protein–chemical interactions, we exclude by default chemicals such as buffers, salts, detergents, solvents and ions that are typically added for crystallization and/or purification; in most cases these are not relevant to the protein’s biological function. Finally, we use the PISA algorithm (24) to validate protein–protein interaction interfaces and eliminate those that appear to result from crystal packing.
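The ordering rules above can be sketched as follows. This is a minimal illustration under several assumptions: pairwise identities between members are supplied as a function, redundancy is removed greedily at the 90% level, overlap with a CDD annotation is measured as the fraction of CDD-annotated residues covered by the cluster (the text does not specify the denominator), singletons are placed last even if they match a CDD annotation, and `ADDITIVES` is a small hypothetical sample of ligand codes rather than the list IBIS uses. PISA validation is not modeled.

```python
from dataclasses import dataclass

# Hypothetical sample of crystallization/purification additives (PDB ligand codes).
ADDITIVES = {"SO4", "GOL", "EDO", "PEG", "CL", "NA"}


@dataclass
class Cluster:
    name: str
    members: list  # identifiers of member chains
    site: set      # query residue numbers covered by the mapped binding site
    score: float   # ranking score, e.g. summed Z-scores


def nonredundant(members, identity, threshold=0.90):
    """Greedily keep members whose identity to every kept member is <= threshold."""
    kept = []
    for m in members:
        if all(identity(m, k) <= threshold for k in kept):
            kept.append(m)
    return kept


def overlaps_cdd(site, cdd_site, min_fraction=0.5):
    """True if the site covers more than half of the CDD-annotated residues (assumed measure)."""
    return bool(cdd_site) and len(site & cdd_site) / len(cdd_site) > min_fraction


def drop_additives(chemicals):
    """Remove chemicals assumed to be crystallization or purification additives."""
    return {c for c in chemicals if c not in ADDITIVES}


def order_clusters(clusters, identity, cdd_sites):
    """Singletons last; otherwise CDD-matched clusters first, then by descending score."""
    def key(c):
        singleton = len(nonredundant(c.members, identity)) == 1
        cdd_match = any(overlaps_cdd(c.site, s) for s in cdd_sites)
        return (singleton, not cdd_match, -c.score)  # False sorts before True
    return sorted(clusters, key=key)


def toy_identity(a, b):
    # Toy identity function: only members m1 and m2 are near-identical.
    return 0.95 if {a, b} == {"m1", "m2"} else 0.40


clusters = [
    Cluster("c1", ["m1", "m2"], {10, 11, 12}, score=2.0),  # redundant pair -> singleton
    Cluster("c2", ["m3", "m4"], {40, 41, 42}, score=0.5),
    Cluster("c3", ["m5", "m6"], {70, 71}, score=1.5),
]
cdd_sites = [{40, 41, 42, 43}]  # hypothetical curated site
print([c.name for c in order_clusters(clusters, toy_identity, cdd_sites)])  # ['c2', 'c3', 'c1']
print(drop_additives({"ATP", "SO4", "GOL"}))  # {'ATP'}
```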

RESULTS AND DISCUSSION

Summary statistics of the IBIS database

Currently, a total of 40 716 proteins (151 887 protein chains/domains) are represented in IBIS with at least one type of interaction observed in their structural complexes. As Figure 2 shows, protein–protein and protein–chemical interactions are the most frequent types of interactions observed in protein structures, and protein–protein interactions are the most prevalent in terms of both the number of domains involved and the number of binding sites. The number of inferred interactions is always higher than the number of observed interactions. The difference is largest for protein–peptide and protein–nucleic acid interactions, where inferred interactions outnumber observed ones almost 5-fold in terms of the number of protein chains; the ratio is even higher for binding site clusters (Figure 2B). Altogether, IBIS provides information on binding partners and binding site locations, with an average of 3.4 protein–chemical binding site clusters per chain and eight protein–protein binding site clusters per domain. The scale of these annotations is approaching that of whole interactomes.

Figure 2. (A) Histogram depicting the number of proteins in the PDB with observed/inferred binding sites. (B) Histogram showing the number of binding sites inferred by IBIS compared with those observed in protein structure complexes.

Description of the IBIS interface

IBIS may be queried with either an NCBI GenBank protein identifier or a PDB code (the one-letter PDB chain identifier is optional). For a given query, the different types of interactions (protein–protein, protein–chemical, protein–DNA, protein–RNA and protein–peptide) can be viewed by navigating through the tabs at the top of the page; the display of protein–ion interactions is currently under development. Figure 3 illustrates an IBIS Interaction Summary page, in which observed and inferred binding site clusters are sorted by ranking score. Each row in the table corresponds to a binding site cluster and can be expanded to show the cluster members.
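The two accepted query forms can be told apart with a small client-side helper. The sketch below is hypothetical and not part of IBIS: it assumes GenBank identifiers are given as numeric GI numbers and PDB codes as four characters starting with a digit, optionally followed (directly or after `_` or `:`) by a one-character chain identifier. Because any all-digit string is treated as a GI, an all-numeric PDB code would be misclassified.

```python
import re

# Assumed PDB form: 4 characters starting with a digit, then an optional
# one-character chain ID, written directly or after "_" or ":".
_PDB_QUERY = re.compile(r"^([1-9][A-Za-z0-9]{3})(?:[_:]?([A-Za-z0-9]))?$")


def parse_query(query):
    """Classify an IBIS-style query string (hypothetical helper, not an IBIS API)."""
    q = query.strip()
    if q.isdigit():
        return ("gi", int(q))  # numeric GenBank GI number
    match = _PDB_QUERY.match(q)
    if match:
        code, chain = match.groups()
        return ("pdb", code.upper(), chain)  # chain is None when omitted
    raise ValueError(f"unrecognized query: {query!r}")


print(parse_query("1U59"))    # ('pdb', '1U59', None)
print(parse_query("1u59_A"))  # ('pdb', '1U59', 'A')
print(parse_query("1XBBA"))   # ('pdb', '1XBB', 'A')
```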

Figure 3. IBIS screenshot for 1U59, chain A, displaying various chemical binding sites inferred from its homologs. A blow-up of the expanded ATP-binding site cluster is also shown.

The main features of binding sites and interaction partners in the Interaction Summary table are as follows:

‘Interaction partner’: the partner that interacts either with the query itself (‘observed’ interactions) or with homologs of the query within a given binding site cluster (‘inferred’ interactions). For protein–protein interactions, the CDD domain name of the binding partner is listed. For protein–chemical interactions, the column gives the name of the chemical bound to a representative member of the cluster. For protein–nucleic acid and protein–peptide interactions, it gives the sequence of the first 20 residues of the biopolymer partner of a representative cluster member.

‘Ranking score’: the score that ranks binding site clusters by their biological relevance and similarity to the query. The ranking score is not defined for ‘singleton’ clusters.

‘Number of cluster members’: the number of members in the cluster. When a cluster is expanded, only its non-redundant members (at the <90% identity level) are displayed; the complete list can be viewed by clicking the ‘See all members’ link.

‘Average percent identity to query’: the average sequence identity between the query and the cluster members, calculated over their structural alignments with the query.

‘Number of binding site residues’: the number of query residues in the union of the binding sites mapped onto the query from all cluster members.

‘Number of chemicals’ (for protein–chemical interactions): the number of unique, standardized chemicals present in a given binding site cluster.

‘Curator annotation’: the CDD binding site annotation that overlaps by >50% with the site annotated by IBIS. Binding site clusters with a matching CDD annotation are top-ranked irrespective of their ranking score.

‘Taxonomic diversity’: the last common ancestor of the proteins in a given cluster, linked to NCBI’s Taxonomy Browser so that all taxonomic groups represented in the cluster can be explored.
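Several of these columns are simple aggregates over cluster members. The sketch below computes them for one cluster, assuming each member already carries its percent identity to the query, its binding site mapped onto query residue numbers, and a simplified root-to-leaf taxonomic lineage; the `Member` type, its fields and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    identity_to_query: float  # percent identity over the structural alignment
    mapped_site: frozenset    # binding-site residues in query numbering
    lineage: tuple            # simplified root-to-leaf taxonomic lineage


def last_common_ancestor(lineages):
    """Deepest taxon shared by all lineages (their longest common prefix)."""
    lca = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


def summarize(members):
    """Compute a few Interaction Summary columns for one non-empty cluster."""
    site = frozenset().union(*(m.mapped_site for m in members))
    return {
        "number_of_members": len(members),
        "avg_percent_identity": sum(m.identity_to_query for m in members) / len(members),
        "number_of_binding_site_residues": len(site),
        "taxonomic_diversity": last_common_ancestor([m.lineage for m in members]),
    }


members = [
    Member("member1", 80.0, frozenset({10, 11, 12}),
           ("Eukaryota", "Metazoa", "Chordata", "Mammalia", "Homo sapiens")),
    Member("member2", 60.0, frozenset({11, 12, 13, 14}),
           ("Eukaryota", "Metazoa", "Chordata", "Mammalia", "Mus musculus")),
]
print(summarize(members))
```

The union of the two mapped sites covers residues 10 to 14, so the cluster reports five binding site residues even though neither member contributes all of them.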

Expanding a cluster shows the actual binding site residue alignment, together with the PDB codes of all complex structures summarized by the cluster. The inferred binding sites can also be viewed projected onto the query structure using the Cn3D visualization software (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml). For protein–protein interactions, the expanded table also gives the PISA validation status of each interaction interface; complex structures that PISA cannot process are marked ‘N/A’.

Binding site clusters can be examined further with the ‘Advanced search’ option in the left sidebar. This option allows one to filter the interactions of a given type by criteria such as the level of sequence identity, structural similarity and the name of the interaction partner. For chemical binding sites, for example, one can select a particular chemical and inspect the various sites on the query to which it may bind.
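As an illustration of the chemical-centred filtering described above, the following hypothetical sketch selects the clusters whose members bind a given chemical, optionally restricted to a minimum average identity to the query. The row layout and the ligand codes in the example are assumptions, not the IBIS data model.

```python
def sites_for_chemical(rows, chemical, min_identity=0.0):
    """Return ids of clusters that contain `chemical` and meet the identity threshold.

    rows -- hypothetical Interaction Summary rows, each a dict with keys
            "cluster_id", "avg_identity" (percent) and "chemicals" (a set)
    """
    return [
        row["cluster_id"]
        for row in rows
        if chemical in row["chemicals"] and row["avg_identity"] >= min_identity
    ]


rows = [
    {"cluster_id": 1, "avg_identity": 55.0, "chemicals": {"ATP", "ANP"}},
    {"cluster_id": 2, "avg_identity": 25.0, "chemicals": {"ATP"}},
    {"cluster_id": 3, "avg_identity": 70.0, "chemicals": {"MYR"}},
]
print(sites_for_chemical(rows, "ATP"))                   # [1, 2]
print(sites_for_chemical(rows, "ATP", min_identity=30))  # [1]
```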

Annotating new binding sites using IBIS: example of human spleen tyrosine kinase catalytic domain

Spleen tyrosine kinase (Syk) is a non-receptor tyrosine kinase that is expressed in a wide range of cell types and plays an important role in immunoreceptor signaling (33). It is an attractive drug target for the treatment of allergic and antibody-mediated autoimmune diseases, as well as breast and gastric cancers. Syk contains two N-terminal SH2 adapter domains, a linker region and a C-terminal catalytic domain. Several drugs and inhibitors target the active site of the Syk catalytic domain and decrease its activity.

Here, we demonstrate how IBIS can be used to annotate the binding sites of the Syk catalytic domain. We start with a Syk sequence for which a structure of the complex with ligands is available (PDB code 1XBB), predict its binding sites using IBIS, and finally compare the predicted sites with the actual binding sites observed in the structure. First, we find the closest homolog with a known structure, Zap-70 kinase (1U59 chain A; BLAST E-value of 6e-99 and 77% identity to the query sequence). Second, we use the structure of 1U59 as a query in IBIS (Figure 3) and find nine protein–chemical binding site clusters. The top two clusters overlap with the ‘active site/ATP binding site’ CDD annotations. The first binding site cluster includes 360 homologous structures bound to 170 different chemicals. Because the bound chemicals vary in identity and size, the consensus binding site alignment spans 65 residues, yet it highlights 13 highly conserved residues. The ATP-binding site is an attractive target for the design of kinase inhibitors, and IBIS provides a concise summary of the interactions at that site that would otherwise require substantial comparative analysis. Here IBIS identifies and groups an ATP-binding site and lists various chemicals, many of them kinase inhibitors, that might bind to and inhibit the query protein. All binding sites observed in the actual structure of the complex with the anticancer drug imatinib (1XBB) are correctly annotated by IBIS (see the table in Figure 4). Interestingly, imatinib binds not only to the ATP-binding site but also to a regulatory myristoylation site near the C-terminus (binding site cluster #8), which can also be annotated on the query sequence.
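Projecting a binding site from a homolog onto the query, as in the Syk example, amounts to walking along an alignment and carrying residue numbers across aligned columns. The sketch below assumes a gapped pairwise alignment given as two equal-length strings (IBIS itself relies on structure–structure alignments) and uses toy sequences; residues of the homolog's site that face a gap in the query cannot be mapped and are dropped. The `query_start` and `homolog_start` parameters allow numbering that starts at an arbitrary residue.

```python
def map_site(aligned_query, aligned_homolog, site, query_start=1, homolog_start=1):
    """Map binding-site residue numbers from a homolog onto the query.

    aligned_query, aligned_homolog -- equal-length gapped strings ('-' marks a gap)
    site -- residue numbers of the homolog's binding site
    Returns {homolog_residue: query_residue} for every site residue that is
    aligned to a query residue; site residues opposite a query gap are dropped.
    """
    if len(aligned_query) != len(aligned_homolog):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    q, h = query_start, homolog_start  # residue numbers at the current column
    for a, b in zip(aligned_query, aligned_homolog):
        if a != "-" and b != "-" and h in site:
            mapping[h] = q
        if a != "-":
            q += 1
        if b != "-":
            h += 1
    return mapping


# Toy alignment: homolog residue 3 (A) faces a gap in the query.
print(map_site("MK-LVE", "MKALV-", {2, 3, 5}))  # {2: 2, 5: 4}
```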

Figure 4. Mapping of the ATP-binding site inferred from 1U59 onto the sequence of Syk tyrosine kinase (1XBB chain A), and its agreement with the binding site observed in the Syk complex with imatinib. MMDB residue numbering is used, starting from the beginning of the corresponding GenBank protein sequence.

In addition to chemical binding sites, IBIS can also predict protein interaction partners for Syk. For example, binding site cluster #1 under protein–protein interactions points to a potential SH2 domain binding site, which is further validated by a CDD curator annotation, although no structure of a complex between Syk and an SH2 domain has been solved.

DISCUSSION

In this paper, we presented a comprehensive, web-accessible database, which organizes, analyzes and predicts different types of interaction partners and binding sites in proteins. For proteins with or without known binding partners, IBIS provides a succinct and informative representation of observed binding sites and binding sites inferred from homologs with known 3D structure. It provides analysis of how well a binding site is conserved across members of a homologous protein family. Several structures of the same protein or close homologs with different binding partners may be available in the Protein Data Bank, or the same protein may have been crystallized under different physiological conditions. In such cases, the IBIS database facilitates a detailed classification and analysis of binding sites. IBIS also attempts to validate binding sites by assessing their biological relevance and ranks them accordingly. It can be used to annotate oligomeric states by inferring relevant homo-oligomer interfaces and should prove useful in studying the evolution of protein interactions.

IBIS is updated regularly (currently on a biweekly schedule) to account for the growth of the GenBank, PDB/MMDB, VAST and CDD databases. It was recently estimated that almost half of all sequences in GenBank have at least one structural homolog with an extensive alignment and at least 30% identical residues (34). As the ongoing structural genomics initiative continues to close the sequence–structure gap, IBIS serves as a powerful knowledge-based annotation system for proteins of unknown structure.

FUNDING

National Institutes of Health/DHHS (Intramural Research program of the National Library of Medicine). Funding for open access charge: National Institutes of Health/DHHS (Intramural Research program of the National Library of Medicine).

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors would like to thank Yanli Wang and Lewis Geer for useful discussions and Eugene Krissinel for help with the PISA software.

REFERENCES

1. Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, et al. A protein interaction map of Drosophila melanogaster. Science. 2003;302:1727–1736.
2. Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JD, Chesneau A, Hao T, et al. A map of the interactome network of the metazoan C. elegans. Science. 2004;303:540–543.
3. Bork P, Koonin EV. Predicting functions from protein sequences—where are the bottlenecks? Nat. Genet. 1998;18:313–318.
4. Rentzsch R, Orengo CA. Protein function prediction—the power of multiplicity. Trends Biotechnol. 2009;27:210–219.
5. Matthews LR, Vaglio P, Reboul J, Ge H, Davis BP, Garrels J, Vincent S, Vidal M. Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or ‘interologs’. Genome Res. 2001;11:2120–2126.
6. Gerlt JA, Babbitt PC. Can sequence determine function? Genome Biol. 2000;1:REVIEWS0005.
7. Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Res. 2004;14:1107–1118.
8. Hegyi H, Gerstein M. The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J. Mol. Biol. 1999;288:147–164.
9. Campbell SJ, Gold ND, Jackson RM, Westhead DR. Ligand binding: functional site location, similarity and docking. Curr. Opin. Struct. Biol. 2003;13:389–395.
10. Jones S, Thornton JM. Analysis of protein-protein interaction sites using surface patches. J. Mol. Biol. 1997;272:121–232.
11. Teichmann SA, Murzin AG, Chothia C. Determination of protein function, evolution and interactions by structural genomics. Curr. Opin. Struct. Biol. 2001;11:354–363.
12. Landgraf R, Xenarios I, Eisenberg D. Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins. J. Mol. Biol. 2001;307:1487–1502.
13. Pazos F, Sternberg MJ. Automated prediction of protein function and detection of functional sites from structure. Proc. Natl Acad. Sci. USA. 2004;101:14754–14759.
14. Brylinski M, Skolnick J. A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation. Proc. Natl Acad. Sci. USA. 2008;105:129–134.
15. Hernandez M, Ghersi D, Sanchez R. SITEHOUND-web: a server for ligand binding site identification in protein structures. Nucleic Acids Res. 2009;37:W413–W416.
16. Huang B, Schroeder M. LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation. BMC Struct. Biol. 2006;6:19.
17. Laurie AT, Jackson RM. Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics. 2005;21:1908–1916.
18. Qin S, Zhou HX. meta-PPISP: a meta web server for protein-protein interaction site prediction. Bioinformatics. 2007;23:3386–3387.
19. Talavera D, Laskowski RA, Thornton JM. WSsas: a web service for the annotation of functional residues through structural homologues. Bioinformatics. 2009;25:1192–1194.
20. Snyder KA, Feldman HJ, Dumontier M, Salama JJ, Hogue CW. Domain-based small molecule binding site annotation. BMC Bioinformatics. 2006;7:152.
21. Chen YC, Lo YS, Hsu WC, Yang JM. 3D-partner: a web server to infer interacting partners and binding models. Nucleic Acids Res. 2007;35:W561–W567.
22. Stein A, Panjkovich A, Aloy P. 3did Update: domain-domain and peptide-mediated interactions of known 3D structure. Nucleic Acids Res. 2009;37:D300–D304.
23. Marchler-Bauer A, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, et al. CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res. 2009;37:D205–D210.
24. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 2007;372:774–797.
25. Chen J, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ, et al. MMDB: Entrez's 3D-structure database. Nucleic Acids Res. 2003;31:474–477.
26. Sussman JL, Lin D, Jiang J, Manning NO, Prilusky J, Ritter O, Abola EE. Protein Data Bank (PDB): database of three-dimensional structural information of biological macromolecules. Acta Crystallogr. D Biol. Crystallogr. 1998;54:1078–1084.
27. Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004;32:W327–W331.
28. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 2009;37:W623–W633.
29. Gibrat JF, Madej T, Bryant SH. Surprising similarities in structure comparison. Curr. Opin. Struct. Biol. 1996;6:377–385.
30. Wang Y, Bryant S, Tatusov R, Tatusova T. Links from genome proteins to known 3-D structures. Genome Res. 2000;10:1643–1647.
31. Slonim N, Atwal GS, Tkacik G, Bialek W. Information-based clustering. Proc. Natl Acad. Sci. USA. 2005;102:18297–18302.
32. Shoemaker BA, Panchenko AR, Bryant SH. Finding biologically relevant protein domain interactions: conserved binding mode analysis. Protein Sci. 2006;15:352–361.
33. Atwell S, Adams JM, Badger J, Buchanan MD, Feil IK, Froning KJ, Gao X, Hendle J, Keegan K, Leon BC, et al. A novel mode of Gleevec binding is revealed by the structure of spleen tyrosine kinase. J. Biol. Chem. 2004;279:55827–55832.
34. Wang Y, Addess KJ, Chen J, Geer LY, He J, He S, Lu S, Madej T, Marchler-Bauer A, Thiessen PA, et al. MMDB: annotating protein sequences with Entrez's 3D-structure database. Nucleic Acids Res. 2007;35:D298–D300.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press