Using structure recurrence to define protein domains

Chin-Hsien Tai; Sam Vichetra; Jean-François Gibrat; Peter Munson; Byungkook Lee; Jean Garnier

Conference poster, Year: 2010

Chin-Hsien Tai

Role: Author

National Cancer Institute

National Institutes of Health [Bethesda, MD, USA]

Sam Vichetra

Role: Author

Center for Information Technology

National Institutes of Health [Bethesda, MD, USA]

Jean-François Gibrat

Role: Author
PersonId: 172679
IdHAL: jean-francois-gibrat
ORCID: 0000-0002-2623-5698
IdRef: 031134564

Unité Mathématique Informatique et Génome

Peter Munson

Role: Author

Center for Information Technology

National Institutes of Health [Bethesda, MD, USA]

Byungkook Lee

Role: Author

National Cancer Institute

National Institutes of Health [Bethesda, MD, USA]

Jean Garnier

Role: Author

Unité Mathématique Informatique et Génome

Abstract

Domains are the basic units of protein structure and are essential for exploring protein fold space and structural evolution. With the NIH Protein Structure Initiative and other structural genomics initiatives worldwide, the number of protein structures in the PDB is increasing dramatically, and domain parsing now needs to be done automatically. Most existing structural domain parsing programs consider the compactness of the domains and/or the number and strength of internal (intra-domain) versus external (inter-domain) contacts. Here we present a completely different approach. Taking advantage of the growing number of known structures in the PDB, chains are parsed solely by using the recurrence of similar structures that appear in the structural database. A non-redundant set of 6373 protein chains was selected as the target data set, and 128 benchmark chains from pDomains were used as query chains. For each query chain, one-against-all structure comparisons against the target set were performed using VAST. The VAST cliques were then collected, and the protein residues were clustered using mathematical procedures akin to those used for analyzing microarray data. These clusters define domains. NDO scores were used to compare the results with SCOP and CATH domain boundaries, as well as with those from other parsing programs. Our algorithm gave results comparable to those of several existing programs. It handles segmented domains as well as non-segmented ones. The structures that contribute the cliques defining a domain may carry distant evolutionary information about that domain.
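The residue-clustering step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each VAST clique has already been reduced to the set of query residue indices it covers, scores each residue pair by the Jaccard similarity of their clique memberships (an assumed choice), and groups residues with plain average-linkage agglomerative clustering, a method commonly used on microarray data, stopped at a hypothetical similarity threshold. The function names, the toy cliques, and the threshold value are all illustrative.

```python
from itertools import combinations


def cooccurrence_similarity(cliques, n_residues):
    """Jaccard similarity of residues based on shared clique membership.

    cliques: list of sets of 0-based query residue indices; each set is a
    hypothetical stand-in for the query residues aligned in one VAST clique.
    """
    # For each residue, record which cliques it belongs to.
    member_of = [set() for _ in range(n_residues)]
    for k, clique in enumerate(cliques):
        for r in clique:
            member_of[r].add(k)

    sim = [[0.0] * n_residues for _ in range(n_residues)]
    for i in range(n_residues):
        sim[i][i] = 1.0
    for i, j in combinations(range(n_residues), 2):
        union = member_of[i] | member_of[j]
        if union:  # pairs where neither residue is in any clique stay at 0
            s = len(member_of[i] & member_of[j]) / len(union)
            sim[i][j] = sim[j][i] = s
    return sim


def average_linkage(sim, threshold):
    """Average-linkage agglomerative clustering over a similarity matrix.

    Merging stops once the best average similarity between any two
    clusters falls below threshold.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if avg > best:
                best, pair = avg, (a, b)
        if best < threshold:
            break
        a, b = pair  # a < b, so deleting index b leaves index a valid
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Toy query chain of 10 residues with made-up cliques (not real VAST output).
cliques = [set(range(5)), set(range(5)), set(range(5, 10)), set(range(5, 10)), {0, 1, 2}]
domains = average_linkage(cooccurrence_similarity(cliques, 10), threshold=0.5)
print(domains)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

Because the grouping relies on shared clique membership rather than sequence position, a cluster can contain non-contiguous residues (for example, cliques covering residues 0-2 together with 7-9 yield one cluster `[0, 1, 2, 7, 8, 9]`), which is one way such an approach could accommodate segmented domains.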

Keywords

Secondary structure recurrence; proteins; domains; secondary structures; genome structure

Domains

Mathematics [math]; Computer Science [cs]; Life Sciences [q-bio]

Main file

49824_20120224031122697_1.pdf (40.35 KB)

Origin: Publisher files authorized for deposit in an open archive

https://hal.inrae.fr/hal-02758033

Submitted on: Thursday, 4 June 2020, 03:00:20

Last modified on: Thursday, 14 March 2024, 03:13:43

Long-term archiving on: Friday, 4 December 2020, 17:26:39

Dates and versions

hal-02758033, version 1 (04-06-2020)

Identifiers

HAL Id: hal-02758033, version 1
PRODINRA: 49824
WOS: 000208762004275

Cite

Chin-Hsien Tai, Sam Vichetra, Jean-François Gibrat, Peter Munson, Byungkook Lee, et al. Using structure recurrence to define protein domains. Biophysical Society 54th Annual Meeting, Feb 2010, San Francisco, United States. pp. CD, 2010, proceedings of the Biophysical Society 54th Annual Meeting. ⟨hal-02758033⟩

Collections

INRA INRAE MATHNUM
