INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Conference paper. Year: 2015

Information extraction from articles for the elaboration of the regulatory networks involved in Arabidopsis seed development

Bertrand Dubreucq
Loïc Lepiniec

Abstract

Seeds are the main vector for the breeding and production of annual field crops, and the accumulation of seed storage compounds (sugars, lipids, proteins) is of primary importance for food, feed and industrial uses. Seed development requires the coordinated growth of different tissues and involves complex genetic and environmental regulation. A comprehensive understanding of the molecular network underlying this regulation remains a major scientific challenge with important potential impact for agriculture and industry. Knowledge of this regulation is scattered across a large number of scientific articles (e.g. the PubMed query “Arabidopsis seed” yields more than 6000 references) and is difficult to analyze. The molecular and genetic mechanisms are described by complex expressions that involve biological entities linked by various specific semantic relations. The aim of this work is to extract this information (i.e. entities and relations between entities) automatically by developing generic Natural Language Processing and Machine Learning methods. The approach consists of 1) the formal annotation of examples in a set of documents with respect to an annotation model, 2) the training of methods on these examples, and 3) the application of the methods to new texts to extract knowledge. Finally, we plan to integrate the extracted knowledge into a comprehensive regulatory model, with database and graphical representation tools. We expect these tools to be useful for analyzing other gene regulatory networks.
No file deposited

Dates and versions

hal-01524850, version 1 (18-05-2017)

Identifiers

  • HAL Id: hal-01524850, version 1
  • PRODINRA: 344667

Cite

Bertrand Dubreucq, Dialekti Valsamou, Abdelhak Fatihi, Estelle Chaix, Robert Bossy, et al. Information extraction from articles for the elaboration of the regulatory networks involved in Arabidopsis seed development. 26th International Conference on Arabidopsis Research, Jul 2015, Paris, France. ⟨hal-01524850⟩