Journal article, Bioinformatics, Year: 2018

Unsupervised multiple kernel learning for heterogeneous data integration

Abstract

Motivation: Recent advances in high-throughput sequencing have expanded the breadth of available omics datasets, and the integrated analysis of multiple datasets obtained on the same samples has yielded important insights in a wide range of applications. However, integrating various sources of information remains a challenge for systems biology because the datasets produced are often of heterogeneous types, which calls for generic methods that take their different specificities into account.

Results: We propose a multiple kernel framework that integrates multiple datasets of various types into a single exploratory analysis. Several solutions are provided to learn either a consensus meta-kernel or a meta-kernel that preserves the original topology of the datasets. We applied our framework to analyse two public multi-omics datasets. First, the multiple metagenomic datasets collected during the TARA Oceans expedition were explored to demonstrate that our method is able to retrieve previous findings in a single kernel PCA, as well as to provide a new picture of the sample structure when a larger number of datasets is included in the analysis. To perform this analysis, a generic procedure is also proposed to improve the interpretability of the kernel PCA with respect to the original data. Second, the multi-omics breast cancer datasets provided by The Cancer Genome Atlas are analysed using a kernel Self-Organizing Map with both single-omics and multi-omics strategies. The comparison of these two approaches demonstrates the benefit of our integration method in improving the representation of the studied biological system.

Availability and implementation: The proposed methods are available in the R package mixKernel, released on CRAN. It is fully compatible with the mixOmics package, and a tutorial describing the approach can be found on the mixOmics website: http://mixomics.org/mixkernel/.
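To make the idea of a meta-kernel concrete, here is a minimal sketch in plain Python of combining one kernel per dataset into a single kernel over the same samples. It is an illustration under stated assumptions, not the mixKernel API and not the authors' learning procedure: the weights are fixed and uniform rather than learned, the linear kernel and the cosine normalization are illustrative choices, and the function names and toy data are hypothetical.

```python
import math


def linear_kernel(X):
    """Gram matrix K[i][j] = <x_i, x_j> for the rows (samples) of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def cosine_normalize(K):
    """Rescale K so that every sample has unit self-similarity.

    Assumes every diagonal entry is strictly positive (no all-zero sample).
    """
    n = len(K)
    d = [math.sqrt(K[i][i]) for i in range(n)]
    return [[K[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]


def combine_kernels(kernels, weights=None):
    """Convex combination sum_m w_m * K_m of kernels on the same samples.

    Assumption: uniform weights stand in for weights that a real method
    would learn from the data.
    """
    m = len(kernels)
    if weights is None:
        weights = [1.0 / m] * m
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: the same 3 samples described by two datasets
# with different numbers of variables.
view_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
view_b = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]

kernels = [cosine_normalize(linear_kernel(v)) for v in (view_a, view_b)]
meta_kernel = combine_kernels(kernels)
```

Because a non-negative weighted sum of positive semi-definite matrices is itself positive semi-definite, `meta_kernel` can be passed to any kernel method, such as kernel PCA or a kernel Self-Organizing Map. Normalizing each kernel first keeps a dataset with larger raw values from dominating the sum.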
Origin: files produced by the author(s)

Dates and versions

hal-01738461, version 1 (20-03-2018)

Cite

Jérôme J. Mariette, Nathalie Vialaneix. Unsupervised multiple kernel learning for heterogeneous data integration. Bioinformatics, 2018, 34 (6), pp.1009-1015. ⟨10.1093/bioinformatics/btx682⟩. ⟨hal-01738461⟩