Model selection for metabolomics: predicting diagnosis of coronary artery disease using automated machine learning
Orlenko, Alena; Kofink, Daniel; Lyytikäinen, Leo-Pekka; Nikus, Kjell; Mishra, Pashupati; Kuukasjärvi, Pekka; Karhunen, Pekka J; Kähönen, Mika; Laurikka, Jari O; Lehtimäki, Terho; Asselbergs, Folkert W; Moore, Jason H
(2020) Bioinformatics (Oxford, England), volume 36, issue 6, pp. 1772 - 1778
(Article)
Abstract
Motivation: Selecting the optimal machine learning (ML) model for a given dataset is often challenging. Automated ML (AutoML) has emerged as a powerful tool for enabling the automatic selection of ML methods and parameter settings for the prediction of biomedical endpoints. Here, we apply the tree-based pipeline optimization tool (TPOT) to predict angiographic diagnoses of coronary artery disease (CAD). With TPOT, ML models are represented as expression trees, and optimal pipelines are discovered using a stochastic search method called genetic programming. We provide some guidelines for TPOT-based ML pipeline selection and optimization based on various clinical phenotypes and high-throughput metabolic profiles in the Angiography and Genes Study (ANGES).
Results: We analyzed nuclear magnetic resonance-derived lipoprotein and metabolite profiles in the ANGES cohort with the goal of identifying the role of non-obstructive CAD patients in CAD diagnostics. We performed a comparative analysis of TPOT-generated ML pipelines with selected ML classifiers, optimized with a grid search approach, applied to two phenotypic CAD profiles. TPOT generated ML pipelines that outperformed grid search-optimized models across multiple performance metrics, including balanced accuracy and area under the precision-recall curve. With the selected models, we demonstrated that the phenotypic profile that distinguishes non-obstructive CAD patients from no-CAD patients is associated with higher precision, suggesting a discrepancy in the underlying processes between these phenotypes.
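The abstract names two of the metrics used to compare models: balanced accuracy and area under the precision-recall curve. As a rough illustration of what these measure, the following minimal Python sketch computes both for a binary case/control outcome. The function names, labels, and scores are hypothetical and invented for this example; they are not ANGES data, and average precision is only one common estimator of the area under the precision-recall curve, not necessarily the one the authors used.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Assumes at least one positive label and no tied scores.
    """
    # Rank samples from highest to lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    pos = sum(y_true)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at the rank where this positive is recovered.
            total += hits / rank
    return total / pos


# Made-up labels and classifier scores (3 cases, 5 controls); illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.3f}")
print(f"average precision: {average_precision(y_true, scores):.3f}")
```

Both metrics are less sensitive to class imbalance than plain accuracy, which is a plausible reason to prefer them when cases and controls are unevenly represented.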
Keywords: Coronary Artery Disease, Humans, Machine Learning, Metabolome, Metabolomics, Computational Mathematics, Molecular Biology, Biochemistry, Statistics and Probability, Computer Science Applications, Computational Theory and Mathematics, Journal Article, Research Support, N.I.H., Extramural
ISSN: 1367-4811
Publisher: Oxford University Press
Note: Funding Information: This work was supported by grant [R01 LM010098] from the National Institutes of Health (USA). Publisher Copyright: © The Author(s) 2019. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.
(Peer reviewed)