Probabilistic partial least squares model: Identifiability, estimation and application
el Bouhaddani, Said; Uh, Hae Won; Hayward, Caroline; Jongbloed, Geurt; Houwing-Duistermaat, Jeanine
(2018) Journal of Multivariate Analysis, volume 167, pp. 331 - 346
(Article)
Abstract
With the rapid increase in the volume and complexity of data sets, there is a need for methods that can extract useful information, for example, the relationship between two data sets measured on the same individuals. The Partial Least Squares (PLS) method can be used for this dimension reduction task. Within the life sciences, results across studies are compared and combined. Therefore, parameters need to be identifiable, which is not the case for PLS. In addition, PLS is an algorithm, while epidemiological study designs are often outcome-dependent, and methods to analyze such data require a probabilistic formulation. Moreover, a probabilistic model provides a statistical framework for inference. To address these issues, we develop Probabilistic PLS (PPLS). We derive maximum likelihood estimators that satisfy the identifiability conditions by using an EM algorithm with a constrained optimization in the M step. We show that the PPLS parameters are identifiable up to sign. A simulation study is conducted to assess the performance of PPLS compared to existing methods. The PPLS estimates performed well in various scenarios, even in high dimensions. Most notably, the estimates seem to be robust against departures from normality. To illustrate our method, we applied it to IgG glycan data from two cohorts. Our PPLS model provided insight as well as interpretable results across the two cohorts.
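Because the full text is not reproduced here, the following is only a minimal sketch under an assumed model form, not the authors' method. It assumes a PPLS-style latent variable model x = t Wᵀ + e, y = u Cᵀ + f, u = t B + h, where W and C have orthonormal columns, B and Σ_t = Cov(t) are diagonal, and e, f, h are isotropic Gaussian noise. Under that assumption Cov(x, y) = W Σ_t B Cᵀ, which is a singular value decomposition when the diagonal of Σ_t B is positive and distinct, so W and C can at best be recovered up to a simultaneous sign flip of matching columns. This illustrates the "identifiable up to sign" statement. All function names (`simulate_ppls`, `cross_cov_loadings`, `model_cross_cov`, `sign_aligned_error`) and parameter values are hypothetical, chosen for illustration.

```python
import numpy as np

# ASSUMED model (illustrative only, not taken from the paper's text):
#   t ~ N(0, diag(theta_t**2)),  u = t B + h,  x = t W^T + e,  y = u C^T + f

def random_orthonormal(p, r, rng):
    # Reduced QR of a Gaussian matrix gives a p x r matrix with orthonormal columns
    q, _ = np.linalg.qr(rng.normal(size=(p, r)))
    return q

def simulate_ppls(n, W, C, theta_t, b, sig_e, sig_f, sig_h, rng):
    p, r = W.shape
    q = C.shape[0]
    t = rng.normal(size=(n, r)) * theta_t               # latent x-scores, independent columns
    u = t * b + rng.normal(scale=sig_h, size=(n, r))     # b is the diagonal of B
    x = t @ W.T + rng.normal(scale=sig_e, size=(n, p))
    y = u @ C.T + rng.normal(scale=sig_f, size=(n, q))
    return x, y

def model_cross_cov(W, C, theta_t, b):
    # Population Cov(x, y) = W Sigma_t B C^T under the assumed model
    return W @ np.diag(theta_t**2 * b) @ C.T

def cross_cov_loadings(x, y, r):
    # Moment-based stand-in (NOT the paper's EM): top-r singular vectors
    # of the sample cross-covariance matrix
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    s_xy = xc.T @ yc / (len(x) - 1)
    U, _, Vt = np.linalg.svd(s_xy, full_matrices=False)
    return U[:, :r], Vt[:r].T

def sign_aligned_error(est, true):
    # Flip each estimated column to agree in sign with the true column,
    # then report the largest absolute elementwise difference
    signs = np.sign(np.sum(est * true, axis=0))
    return np.max(np.abs(est * signs - true))

rng = np.random.default_rng(0)
p, q, r = 20, 15, 2
W = random_orthonormal(p, r, rng)
C = random_orthonormal(q, r, rng)
theta_t = np.array([2.0, 1.0])   # standard deviations of t
b = np.array([1.5, 1.0])         # diagonal of B; theta_t**2 * b = [6, 1] is distinct
x, y = simulate_ppls(200_000, W, C, theta_t, b, 0.4, 0.4, 0.3, rng)
W_hat, C_hat = cross_cov_loadings(x, y, r)
err_W = sign_aligned_error(W_hat, W)
err_C = sign_aligned_error(C_hat, C)
print(err_W, err_C)
```

The SVD of the sample cross-covariance is swapped in here only to keep the sketch short. The maximum likelihood approach described in the abstract (EM with a constrained M step) also estimates B, Σ_t and the noise variances, which this sketch does not attempt. Flipping the sign of column k in both W and C leaves `model_cross_cov` unchanged, which is exactly the remaining sign ambiguity.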
The full text of this publication is not available.
Keywords: Dimension reduction, EM algorithm, Identifiability, Inference, Probabilistic partial least squares, Statistics and Probability, Numerical Analysis, Statistics, Probability and Uncertainty
ISSN: 0047-259X
Publisher: Academic Press Inc.
Note: Funding Information: The authors would like to thank the Editor-in-Chief, the Associate Editor and the referees for their valuable comments and suggestions. This work has been supported by the European Union’s Seventh Framework Programme (FP7-Health-F5-2012) under grant agreement number 305280 (MIMOmics). The CROATIA_Vis and CROATIA_Korcula studies were funded by grants from the Medical Research Council (UK), European Commission Framework 6 project EUROSPAN (Contract No. LSHG-CT-2006-018947), FP7 contract BBMRI-LPC (grant No. 313010), Croatian Science Foundation (grant 8875) and the Republic of Croatia Ministry of Science, Education and Sports (216-1080315-0302). We would like to acknowledge the staff of several institutions in Croatia that supported the field work, including but not limited to The University of Split and Zagreb Medical Schools, Institute for Anthropological Research in Zagreb and the Croatian Institute for Public Health. Glycome analysis was supported by the European Commission HighGlycan (contract No. 278535), MIMOmics (contract No. 305280), HTP-GlycoMet (contract No. 324400), IntegraLife (contract No. 315997). The IgG glycan data are available upon request. Publisher Copyright: © 2018 Elsevier Inc.
(Peer reviewed)