Risk of bias in studies on prediction models developed using supervised machine learning techniques: Systematic review
Andaur Navarro, Constanza L.; Damen, Johanna A.A.; Takada, Toshihiko; Nijman, Steven W.J.; Dhiman, Paula; Ma, Jie; Collins, Gary S.; Bajpai, Ram; Riley, Richard D.; Moons, Karel G.M.; Hooft, Lotty
(2021) The BMJ, volume 375, pp. 1 - 9
(Article)
Abstract
OBJECTIVE: To assess the methodological quality of studies on prediction models developed using machine learning techniques across all medical specialties.
DESIGN: Systematic review.
DATA SOURCES: PubMed from 1 January 2018 to 31 December 2019.
ELIGIBILITY CRITERIA: Articles reporting on the development, with or without external validation, of a multivariable prediction model (diagnostic or prognostic) developed using supervised machine learning for individualised predictions. No restrictions applied for study design, data source, or predicted patient related health outcomes.
REVIEW METHODS: Methodological quality of the studies was determined and risk of bias evaluated using the prediction risk of bias assessment tool (PROBAST). This tool contains 21 signalling questions tailored to identify potential biases in four domains. Risk of bias was measured for each domain (participants, predictors, outcome, and analysis) and each study (overall).
RESULTS: 152 studies were included: 58 (38%) included a diagnostic prediction model and 94 (62%) a prognostic prediction model. PROBAST was applied to 152 developed models and 19 external validations. Of these 171 analyses, 148 (87%, 95% confidence interval 81% to 91%) were rated at high risk of bias. The analysis domain was most frequently rated at high risk of bias. Of the 152 models, 85 (56%, 48% to 64%) were developed with an inadequate number of events per candidate predictor, 62 handled missing data inadequately (41%, 33% to 49%), and 59 assessed overfitting improperly (39%, 31% to 47%). Most models used appropriate data sources to develop (73%, 66% to 79%) and externally validate the machine learning based prediction models (74%, 51% to 88%). Information about blinding of outcome and blinding of predictors was, however, absent in 60 (40%, 32% to 47%) and 79 (52%, 44% to 60%) of the developed models, respectively.
CONCLUSION: Most studies on machine learning based prediction models show poor methodological quality and are at high risk of bias. Factors contributing to risk of bias include small study size, poor handling of missing data, and failure to deal with overfitting. Efforts to improve the design, conduct, reporting, and validation of such studies are necessary to boost the application of machine learning based prediction models in clinical practice.
SYSTEMATIC REVIEW REGISTRATION: PROSPERO CRD42019161764.
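The abstract reports each proportion with a 95% confidence interval but does not say how the intervals were computed. As a minimal sketch, assuming a Wilson score interval (an assumption made here for illustration, not a method stated in this record), the counts given in the RESULTS can be turned into intervals as follows. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(events, n, z=1.959964):
    """Wilson score interval for a binomial proportion.

    events: number of analyses with the characteristic of interest
    n:      total number of analyses
    z:      standard normal quantile (1.959964 gives a 95% interval)
    """
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the RESULTS section of the abstract.
reported = [
    ("high risk of bias (all analyses)", 148, 171),
    ("inadequate events per candidate predictor", 85, 152),
    ("inadequate handling of missing data", 62, 152),
    ("improper assessment of overfitting", 59, 152),
]

for label, events, n in reported:
    low, high = wilson_interval(events, n)
    print(f"{label}: {events}/{n} = {events / n:.0%} "
          f"({low:.0%} to {high:.0%})")
```

Under this assumption, the four intervals agree with the reported values after rounding to whole percentages (for example, 148/171 gives 81% to 91%). That agreement is consistent with a Wilson-type interval but does not confirm which method the authors used.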
Keywords: Bias; Clinical Decision Rules; Data Interpretation, Statistical; Humans; Machine Learning; Models, Statistical; Multivariate Analysis; Risk; Journal Article; Research Support, Non-U.S. Gov't
ISSN: 0959-8146
Publisher: BMJ Publishing Group
Note: Funding Information: We acknowledge the support of René Spijker, information specialist. Contributors: CLAN, JAAD, PD, LH, RDR, GSC, and KGMM conceived and designed the study. CLAN, JAAD, TT, SN, PD, JM, and RB screened the articles and extracted data. CLAN performed data analysis and wrote the first draft of this manuscript. All authors revised the manuscript and approved the final version. CLAN is the guarantor. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. Funding: This study received no specific funding. GSC is supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC) and by a Cancer Research UK programme grant (C49297/A27294). PD is supported by the NIHR Oxford BRC. The views expressed are those of the authors and not necessarily those of the NHS, NIHR, or Department of Health. None of the funding sources had a role in the design, conduct, analyses, or reporting of the study or in the decision to submit the manuscript for publication. Publisher Copyright: © 2021, British Medical Journal Publishing Group. All rights reserved. © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY. No commercial re-use. See rights and permissions. Published by BMJ.
(Peer reviewed)