Abstract
Despite the progress that modern-day health care has made in improving people's health and well-being, many open questions related to disease and treatment remain. New and innovative approaches are needed to further expand medical knowledge and to keep health care affordable. Analyzing Electronic Health Records (EHRs) is one such potential source of innovation: EHRs often contain information that is hidden at an aggregated level and can be made explicit through a knowledge discovery process. In this research, we focus on analyzing EHRs in psychiatry, the field of medicine that specializes in mental health care. We pose the following overarching research question: How can data from Electronic Health Records provide relevant insights for psychiatric care?
In the first three research chapters of this work, we identify key technical, organizational and ethical challenges related to knowledge discovery in EHRs, and we subsequently propose solutions to them. First, we look at collaboration between data experts, who are well versed in the technical side of data analysis, and practitioners, who are an excellent source of domain knowledge. We show how our CRISP-IDM process can be used to find new knowledge and hypotheses, most of which had not been imagined beforehand. Second, we investigate how to design a technical infrastructure, consisting of hardware and software components, that enables the use of EHR data for analysis. We introduce the Capable Reuse of EHR Data (CARED) framework, which addresses nine important requirements, such as integration of data sources, support for collaboration and documentation, and privacy and security. Third, we develop and validate the De-identification Method for Dutch Medical Text (DEDUCE), which aims to automatically remove information that could identify a patient from free text. This rule-based method successfully removes information in categories such as person names and geographical locations.
In the second part of this research, we apply knowledge discovery techniques to EHR data to obtain new insights with the potential to improve care. First, we look at violence risk assessment, investigating whether applying machine learning techniques to clinical notes from patients' EHRs is a fruitful novel approach.
After exploring which types of models, including relatively recent deep learning models, show promise for this classification task, we obtain two independent datasets of psychiatric admissions and clinical notes from EHRs. We use these two datasets to train models that assess violence risk based on clinical notes, and then evaluate their accuracy and generalizability. Our findings show that such models have definite potential for use in practice.
Finally, we turn to identifying psychiatric patient subgroups and investigate how unsupervised learning can find robust and accurate stratifications of patients. We use cluster ensembles (combinations of multiple clusterings) to obtain three significant clusters of adolescent patients, and we assess their meaning and their relation to other relevant clinical variables.
Taken together, the two parts of this dissertation show that learning from EHRs, once key challenges related to the nature of the data have been addressed, is a new and interesting approach with clear potential for improving psychiatric health care.