Abstract
High Content Screening (HCS) is a technology that allows life scientists to analyze the effect of bioactive molecules on cellular phenotypes. It is a subset of High Throughput Screening (HTS), in which the number of features that can be extracted from one well of a 96-, 384- or 1536-well microplate is limited. HCS uses the same principle but can produce much more in-depth data. The data sets generated in HCS can be very large and complex. The utility of HCS is based on the fact that the profile of the extracted numeric features can be used to compare phenotypes. The technology is widely used in drug discovery projects, academia and the pharmaceutical industry.
This thesis investigates the data analysis workflow in HCS. It became evident that the majority of screening facilities do not use appropriate software for the analysis of HCS data, with the result that only a relatively small proportion of the generated data is actually analyzed. The main research question of this PhD thesis is therefore:
“How can multi-parametric data analysis contribute to effective knowledge discovery in High Content Screening?”
This research consists of six papers, mapped to the industry-standard framework of Knowledge Discovery in Databases (KDD). In Chapter two, we start with an abstract view that describes the entire process of High Throughput Screening (HTS). This reveals the main problems in the field. Then we examine the area of HCS, a subset of HTS, and target the problems that were identified in Chapter two. In Chapter three, we develop a workflow based on unsupervised data analytics methods and implement it as web-based software called HC StratoMineR. The method and implementation are tested, validated and reported.
Chapter four is centered around gains and losses of using supervised data analytics methods in the field of HCS. Chapter five showcases the usage of interactive visualizations in the field of HCS. It involves an experiment in which bioinformatics students performed data analysis exercises using static or interactive data visualizations. The results demonstrate the effectiveness of using interactive visualizations.
Chapter six is focused on the burden of preprocessing data in the field of data science. A standard data analysis protocol is developed to automate this process and is made available in an R package. The protocol is implemented, tested, validated and reported in this chapter. Finally, Chapter seven is a typical example from laboratory practice that applies the methods introduced in Chapter three to a chemical screen and validates them. The results demonstrate and verify the potential of the methods introduced in this thesis.