Abstract
Handling classification uncertainty is crucial for building efficient and ethical classification systems. This thesis addresses uncertainty issues from the perspective of end-users with limited expertise in machine learning. We investigate uncertainties that pertain to estimating class sizes, i.e., the number of objects per class. We aim to enable non-expert end-users to
conduct uncertainty-aware and scientifically valid analyses of class sizes, and we investigate how to support their understanding of class size uncertainty. After studying the specific use case of in-situ video monitoring of animal populations, where classes represent animal species, we derive generalizable methods for:
- Assessing the uncertainty factors, and the uncertainty propagation, that result in high-level errors and biases in class size estimates.
- Estimating the magnitude of classification errors in class size estimates.
- Visualizing classification uncertainty when evaluating classification systems and when interpreting class size estimates.

We first study the high-level information needs that can or cannot be addressed by computer vision techniques for monitoring animal populations. We identify the uncertainty issues inherent to each data collection technique, and the high-level requirements for uncertainty assessment. We further investigate the information that supports end-users in developing informed uncertainty assessments, and explore how information about classification errors impacts users' understanding, trust, and acceptance of the computer vision system. We highlight unfulfilled information needs that require additional uncertainty assessments.

From these insights, we identify the key uncertainty factors to address for enabling scientifically valid analyses of classification results. Our scope includes uncertainty factors arising both from computer vision systems and from the conditions in which these systems are deployed. We identify the interactions between uncertainty factors, how uncertainties propagate to high-level information, and the uncertainty assessment methods that are applicable or missing.

We then investigate uncertainty assessment methods for estimating the number of errors in classification end-results. We highlight the unaddressed case of disjoint test and target sets, which impacts the variance of error estimation results. We introduce three new methods:
- The Sample-to-Sample method estimates the variance of error estimation results for disjoint test and target sets.
- The Maximum Determinant method uses the determinant of error-rate matrices as a predictor of the variance of error estimation results.
- The Ratio-to-TP method uses atypical error rates whose properties are of interest for predicting the variance of error estimation results.

We then investigate the means of communicating uncertainty to non-expert end-users. We introduce a simplified design for visualizing classification errors, and present a user study comparing our simplified visualization to well-established visualizations. We identify the main difficulties that users encountered with the visualizations and with understanding classification errors. Finally, we introduce a visualization tool that enables end-users to explore class size estimates, and the uncertainties within specific subsets of the data. We present a user study investigating how the interface design supports user awareness of uncertainty, and highlight the factors that facilitated or complicated the exploration of the data and its uncertainties.

Our results are applicable to a broader range of applications dealing with uncertain computer vision and classification data, and inform the design of comprehensive uncertainty assessment methods and tools.
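To illustrate the kind of computation that underlies estimating classification errors in class size estimates, the sketch below corrects raw classifier counts using an error-rate matrix measured on a labeled test set. It is a minimal illustration under simplifying assumptions, not a rendition of the thesis's methods; the numbers and the helper `correct_class_sizes` are hypothetical.

```python
import numpy as np

# Minimal sketch (illustrative, not the thesis's methods): use a
# confusion matrix measured on a labeled test set to correct raw
# per-class counts produced by a classifier on an unlabeled target set.
# confusion[i, j] counts test items of true class i predicted as class j.

def correct_class_sizes(confusion, raw_counts):
    """Estimate true class sizes from raw classifier counts.

    Row-normalizing the confusion matrix gives P, where P[i, j] is the
    estimated probability that an item of true class i is predicted as
    class j. Expected raw counts satisfy raw = P.T @ true, so the true
    counts are recovered by solving that linear system.
    """
    p = confusion / confusion.sum(axis=1, keepdims=True)  # error-rate matrix
    # Assumes P.T is well-conditioned; a near-zero determinant of P
    # signals that the corrected counts will have high variance.
    return np.linalg.solve(p.T, raw_counts)

# Hypothetical three-species example from video monitoring.
confusion = np.array([[90.0,  5.0,  5.0],
                      [10.0, 80.0, 10.0],
                      [ 4.0,  6.0, 90.0]])
raw_counts = np.array([520.0, 310.0, 170.0])  # classifier's per-class counts
print(correct_class_sizes(confusion, raw_counts))
```

The determinant comment above is only a heuristic reading of why error-rate matrices matter for the variance of error estimation; the Sample-to-Sample, Maximum Determinant, and Ratio-to-TP methods themselves are not reproduced here.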