Abstract
Summary: A researcher in the social sciences faces many choices when conducting research. The variety of these choices can be seen as a garden of forking paths (Gelman & Loken, 2014). Each decision about which path to take is called a researcher degree of freedom (Simmons,
Nelson, & Simonsohn, 2011; Wicherts, Veldkamp, Augusteijn, Bakker, van Aert, & Van Assen, 2016). Choosing a data analysis approach can be seen as a predetermined process for extracting results from the data, yet there are often multiple reasonable approaches that could answer the same research question (Gelman & Loken, 2014). The answers produced by these different approaches may, however, differ from one another. We will refer to the variation among these outcomes as non-substantive error variance. This type of error variance is relatively underexamined in the scientific literature, even though it touches on fundamental questions about performing analyses. How much do outcomes differ from one another, and what does this mean for the conclusions drawn from them? In other words, what influence does the analytic strategy have on research results in general? Steegen, Tuerlinckx, Gelman, and Vanpaemel (2016) investigated the variation in results due to differences in dataset construction. Seemingly arbitrary choices during dataset construction may produce different datasets, resulting in a so-called multiverse of datasets and corresponding results. To study this multiverse of datasets, they performed the same analyses on alternatively processed datasets. Although such a multiverse analysis reveals how fragile or robust results are, it also involves the researchers' own subjectivity in choosing how to process the datasets. Furthermore, Silberzahn et al. (2018) investigated the variation in research results when different research teams answered the same research question using the same dataset. Of a total of 29 teams, 20 reported a significant positive relationship, while nine teams did not find any significant relationship at all. These differences could not be explained by the prior beliefs or the level of expertise of the researchers participating in the study. However, the teams used 21 unique combinations of covariates.
Using different variables in the analysis may implicitly mean that the research teams interpreted the research question itself differently, which might have contributed to the differences in outcomes. Both studies therefore captured variation in results that arises from a combination of the researchers' own subjective decisions and non-substantive error variance (Steegen et al., 2016; Silberzahn et al., 2018).
IRIS FLOWERS IN A GARDEN OF FORKING PATHS 3
Consequently, a different approach is needed to address the influence of non-substantive error variance on its own. This paper therefore focuses on the variation in outcomes between analytical methods applied to the same scientific problem. More specifically, this study addresses the variation in predictive accuracy between classification methods on the famous Iris dataset (Anderson, 1935; Fisher, 1936). With more than 2.41 million hits, the Iris dataset is one of the most widely used datasets in history (Sun, Yang, Zhang, Lin, Dong, Young, & Dong, 2019). The data describe the petal and sepal measurements of three species of Iris flowers (Anderson, 1935). Because of the simplicity and the natural origin of the data, there is no need to process the data in any way. This eliminates the need for a multiverse analysis and, moreover, allows us to demonstrate the variation in results that comes solely from the methods used to classify these data. In the next section, the Iris dataset, classification, and predictive accuracy are described in more detail. Subsequently, more details on the procedure of the literature study and the comparison of methods are provided. The fourth and fifth sections present the results of the comparison of methods and their implications, respectively. The limitations of the study and proposals for future research are discussed in the final section.
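To make the idea of method-driven variation concrete, the following minimal sketch fits two classifiers to identical data and compares their predictive accuracy. It is an illustration only, not the procedure used in this study. It assumes synthetic two-feature data generated with a fixed seed as a stand-in for the Iris measurements, and the two classifiers (nearest centroid and k-nearest neighbours) and all function names are hypothetical choices made for brevity.

```python
import random
from collections import Counter


def make_data(seed=0, n_per_class=50):
    """Synthetic three-class data; a stand-in, NOT the real Iris measurements."""
    rng = random.Random(seed)
    centers = {"a": (1.0, 1.0), "b": (3.0, 3.0), "c": (5.0, 2.0)}
    data = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 0.7), rng.gauss(cy, 0.7)), label))
    rng.shuffle(data)
    return data


def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fit_nearest_centroid(train):
    """Classify a point by the closest class mean."""
    grouped = {}
    for features, label in train:
        grouped.setdefault(label, []).append(features)
    centroids = {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }

    def predict(features):
        return min(centroids, key=lambda lab: sq_dist(features, centroids[lab]))

    return predict


def fit_knn(train, k=5):
    """Classify a point by majority vote among its k nearest training points."""

    def predict(features):
        nearest = sorted(train, key=lambda item: sq_dist(features, item[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    return predict


def accuracy(predict, test):
    """Proportion of test points whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)


# Same data and same train/test split for both methods: any difference in
# accuracy comes from the method alone.
data = make_data()
train, test = data[:100], data[100:]
results = {
    "nearest_centroid": accuracy(fit_nearest_centroid(train), test),
    "knn_k5": accuracy(fit_knn(train, k=5), test),
}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Because the data, the split, and the accuracy measure are held fixed, the spread between the entries of `results` isolates the contribution of the classification method itself, which is the quantity this paper calls non-substantive error variance.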