Abstract
Is my text comprehensible for my audience? It is a question that publishers, organizations and governments struggle with, and one that readability formulae claim to answer. At the press of a button, the readability of a text is assessed and users know whether the text is suited to their
intended readers. Because the need for objective measures of readability has only increased, readability formulae have remained popular despite a steady stream of criticism. Fortunately, developments in computational linguistics have opened up new possibilities for improving the old readability formulae. In her dissertation, Suzanne Kleijn combined current language technology with insights from readability research and discourse processing in an attempt to build an empirically validated readability tool for Dutch secondary school readers. As a result, the findings are relevant both to the field of discourse processing and to practitioners aiming to improve readability. Kleijn investigated the relationship between linguistic features and two aspects of readability: comprehension and processing ease. Comprehension was measured using a specially developed cloze procedure (‘The HyTeC-cloze’) and processing ease was measured using eye-movement registration. In her design, Kleijn combined experimental and correlational work in order to disentangle causal effects of linguistic features on readability from correlational relationships. That is, readability differences between texts and differences between stylistic variants of the same text were studied at the same time. In three separate experiments, only the lexical complexity, the syntactic complexity or the number of coherence markers within texts was changed to see how these factors affect readability. While reducing a text’s lexical complexity or syntactic complexity improved text comprehension (as measured with the developed cloze tests) and increased processing ease (as measured with eye-movement registration), coherence markers showed mixed results.
Adding contrastive connectives (e.g., maar ‘but’) or causal connectives (e.g., dus ‘so’) had a positive effect on comprehension of their immediate context, but inserting additive connectives (e.g., daarnaast ‘furthermore’) had a negative effect on comprehension. Taken together, the three experimental studies tested more than 2900 Dutch adolescents and provided comprehension data for 60 texts (in two versions). These data were used to build a multilevel model to predict the readability of Dutch texts for Dutch adolescents. Linguistic features were automatically extracted from the texts using the text analysis tool T-Scan and added to the model. The final model (‘U-Read’) included five factors: Word frequency of content words (without names and corrected for compounds), Content words per clause, Concrete nouns, Maximum syntactic dependency length and Adjectival past participles. Together, these features explained 23% of the observed variance in comprehension scores, which is a 20% improvement compared to predictors in two popular Dutch readability formulae, the Flesch-Douma and CLIB-formula. Although these features were found to be good predictors of readability, they do not necessarily have a strong causal effect on comprehension. This is because readability formulae are based on differences between texts. For example, texts containing difficult words tend to discuss difficult or unfamiliar topics. Replacing difficult words in such a text with more familiar words will reduce the text’s stylistic complexity, but not the complexity associated with its topic and content. As a result, the effects of changing a linguistic feature are relatively small compared to the differences predicted on the basis of between-text differences. Kleijn’s findings thus provide a realistic (and sobering) view of the importance of linguistic features and their potential for reducing the difficulty level of a given text.