Abstract
This dissertation is a collection of articles that present the results of a research project investigating linguistic rule induction from an information-theoretic perspective. The main goal of the project was to propose and test an innovative entropy model for rule induction based on Shannon’s noisy-channel coding theory (Shannon, 1948).
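For reference, the two information-theoretic quantities on which the model rests have standard definitions (these are the general Shannon forms, not anything specific to this dissertation):

```latex
H(X) = -\sum_{x} p(x)\,\log_2 p(x), \qquad C = \max_{p(x)} I(X;Y)
```

Here H(X) is the entropy of a source X, and C is the channel capacity, the maximum mutual information between channel input and output. Shannon’s noisy-channel coding theorem guarantees that information can be transmitted with arbitrarily small error at any rate below C, but not above it.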
Rule induction (generalization or regularization) is an essential language acquisition mechanism that enables language learners not only to memorize specific items (e.g. phonemes, words) encountered in the linguistic input, but also to acquire relations between these items. For example, when people learn languages, they not only memorize combinations of words like ‘Mom walked slowly’, but also learn generalized rules about how categories of words can be combined (e.g. Noun-Verb-Adverb).
These relations range from statistical patterns between specific items present in the linguistic input (Saffran, Aslin, & Newport, 1996; Thiessen & Saffran, 2007) to more abstract category/rule induction (Marcus, Vijayan, Rao, & Vishton, 1999; Smith & Wonnacott, 2010; Wonnacott & Newport, 2005). This research addressed the inductive steps from memorizing specific items, to inferring rules (or statistical patterns) between those specific items (item-bound generalization), to forming rules that apply to categories of items (category-based generalization).
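To make this distinction concrete, consider a minimal Python sketch (the syllables and the AAB template are illustrative, in the style of Marcus et al.’s stimuli, not the dissertation’s actual materials):

```python
# Minimal illustration (hypothetical syllables) of the contrast between
# item-bound and category-based generalization over AAB-style strings.

training = [("le", "le", "di"), ("wi", "wi", "je"), ("de", "de", "li")]

# Item-bound: accept only the specific syllable combinations that were memorized.
memorized = set(training)

def item_bound(string):
    return string in memorized

# Category-based: accept any string matching the abstract AAB template,
# regardless of which specific syllables fill the slots.
def category_based(string):
    first, second, third = string
    return first == second and second != third

novel = ("wo", "wo", "fe")    # unseen syllables, same AAB structure
print(item_bound(novel))      # False: these specific items were never encountered
print(category_based(novel))  # True: the abstract rule transfers to new items
```

An item-bound learner rejects the novel string because its specific items were never seen; a category-based learner accepts it because the abstract template transfers to new items.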
The main research questions were: (1) whether the two forms of generalization are outcomes of the same learning mechanism or of two different mechanisms, and (2) what factors drive rule induction and its two forms of generalization.
To answer these questions, I proposed an innovative theoretical model, an entropy and noisy-channel capacity model, that makes predictions about the transition from memorization to rule induction. In this model, two factors drive rule learning: (1) entropy, a measure of information content that quantifies the richness and unpredictability of the language, and (2) channel capacity, the amount of information, including noise, that learners can process per second, since learning happens in time and in noisy environments. I defined the brain’s encoding capacity as a channel capacity at the computational level, in the sense of Marr (1982): a finite rate of information encoding (in bits per second). At the algorithmic level, this channel capacity may be supported by the cognitive capacities involved in processing and encoding information, e.g. memory and attention.
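As a minimal sketch of how the first factor can be quantified (the toy corpora and function below are illustrative, not the dissertation’s stimuli), the Shannon entropy of an artificial vocabulary, in bits per item, can be computed as:

```python
import math
from collections import Counter

def shannon_entropy(items):
    """Shannon entropy H(X) = -sum p(x) log2 p(x), in bits per item."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical corpora: a poorer versus a richer artificial vocabulary.
low_entropy_language = ["ba", "ba", "ba", "di", "di", "ba"]
high_entropy_language = ["ba", "di", "gu", "ko", "me", "ti"]

print(shannon_entropy(low_entropy_language))   # ~0.92 bits: predictable input
print(shannon_entropy(high_entropy_language))  # ~2.58 bits: rich, unpredictable input
```

On the model’s logic, the richer (higher-entropy) language is the one predicted to push learners away from memorization and toward category-based rules.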
I tested the entropy model across multiple grammar learning experiments with both adults and infants. The findings showed that when entropy increases (e.g. when the language has a richer vocabulary or more diverse combinations of words), learners are more likely to generalize rules than to memorize combinations of words. Contrary to intuition, the same happens when channel capacity is pushed to its limit by supplying information at a faster rate, and when background noise distracts from the language. These findings provide evidence in favor of the entropy model. The dissertation also sketches the first joint information-theoretic and thermodynamic model of rule induction, proposing that the second law of thermodynamics and the constructal law of thermodynamics can answer why and how rule induction happens.