Productivity and Reuse in Language

T. O’Donnell, J. Snedeker, J. Tenenbaum, Noah D. Goodman

CogSci

Abstract

A much-celebrated aspect of language is the way in which it allows us to make “infinite use of finite means” (von Humboldt, 1836). This property is possible because language is fundamentally a productive computational system: Novel expressions can be composed from a large inventory of stored, reusable parts. For any given language, however, there are many more potential ways of forming novel expressions than can actually be used in practice. For example, English contains suffixes that are highly productive (e.g., -ness; Lady-Gagaesqueness, pine-scentedness), but also contains suffixes that can only be reused in specific, existing words (e.g., -th; truth, width, warmth). How are such differences in productivity and reusability represented? How can the child acquire this system of knowledge? This thesis presents a formal model of productivity and reuse that treats the problem as a structure-by-structure inference in a Bayesian framework. The model, called Fragment Grammars, generalizes Adaptor Grammars (Johnson et al., 2007a) and is built around two proposals. The first is that anything that can be computed can be stored. The specific computational mechanism by which this is accomplished, stochastic memoization, is inherited from Adaptor Grammars (Goodman et al., 2008; Johnson et al., 2007a). The second proposal is that any stored item can include subparts that must be computed productively. This is made possible by the computational mechanism of stochastically lazy evaluation, introduced in the thesis. Throughout the thesis, Fragment Grammars are systematically compared to four other probabilistic models of productivity and reuse that formalize historical proposals from the linguistics and psycholinguistics literatures.
The five models are evaluated on two very different subsystems of English morphology: the English past tense, which is characterized by a sharp dichotomy in productivity between regular (i.e., +ed) and irregular (e.g., sing/sang) forms, and English derivational morphology, which is characterized by a graded cline from very productive (e.g., -ness) to very unproductive (e.g., -th). The thesis examines many aspects of these two domains, including performance on past-tense inflection, past-tense processing phenomena, developmental overregularization, the productivity of derivational suffixes, ordering restrictions on derivational suffixes, and base-driven selectional restrictions on suffix combinations.
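
The idea of stochastic memoization mentioned above can be illustrated with a small sketch. It is not the authors' implementation. It assumes a Chinese restaurant process with a single concentration parameter `alpha` as the reuse rule, and the names `stochastic_memoize` and `derive` are hypothetical, chosen only for illustration. Each call either reuses a previously stored result, with probability proportional to how often that result has been used, or computes a fresh result, with probability proportional to `alpha`.

```python
import random
from collections import defaultdict


def stochastic_memoize(fn, alpha=1.0, rng=None):
    """Wrap fn so that each call either reuses a stored result or computes anew.

    Assumption (not taken from the source): reuse follows a Chinese restaurant
    process with concentration parameter alpha.
    """
    rng = rng or random.Random()
    tables = defaultdict(list)  # args -> list of [stored value, reuse count]

    def memoized(*args):
        entries = tables[args]
        total = sum(count for _, count in entries)
        # Draw a point in [0, total + alpha); the first `total` units of mass
        # belong to stored values, the remaining `alpha` units to a fresh call.
        r = rng.random() * (total + alpha)
        for entry in entries:
            r -= entry[1]
            if r < 0:
                entry[1] += 1   # reuse: a stored item becomes more entrenched
                return entry[0]
        value = fn(*args)       # compute: store the new result for later reuse
        entries.append([value, 1])
        return value

    memoized.tables = tables    # exposed so the stored items can be inspected
    return memoized


if __name__ == "__main__":
    rng = random.Random(0)

    def derive(stem):
        # Hypothetical stochastic computation: attach a randomly chosen suffix.
        return stem + rng.choice(["ness", "th"])

    memo = stochastic_memoize(derive, alpha=1.0, rng=rng)
    print([memo("warm") for _ in range(8)])
    print(memo.tables[("warm",)])
```

In this sketch a small `alpha` favors reuse of stored forms, while a large `alpha` favors fresh computation, which is one simple way to picture the trade-off between storage and productivity.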