T. O’Donnell, Noah D. Goodman, J. Tenenbaum

Technical Report

Abstract

Language relies on a division of labor between stored units and structure-building operations that combine the stored units into larger structures. This division of labor leads to a tradeoff: more structure-building means less need to store, while more storage means less need to compute structure. We develop a hierarchical Bayesian model called fragment grammar to explore the optimal balance between structure-building and reuse. The model is developed in the context of stochastic functional programming (SFP), in particular using a probabilistic variant of Lisp known as the Church programming language [17]. We show how to formalize several probabilistic models of language structure using Church, and how fragment grammar generalizes one of them, adaptor grammars [21]. We conclude with experimental data from adults and preliminary evaluations of the model on natural language corpus data.