Statistical Evidence for Learnable Lexical Subclasses in Japanese

T. Morita, T. O’Donnell

Linguistic Inquiry

Abstract

It has been proposed that the Japanese lexicon can be divided into etymologically defined sublexica on phonotactic and other grounds. However, the psychological reality of this sublexical analysis has been challenged by some authors, who have appealed to putative problems with the learnability of the system. In this study, we apply a commonly used clustering method to Japanese words and show that there is robust statistical evidence for the sublexica and, thereby, that such sublexica are learnable. The model is able to recover phonotactic properties of sublexica previously discussed in the literature, and also reveals hitherto unnoticed phonotactic properties that are characteristic of sublexical membership and can serve as a basis for future experimental investigations. The proposed approach is general and based purely on phonotactic information and thus can be applied to other languages.