A system for inducing the phonology and inflectional morphology of a natural language

by McClure, Scott, PhD

Abstract (Summary)
This thesis presents a machine learner that uses morphologically tagged data to induce clusters of words that take similar inflections, while at the same time identifying the set of morphological rules associated with each cluster. The learner also identifies simple phonological alternations. This work is significant because it uses a relatively simple framework to discover prefixes, suffixes, and infixes, each of which may be associated with one or more morphological feature values, while simultaneously discovering those alternations. The learner uses Bayesian principles to determine which of several candidate grammars is the most apt, and it performs the search greedily: starting from an initial state in which every lemma is assigned to its own inflection class, the learner attempts to merge existing inflection classes in ways that improve the posterior probability of the hypothesis. As these inflection classes are merged, the learner develops an increasingly accurate picture of the morphological rules associated with each inflection class, as well as of the surface-true phonological alternations that apply throughout the language. This work demonstrates that many of the principles of word formation posited by linguists can indeed be induced using probabilistic methods, and it serves as a key step toward more detailed grammars of word formation returned by an automatic learner.
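The greedy search described in the abstract can be illustrated with a small sketch. This is not the thesis's model: the prior (a per-character cost for each distinct rule), the likelihood (relative frequency of a suffix within a class), the common-prefix stemming heuristic, and every name below (`suffix_rules`, `log_posterior`, `greedy_merge`, `ALPHABET_SIZE`) are assumptions made for illustration. The sketch handles suffixes only and ignores prefixes, infixes, and phonological alternations.

```python
import math
from itertools import combinations
from os.path import commonprefix

ALPHABET_SIZE = 26  # assumed size of the symbol inventory used to price rules


def suffix_rules(paradigm):
    """Split {feature: form} into per-feature suffixes after the longest common prefix."""
    stem = commonprefix(list(paradigm.values()))
    return {feat: form[len(stem):] for feat, form in paradigm.items()}


def log_posterior(classes, rules):
    """Unnormalized log posterior of a partition of lemmas into inflection classes."""
    log_prior = 0.0
    log_lik = 0.0
    for cls in classes:
        # Count how often each suffix realizes each feature inside this class.
        counts = {}
        for lemma in cls:
            for feat, suf in rules[lemma].items():
                by_suffix = counts.setdefault(feat, {})
                by_suffix[suf] = by_suffix.get(suf, 0) + 1
        for by_suffix in counts.values():
            total = sum(by_suffix.values())
            for suf, n in by_suffix.items():
                # Assumed prior: each distinct rule costs one unit per character, plus one.
                log_prior -= (len(suf) + 1) * math.log(ALPHABET_SIZE)
                # Assumed likelihood: suffix chosen by its relative frequency in the class.
                log_lik += n * math.log(n / total)
    return log_prior + log_lik


def greedy_merge(paradigms):
    """Start with one class per lemma; repeatedly apply the best-scoring pairwise merge."""
    rules = {lemma: suffix_rules(p) for lemma, p in paradigms.items()}
    classes = [frozenset([lemma]) for lemma in paradigms]
    best = log_posterior(classes, rules)
    while True:
        candidate = None
        for a, b in combinations(classes, 2):
            trial = [c for c in classes if c not in (a, b)] + [a | b]
            score = log_posterior(trial, rules)
            if score > best:
                best, candidate = score, trial
        if candidate is None:  # no merge improves the posterior, so stop
            return classes, best
        classes = candidate


# Toy tagged data (illustrative Latin noun forms, not drawn from the thesis).
paradigms = {
    "rosa": {"NOM": "rosa", "ACC": "rosam", "GEN": "rosae"},
    "dominus": {"NOM": "dominus", "ACC": "dominum", "GEN": "domini"},
    "porta": {"NOM": "porta", "ACC": "portam", "GEN": "portae"},
    "servus": {"NOM": "servus", "ACC": "servum", "GEN": "servi"},
}
classes, score = greedy_merge(paradigms)
for cls in classes:
    print(sorted(cls))
```

On this toy input, lemmas whose suffix sets coincide are merged because sharing rules lowers the grammar cost at no cost in likelihood, while merging lemmas with disjoint suffix sets only lowers the likelihood and is therefore rejected.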

Bibliographical Information:

Advisor: Gaja Jarosz; Stephen R. Anderson

School: Yale University

School Location: USA - Connecticut

Source Type: Doctoral Dissertation

Keywords: linguistics; computational linguistics; phonology; morphology

Date of Publication: 05/23/2011
