A system for inducing the phonology and inflectional morphology of a natural language
This thesis presents a machine learner that uses morphologically tagged data to induce clusters of words that take similar inflections, while simultaneously identifying the set of morphological rules associated with each cluster. The learner also identifies simple phonological alternations. This work is significant because it uses a relatively simple framework to discover prefixes, suffixes, and infixes, each of which may be associated with one or more morphological feature values, while at the same time discovering certain simple phonological alternations. The learner uses Bayesian principles to determine which of several candidate grammars is most apt, and it performs its search greedily: starting from an initial state in which every lemma is assigned to its own inflection class, the learner attempts to merge existing inflection classes so as to improve the posterior probability of the hypothesis. As inflection classes are merged, the learner develops an increasingly accurate picture of the morphological rules associated with each inflection class, as well as of the surface-true phonological alternations that apply throughout the language. This work demonstrates that many of the principles of word formation posited by linguists can indeed be induced using probabilistic methods, and it is a key step toward more detailed grammars of word formation from an automatic learner.
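The greedy merge search described above can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the thesis's model. This includes the hypothetical functions `rule`, `log_posterior`, and `greedy_merge`, the costs `CLASS_COST` and `RULE_COST`, and the rule format (the material deleted from and added to the end of the lemma after their longest common prefix). The score is a crude MDL-style stand-in for the posterior. It combines a prior that charges a fixed cost per inflection class and per distinct rule with the log likelihood of each lemma's class membership and inflected forms, using maximum-likelihood probabilities. Unlike the learner in the thesis, it handles neither prefixes and infixes nor phonological alternations.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical prior costs (in nats); not values from the thesis.
CLASS_COST = 1.0
RULE_COST = 1.0


def rule(lemma, form):
    """Hypothetical rule format: (deleted lemma tail, added tail) after the longest common prefix."""
    i = 0
    while i < min(len(lemma), len(form)) and lemma[i] == form[i]:
        i += 1
    return (lemma[i:], form[i:])


def log_posterior(classes, data):
    """Unnormalized MDL-style score: log prior plus maximum-likelihood log likelihood."""
    n = sum(len(members) for members in classes)
    log_prior = -CLASS_COST * len(classes)
    log_lik = 0.0
    for members in classes:
        # Probability that a lemma belongs to this class.
        log_lik += len(members) * math.log(len(members) / n)
        for feat in sorted({f for lemma in members for f in data[lemma]}):
            counts = Counter(rule(lemma, data[lemma][feat])
                             for lemma in members if feat in data[lemma])
            total = sum(counts.values())
            # Each distinct rule for this feature in this class is paid for once.
            log_prior -= RULE_COST * len(counts)
            # Probability of each member's rule under the class's rule distribution.
            log_lik += sum(k * math.log(k / total) for k in counts.values())
    return log_prior + log_lik


def greedy_merge(data):
    """Start with one class per lemma; apply the best improving merge until none is left."""
    classes = [frozenset([lemma]) for lemma in sorted(data)]
    score = log_posterior(classes, data)
    while len(classes) > 1:
        best = None
        for i, j in combinations(range(len(classes)), 2):
            merged = (classes[:i] + classes[i + 1:j] + classes[j + 1:]
                      + [classes[i] | classes[j]])
            s = log_posterior(merged, data)
            if s > score and (best is None or s > best[0]):
                best = (s, merged)
        if best is None:
            break  # no merge raises the score
        score, classes = best
    return classes, score


data = {
    "walk": {"PAST": "walked", "PPART": "walked", "3SG": "walks"},
    "jump": {"PAST": "jumped", "PPART": "jumped", "3SG": "jumps"},
    "talk": {"PAST": "talked", "PPART": "talked", "3SG": "talks"},
    "sing": {"PAST": "sang", "PPART": "sung", "3SG": "sings"},
    "ring": {"PAST": "rang", "PPART": "rung", "3SG": "rings"},
}
classes, score = greedy_merge(data)
for members in classes:
    print(sorted(members))
```

With these costs the sketch groups walk, jump, and talk into one class and sing and ring into another. Merging those two classes would save one class and the shared 3SG rule. However, that prior saving is outweighed by the net likelihood lost from mixing two PAST patterns and two PPART patterns within a single class.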
Advisor: Gaja Jarosz; Stephen R. Anderson
School Location: USA - Connecticut
Source Type: Doctoral Dissertation
Keywords: linguistics; computational linguistics; phonology; morphology
Date of Publication: 05/23/2011