A language model for Mandarin Chinese

by Law, Hin-cheung

Abstract (Summary)
Abstract of thesis entitled "A Language Model for Mandarin Chinese" submitted by Law Hin Cheung Hubert for the degree of Master of Philosophy at the University of Hong Kong in June 1997.

A novel Ergodic Multigram Hidden Markov Model (HMM) is introduced which models sentence production as a doubly stochastic process, in which a sequence of hidden states, corresponding to word classes, is first produced by a first order Markov process and then each of these states in turn generates a character sequence corresponding to one word. No delimiters are inserted between words in the generated sentences. Thus word segmentation is integrated in the language model. This model can be used for modeling languages, such as Chinese, which do not have word boundary delimiters, given a lexicon containing syntactic word classes for each word. Its modeling power is similar to class-based bigram language models. This model was extended to the N-th order, which is analogous with class-based n-gram models, by transformation to map each state to an ordered sequence of word classes. This model can be trained without a word-segmented and tagged corpus. Applications of this language model include parsing of recognizer output to improve the recognition accuracy and integrated word segmentation and class tagging of sentences. In this thesis relevant algorithms for the models are presented. Trained on a Chinese text corpus, experiment results of the models are reported.
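The generative process described in the abstract (word classes emitted by a first order Markov chain, each class then emitting the characters of one word with no delimiter) can be illustrated with a small decoding sketch. The following is not the thesis's algorithm or data: the class names, the lexicon, every probability, and the function name segment_and_tag are hypothetical values chosen for illustration, and the search is an ordinary Viterbi-style dynamic program over character positions, assuming a class-based bigram model.

```python
import math

# Toy, made-up parameters (NOT from the thesis).
# EMIT[cls][word] = P(word | cls); TRANS[prev][cls] = P(cls | prev).
EMIT = {
    "PRON": {"我": 1.0},
    "VERB": {"喜欢": 0.5, "吃": 0.4, "喜": 0.1},
    "NOUN": {"苹果": 0.6, "果": 0.2, "苹": 0.1, "欢": 0.1},
}
TRANS = {
    "<s>":  {"PRON": 0.8, "VERB": 0.1, "NOUN": 0.1},
    "PRON": {"PRON": 0.1, "VERB": 0.8, "NOUN": 0.1},
    "VERB": {"PRON": 0.1, "VERB": 0.3, "NOUN": 0.6},
    "NOUN": {"PRON": 0.1, "VERB": 0.5, "NOUN": 0.4},
}


def segment_and_tag(sentence):
    """Return the most probable (word, class) sequence and its log-probability."""
    n = len(sentence)
    max_len = max(len(w) for words in EMIT.values() for w in words)
    # best[i][cls] = (log-prob, back-pointer) of the best way to generate
    # sentence[:i] such that the last word has class cls.
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = sentence[start:end]
            for cls, words in EMIT.items():
                if word not in words:
                    continue
                for prev, (prev_lp, _) in best[start].items():
                    lp = (prev_lp
                          + math.log(TRANS[prev][cls])
                          + math.log(words[word]))
                    if cls not in best[end] or lp > best[end][cls][0]:
                        best[end][cls] = (lp, (start, prev))
    if not best[n]:
        return None, float("-inf")  # no segmentation covered by the lexicon
    # Pick the best final class, then follow back-pointers to the start.
    cls = max(best[n], key=lambda c: best[n][c][0])
    logp = best[n][cls][0]
    result, end = [], n
    while end > 0:
        start, prev = best[end][cls][1]
        result.append((sentence[start:end], cls))
        end, cls = start, prev
    result.reverse()
    return result, logp


print(segment_and_tag("我喜欢吃苹果"))
```

Because a word may span several characters, the recursion looks back over every candidate word length rather than a single step, which is how segmentation and class tagging come out of one search. Replacing the maximization with a sum over predecessors would give the total sentence probability over all segmentations instead of the single best one.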
Bibliographical Information:


School: The University of Hong Kong

School Location: China - Hong Kong SAR

Source Type: Master's Thesis

Keywords: Mandarin dialects data processing


Date of Publication: 01/01/1998
