A Language Model for Mandarin Chinese
Abstract (Summary)
Abstract of thesis entitled
"A Language Model for Mandarin Chinese"
submitted by
Law Hin Cheung Hubert
for the degree of
Master of Philosophy
at the University of Hong Kong in
June 1997
A novel Ergodic Multigram Hidden Markov Model (HMM) is introduced which models sentence production as a doubly stochastic process, in which a sequence of hidden states, corresponding to word classes, is first produced by a first-order Markov process, and then each of these states in turn generates a character sequence corresponding to one word. No delimiters are inserted between words in the generated sentences. Thus word segmentation is integrated in the language model.
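The two-stage generation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two word classes, the toy lexicon, the probabilities, and the fixed sentence length are hypothetical and do not come from the thesis. A chain of word classes is drawn first, each class then emits one word, and the words are concatenated without delimiters.

```python
import random

# Hypothetical toy parameters; none of these values come from the thesis.
INITIAL = {"N": 0.7, "V": 0.3}                 # P(first word class)
TRANSITION = {"N": {"N": 0.3, "V": 0.7},       # P(next class | current class)
              "V": {"N": 0.8, "V": 0.2}}
EMISSION = {"N": {"我": 0.5, "中文": 0.5},      # P(word | class)
            "V": {"学": 0.4, "学习": 0.6}}


def draw(dist, rng):
    """Sample one key from a {key: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def generate(num_words, rng):
    """Return (sentence, words, classes) for a sentence of num_words words.

    The fixed length is an assumption of this sketch; the abstract does not
    say how sentence termination is modeled.
    """
    classes, words = [], []
    state = draw(INITIAL, rng)
    for _ in range(num_words):
        classes.append(state)
        words.append(draw(EMISSION[state], rng))   # one state emits one word
        state = draw(TRANSITION[state], rng)       # first-order Markov step
    # Words are joined with no delimiter, so the boundaries stay hidden.
    return "".join(words), words, classes


sentence, words, classes = generate(4, random.Random(0))
print(sentence, words, classes)
```

Only the concatenated character string would be observed; the word boundaries and the class sequence are hidden, which is why segmentation has to be recovered by the model itself.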
This model can be used for modeling languages, such as Chinese, which do not have word boundary delimiters, given a lexicon containing syntactic word classes for each word. Its modeling power is similar to that of class-based bigram language models. This model was extended to the N-th order, which is analogous to class-based n-gram models, by a transformation that maps each state to an ordered sequence of word classes.
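One plausible reading of the higher-order extension is sketched here, with the caveat that the abstract does not give the actual construction. In this sketch each state is a tuple of word classes, and a state may only move to a state that shifts the tuple by one class. The function names and the choice of tuple length are hypothetical.

```python
from itertools import product


def expand_states(classes, length):
    """Build states that are ordered sequences of `length` word classes."""
    return list(product(classes, repeat=length))


def allowed_transitions(states):
    """A state (c1, ..., ck) may only move to a state (c2, ..., ck, c_next)."""
    return {s: [t for t in states if t[:-1] == s[1:]] for s in states}


states = expand_states(["N", "V"], 2)
moves = allowed_transitions(states)
print(moves[("N", "V")])
```

With tuples of length 1 every state can reach every other state, which corresponds to the first-order ergodic case; longer tuples restrict the transitions so that the class history is carried inside the state.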
This model can be trained without a word-segmented and tagged corpus. Applications of this language model include parsing of recognizer output to improve the recognition accuracy and integrated word segmentation and class tagging of sentences. In this thesis, relevant algorithms for the models are presented. Experimental results of the models, trained on a Chinese text corpus, are reported.
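Integrated word segmentation and class tagging can be sketched as a Viterbi-style search over word end positions and word classes. This is a generic dynamic program written under stated assumptions, not the algorithm given in the thesis: the lexicon, the probabilities, and the function name segment_and_tag are hypothetical, and all probabilities are assumed to be nonzero so that logarithms are defined.

```python
import math

# Hypothetical toy parameters; none of these values come from the thesis.
INITIAL = {"N": 0.7, "V": 0.3}
TRANSITION = {"N": {"N": 0.3, "V": 0.7},
              "V": {"N": 0.8, "V": 0.2}}
EMISSION = {"N": {"我": 0.5, "中文": 0.5},
            "V": {"学": 0.4, "学习": 0.6}}


def segment_and_tag(sentence, max_word_len=2):
    """Return the most probable list of (word, class) pairs, or None."""
    # best[(end, cls)] = (log probability, back pointer) for the best path
    # whose last word ends at character index `end` and has class `cls`.
    best = {}
    for end in range(1, len(sentence) + 1):
        for start in range(max(0, end - max_word_len), end):
            word = sentence[start:end]
            for cls, lexicon in EMISSION.items():
                if word not in lexicon:
                    continue
                emit = math.log(lexicon[word])
                if start == 0:
                    candidates = [(math.log(INITIAL[cls]) + emit, None)]
                else:
                    candidates = [
                        (score + math.log(TRANSITION[prev][cls]) + emit,
                         (start, prev))
                        for (pos, prev), (score, _) in best.items()
                        if pos == start
                    ]
                for score, back in candidates:
                    if (end, cls) not in best or score > best[(end, cls)][0]:
                        best[(end, cls)] = (score, back)
    finals = [(score, key) for key, (score, _) in best.items()
              if key[0] == len(sentence)]
    if not finals:
        return None          # no segmentation is covered by the lexicon
    _, key = max(finals)
    path = []
    while key is not None:   # follow back pointers from the last word
        end, cls = key
        back = best[key][1]
        start = back[0] if back else 0
        path.append((sentence[start:end], cls))
        key = back
    return path[::-1]


print(segment_and_tag("我学习中文"))
```

The search considers every lexicon word that ends at each character position, so the word boundaries and the class tags are chosen jointly rather than in two separate passes.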
Bibliographical Information:
Advisor:
School: The University of Hong Kong
School Location: China - Hong Kong SAR
Source Type: Master's Thesis
Keywords: Mandarin dialects data processing
ISBN:
Date of Publication: 01/01/1998