Investigations of Term Expansion on Text Mining Techniques

by Yang, Chin-Sheng

Abstract (Summary)
Recent advances in computer and network technologies have contributed significantly to global connectivity and have caused the volume of online textual documents to grow extremely rapidly. The rapid accumulation of textual documents on the Web or within an organization requires effective document management techniques, including information retrieval, information filtering, and text mining. The word mismatch problem is a challenging issue for document management research. Word mismatch has been extensively investigated in information retrieval (IR) research through the use of term expansion (specifically, query expansion). However, a review of the text mining literature suggests that the word mismatch problem has seldom been addressed by text mining techniques. Thus, this thesis investigates the use of term expansion in three text mining techniques: text categorization, document clustering, and event detection. Accordingly, we developed term expansion extensions to these three techniques. The empirical evaluation results showed that term expansion increased categorization effectiveness when correlation coefficient feature selection was employed. For document clustering, the techniques extended with term expansion achieved clustering effectiveness comparable to that of existing techniques and were superior in improving the cluster specificity measure. Finally, the use of term expansion to support event detection degraded detection effectiveness compared with the traditional event detection technique.
Bibliographical Information:

Advisor: Chih-Ping Wei; Fu-ren Lin

School: National Sun Yat-Sen University

School Location: China - Taiwan

Source Type: Master's Thesis

Keywords: term association; word mismatch; text mining; event detection; expansion; document clustering; categorization

Date of Publication: 08/02/2002
