Investigations of Term Expansion on Text Mining Techniques
Recent advances in computer and network technologies have contributed significantly to global connectivity and stimulated the amount of online textual document to grow extremely rapidly. The rapid accumulation of textual documents on the Web or within an organization requires effective document management techniques, covering from information retrieval, information filtering and text mining. The word mismatch problem represents a challenging issue to be addressed by the document management research. Word mismatch has been extensively investigated in information retrieval (IR) research by the use of term expansion (or specifically query expansion). However, a review of text mining literature suggests that the word mismatch problem has seldom been addressed by text mining techniques. Thus, this thesis aims at investigating the use of term expansion on some text mining techniques, specifically including text categorization, document clustering and event detection. Accordingly, we developed term expansion extensions to these three text mining techniques. The empirical evaluation results showed that term expansion increased the categorization effectiveness when the correlation coefficient feature selection was employed. With respect to document clustering, techniques extended with term expansion achieved comparable clustering effectiveness to existing techniques and showed its superiority in improving clustering specificity measure. Finally, the use of term expansion for supporting event detection has degraded the detection effectiveness as compared to the traditional event detection technique.
Advisor:Chih-Ping Wei; none; Fu-ren Lin
School:National Sun Yat-Sen University
School Location:China - Taiwan
Source Type:Master's Thesis
Keywords:term association word mismatch text mining event detection expansion document clustering categorization
Date of Publication:08/02/2002