An Evaluation of Projection Techniques for Document Clustering: Latent Semantic Analysis and Independent Component Analysis

by Elsas, Jonathan L.

Abstract (Summary)
Dimensionality reduction in the bag-of-words vector space model of document representation has been widely studied as a way to improve the accuracy and reduce the computational load of document retrieval tasks. These techniques, however, have not been studied to the same degree for document clustering tasks. This study evaluates the effectiveness of two popular dimensionality reduction techniques for clustering and their effect on discovering accurate and understandable topical groupings of documents. The two techniques studied are Latent Semantic Analysis and Independent Component Analysis, each of which has been shown to be effective for retrieval purposes.
Bibliographical Information:

Advisor: Robert M. Losee

School: University of North Carolina at Chapel Hill

School Location: USA - North Carolina

Source Type: Master's Thesis

Keywords: information retrieval, statistical methods, evaluation


Date of Publication: 07/06/2005
