Enhancing Cross-Language Retrieval of Comparable Corpora Through Thesaurus-Based Translation and Citation Indexing

by Gatlin, Keith A.

Abstract (Summary)
This paper studies methods to enhance cross-language retrieval of domain-specific documents. English- and German-language comparable corpora are used as the subject of the study. A multilingual thesaurus is developed to facilitate query translation, and reference citations are indexed to provide a language-neutral method to retrieve documents. These new retrieval methods are tested against actual user queries to measure the improvement of retrieval quality over an existing Boolean system. Experimental results suggest that a manually produced thesaurus can greatly increase the recall of documents, while the use of the citation index leads to high precision retrieval when compared to a standard Boolean system without these enhancements. Both methods provide cross-language retrieval of documents given monolingual search terms, thus automatically expanding the scope of a user's query.
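
The two retrieval methods summarized above can be illustrated with a minimal sketch. It is not the thesis implementation: the thesaurus entries, document identifiers, reference identifiers, and function names below are all invented for illustration. It shows only the general idea of expanding a monolingual query with thesaurus equivalents and of looking up documents by a shared cited reference.

```python
from collections import defaultdict

# Toy multilingual thesaurus (invented entries): each set groups the
# English and German terms that stand for one concept.
CONCEPTS = [
    {"library", "bibliothek"},
    {"translation", "übersetzung"},
]

# Toy corpus (invented): doc id -> (index terms, cited reference ids).
DOCS = {
    "en1": ({"library", "catalog"}, {"ref_a", "ref_b"}),
    "de1": ({"bibliothek", "katalog"}, {"ref_a"}),
    "de2": ({"übersetzung"}, {"ref_c"}),
}


def expand_query(terms):
    """Add every thesaurus equivalent of each query term."""
    expanded = set(terms)
    for term in terms:
        for concept in CONCEPTS:
            if term in concept:
                expanded |= concept
    return expanded


def search_terms(terms):
    """Boolean OR retrieval over the thesaurus-expanded query."""
    expanded = expand_query(terms)
    return {doc_id for doc_id, (tokens, _) in DOCS.items() if tokens & expanded}


def build_citation_index(docs):
    """Map each cited reference to the set of documents that cite it."""
    index = defaultdict(set)
    for doc_id, (_, refs) in docs.items():
        for ref in refs:
            index[ref].add(doc_id)
    return index


def search_citation(ref, index):
    """Language-neutral lookup: all documents citing the given reference."""
    return set(index.get(ref, set()))
```

In this toy setting, an English query for "library" also matches the German document indexed under "bibliothek", and a lookup on a shared reference returns documents in both languages without any translation step.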

Bibliographical Information:

Advisor: Robert Losee

School: University of North Carolina at Chapel Hill

School Location: USA - North Carolina

Source Type: Master's Thesis

Keywords: information retrieval – cross language; comparable corpora; thesaurus compilation; citation indexing


Date of Publication: 04/06/2005
