Stepping Stones and Pathways: Improving Retrieval by Chains of Relationships between Documents

by Das Neves, Fernando Adrian

Abstract (Summary)
The information retrieval (IR) field has been successful in developing techniques to address many types of information needs. However, in some cases traditional IR approaches cannot produce adequate results. For example, the answer may be a small set of two or three documents rather than a single document, or the query may have to be split to explore the document space satisfactorily. We explore an alternative model for building and presenting retrieval results in such cases. In particular, we study effective methods for handling information needs that may:

1. Include multiple topics. Current IR systems interpret a typical query as a request to retrieve documents that each discuss all of the topics in that query. We propose an alternative interpretation based on query splitting. Under it, a query is a request to retrieve sets of documents, rather than individual documents, whose members are meaningfully related to one another.

2. Be interpreted as parts in a chain of relationships. Suppose a query concerns topics t1 and tm. Is there a relation between t1 and tm that involves t2, and possibly other topics, as in {t1, t2, tm}?

We therefore propose an alternative interpretation of user queries and of how their results are presented. This interpretation has the potential to improve retrieval results whenever the user's understanding of the collection does not match its actual content. We define and refine a retrieval scheme that enhances retrieval through a framework combining multiple sources of evidence. In our interpretation, a query result is a network of document groups representing topics: each group is related and connected to other groups in the network, and each partially answers the user's information need. We devise new and more effective representations and techniques for visualizing results, and we incorporate the user as part of the retrieval process.

We also evaluate the improvement in query results using multiple measures. In particular, we verify the validity of our approach through a study of a collection of Operating Systems research papers built specifically for this dissertation.
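The chain-of-relationships idea (a path from topic t1 to topic tm through intermediate "stepping stone" documents) can be illustrated with a toy sketch. It is not the retrieval scheme of this work, which combines multiple sources of evidence and operates on groups of documents. Here, as simplifying assumptions, each query term is treated as one topic (naive query splitting), two documents are linked when their term overlap (Jaccard similarity) reaches a hypothetical threshold of 0.2, and breadth-first search returns the shortest chain of documents leading from a document about t1 to a document about tm. The collection `DOCS` and the names `split_query`, `build_graph`, and `find_chain` are invented for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical toy collection: document id -> set of index terms.
# These documents and terms are invented for illustration only.
DOCS: Dict[str, Set[str]] = {
    "d1": {"scheduling", "threads"},
    "d2": {"threads", "locks"},
    "d3": {"locks", "deadlock", "filesystems"},
    "d4": {"filesystems", "journaling"},
    "d5": {"paging", "tlb"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Term-overlap similarity between two documents (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_query(query: str) -> List[str]:
    """Naive query splitting: each whitespace-separated term is one topic."""
    return query.lower().split()


def build_graph(docs: Dict[str, Set[str]], threshold: float = 0.2) -> Dict[str, Set[str]]:
    """Link two documents when their term overlap reaches the threshold."""
    graph: Dict[str, Set[str]] = {d: set() for d in docs}
    ids = sorted(docs)
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            if jaccard(docs[u], docs[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def find_chain(docs: Dict[str, Set[str]], graph: Dict[str, Set[str]],
               t1: str, tm: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of documents that starts
    at a document containing topic t1 and ends at one containing topic tm.
    Returns None when no such chain exists."""
    starts = sorted(d for d in docs if t1 in docs[d])
    goals = {d for d in docs if tm in docs[d]}
    queue = deque([[s] for s in starts])
    seen = set(starts)
    while queue:
        path = queue.popleft()
        if path[-1] in goals:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    t1, tm = split_query("scheduling journaling")
    print(find_chain(DOCS, build_graph(DOCS), t1, tm))
    # ['d1', 'd2', 'd3', 'd4']
```

No single toy document discusses both "scheduling" and "journaling", so a conventional single-document match finds nothing. The chain d1 → d2 → d3 → d4 instead connects the two topics through shared terms ("threads", "locks", "filesystems"), which is the kind of multi-document answer the abstract motivates.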
Bibliographical Information:

Advisor: Naren Ramakrishnan; Dennis Kafura; Ron Kriz; Chris North; Edward A. Fox

School: Virginia Polytechnic Institute and State University

School Location: USA - Virginia

Source Type: Master's Thesis

Keywords: computer science

Date of Publication: 12/08/2004
