Database selection in distributed information retrieval: a study of multi-collection information retrieval

by Powell, Allison L.

Abstract (Summary)
The proliferation of online information resources increases the importance of effective and efficient information retrieval in a multi-collection environment. Multi-collection searching includes distributed searching as a special case but is defined more broadly here to include searching partitioned content independently of its physical storage. It is cast in three parts: collection selection (also referred to as database selection), deciding where a query should be sent; query processing, executing the query at each selected collection; and results merging, combining the results from the individual collections into a single coherent list for the searcher. We focus our attention on collection selection. We compare a number of different collection selection approaches and examine the effect of collection selection on document retrieval performance. We consider multi-collection retrieval in six different test environments built from three document testbeds. Considering collection selection in isolation, we find that effective collection selection can be achieved using limited information about each collection. We then turn our attention from selection alone to document retrieval in a multi-collection environment, considering retrieval performance in the same six test environments. First, we find that good collection selection has the potential to yield better retrieval effectiveness than can be achieved in an equivalent single collection. Second, we find that good performance can be achieved when only a few collections are selected, and that performance generally increases as more collections are selected. Finally, we find that when collection selection is employed, it may not be necessary to maintain collection-wide information (CWI), such as global idf (inverse document frequency); local information can be used to achieve equivalent performance. This means that multi-collection systems can be engineered with more autonomy and less cooperation.
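To make the three-part decomposition concrete, the sketch below wires collection selection, per-collection query processing, and results merging together, with a switch between local statistics and collection-wide information (global idf). It is a minimal illustration under stated assumptions, not the thesis's method: the selection score (a sum of log(1 + df) over query terms), the smoothed tf-idf ranking, the function names `summarize`, `select`, `search`, and `retrieve`, and the toy testbed are all hypothetical. Merging by raw score also assumes scores are comparable across collections.

```python
import math
from collections import Counter


def summarize(docs):
    """Limited per-collection information: document count and document frequencies."""
    df = Counter()
    for _, tokens in docs:
        df.update(set(tokens))
    return len(docs), df


def select(collections, query, k):
    """Collection selection: rank collections by a simple, hypothetical
    df-based score and keep the top k that match at least one query term."""
    scores = {}
    for name, docs in collections.items():
        _, df = summarize(docs)
        scores[name] = sum(math.log1p(df[t]) for t in query)
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return [n for n in ranked[:k] if scores[n] > 0]


def idf(n_docs, df_t):
    # Smoothed idf (an assumption); stays positive even when df_t == n_docs.
    return math.log((n_docs + 1) / (df_t + 0.5))


def search(docs, query, stats):
    """Query processing at one collection: tf-idf with the supplied (N, df)."""
    n_docs, df = stats
    results = []
    for doc_id, tokens in docs:
        tf = Counter(tokens)
        score = sum(tf[t] * idf(n_docs, df[t]) for t in query if tf[t])
        if score > 0:
            results.append((score, doc_id))
    return results


def retrieve(collections, query, k, use_global_idf=False):
    chosen = select(collections, query, k)
    global_stats = None
    if use_global_idf:
        # Collection-wide information (CWI): statistics pooled over every collection.
        all_docs = [d for docs in collections.values() for d in docs]
        global_stats = summarize(all_docs)
    merged = []
    for name in chosen:
        docs = collections[name]
        stats = global_stats if use_global_idf else summarize(docs)
        merged.extend(search(docs, query, stats))
    # Results merging: one list ordered by raw score, ties broken by doc id.
    merged.sort(key=lambda r: (-r[0], r[1]))
    return [doc_id for _, doc_id in merged]


# Hypothetical toy testbed: collection name -> list of (doc_id, tokens).
testbed = {
    "news": [("n1", ["election", "vote", "poll"]), ("n2", ["market", "stocks"])],
    "sci": [("s1", ["retrieval", "index", "query"]), ("s2", ["query", "ranking", "query"])],
    "misc": [("m1", ["recipe", "pasta"])],
}
print(select(testbed, ["query", "index"], k=2))  # → ['sci']
print(retrieve(testbed, ["query", "index"], k=1))  # → ['s1', 's2']
print(retrieve(testbed, ["query", "index"], k=1, use_global_idf=True))  # → ['s1', 's2']
```

In this toy case the local and global idf settings produce the same ranking, which echoes the kind of comparison the abstract describes; a real evaluation would, of course, need the thesis's testbeds and effectiveness measures rather than a single example.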
This work demonstrates that improvements in collection selection can lead to broader improvements in document retrieval performance.
Bibliographical Information:


School: University of Virginia

School Location: USA - Virginia

Source Type: Master's Thesis



Date of Publication:

© 2009 All Rights Reserved.