Database selection in distributed information retrieval: a study of multi-collection information retrieval
Abstract (Summary)
The proliferation of online information resources increases the importance of effective and
efficient information retrieval in a multi-collection environment. Multi-collection searching
includes distributed searching as a special case but is more broadly defined here to incorporate
searching partitioned content independently from its physical storage. It is cast
in three parts: collection selection (also referred to as database selection): decide where
a query should be sent; query processing: execute the query at each selected collection;
and results merging: combine the results from individual collections into a single coherent
list for the searcher. We focus our attention on collection selection.
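The three-part decomposition above can be sketched in a few lines of Python. This is a minimal illustration only, not the selection approaches compared in the thesis: the scoring rules (counting document-level matches for selection, summing term occurrences for ranking, merging on raw scores) and all names such as `select_collections` and `COLLECTIONS` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy data: each collection is a list of short documents.
COLLECTIONS = {
    "news": ["stock market falls", "market rally continues"],
    "sports": ["football final tonight", "tennis open results"],
    "tech": ["database market grows", "distributed database search"],
}

def tokenize(text):
    return text.lower().split()

def select_collections(query, collections, k):
    """Collection selection: rank collections by the number of
    (document, query term) matches and keep the top k."""
    terms = set(tokenize(query))
    scores = {}
    for name, docs in collections.items():
        scores[name] = sum(
            1 for doc in docs for t in terms if t in set(tokenize(doc))
        )
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:k]

def search_collection(query, docs):
    """Query processing: score each document by its count of
    query-term occurrences, dropping documents that match nothing."""
    terms = set(tokenize(query))
    results = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        score = sum(counts[t] for t in terms)
        if score > 0:
            results.append((score, doc))
    return results

def merge_results(per_collection):
    """Results merging: combine per-collection lists into one list,
    ordered by raw score (ties broken by collection name, then text)."""
    merged = [
        (score, name, doc)
        for name, hits in per_collection.items()
        for score, doc in hits
    ]
    merged.sort(key=lambda r: (-r[0], r[1], r[2]))
    return merged

def multi_collection_search(query, collections, k):
    selected = select_collections(query, collections, k)
    per_collection = {
        name: search_collection(query, collections[name]) for name in selected
    }
    return selected, merge_results(per_collection)
```

Calling `multi_collection_search("database market", COLLECTIONS, 2)` sends the query only to the two highest-scoring collections, so the `sports` collection is never searched. Merging on raw scores is the simplest possible choice and assumes scores are comparable across collections.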
We compare a number of different collection selection approaches and examine the effect
of collection selection on document retrieval performance. We consider multi-collection
retrieval in six different test environments utilizing three document testbeds. Considering
collection selection in isolation, we find that effective collection selection can be achieved
using limited information about each collection. We then turn our attention from selection
alone to data item retrieval in a multi-collection environment, considering retrieval performance
in the same six test environments. First, we find that good collection selection has
the potential to result in better retrieval effectiveness than can be achieved in an equivalent
single collection. Second, we find that good performance can be achieved when only a few
collections are selected and that the performance generally increases as more collections are
selected. Finally, we find that when collection selection is employed, it may not be necessary
to maintain collection-wide information (CWI), e.g., global idf. Local information can be
used to achieve equivalent performance. This means that multi-collection systems can be
engineered with more autonomy and less cooperation. This work demonstrates that improvements
in collection selection can lead to broader improvements in document retrieval
performance.
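To make the distinction between global and local idf concrete, the sketch below computes inverse document frequency for one term, first within each collection and then over the union of the collections. It only illustrates how the two quantities differ and does not reproduce the thesis's experiments. The smoothed formula log((N + 1) / (df + 1)) is one common variant chosen here as an assumption, and the function name `idf` and the sample documents are hypothetical.

```python
import math

def idf(term, docs):
    """Smoothed inverse document frequency of `term` over `docs`.
    N is the number of documents, df the number containing the term."""
    n = len(docs)
    df = sum(1 for d in docs if term in d.lower().split())
    return math.log((n + 1) / (df + 1))

# Hypothetical collections, each held at a different site.
coll_a = ["database systems", "database search", "query database"]
coll_b = ["football scores", "database of players", "tennis news"]

# Local idf: each site uses only its own statistics.
local_a = idf("database", coll_a)
local_b = idf("database", coll_b)

# Global idf: requires statistics pooled across all collections (CWI).
global_idf = idf("database", coll_a + coll_b)
```

In this toy data the term is uninformative inside `coll_a` (local idf of zero) but discriminating inside `coll_b`, while the global value falls between the two. Computing the global value requires every site to share its statistics, which is the cooperation that local information avoids.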
Bibliographical Information:
Advisor:
School: University of Virginia
School Location: USA - Virginia
Source Type: Master's Thesis
Keywords:
ISBN:
Date of Publication: