Multiple evidence combination in information retrieval
This thesis investigates the applicability of multiple evidence combination to the task of textual Information Retrieval (IR). IR is the retrieval of documents that are relevant to a specified information need. Many different IR models exist that provide methods for solving this high-level task, and many different techniques are used within these models. By combining evidence provided by multiple evidential sources, an IR system can take advantage of complementary techniques.
Using the AFFAIR programming framework for IR, which is designed to facilitate multiple evidence combination, a variety of configurable components typical of a vector-based IR system have been developed. An evaluation framework was developed using the Text REtrieval Conference (TREC) methodology and resources for IR evaluation. Integrating it with AFFAIR made it possible to perform a number of multiple evidence combination experiments with the developed components.
Experiments were chosen to highlight interesting questions regarding the optimal configuration of a vector-based IR system, and to include a wide range of possible system configurations. Two questions that were investigated using multiple evidence combination are:
• How do different term weighting techniques compare and combine?
• How does stemming influence retrieval quality?
The experiments followed a strategy of testing all possible pairwise combinations of a set of baseline configurations.
The results of the experiments not only provided interesting answers to these questions, but also verified the validity of the implemented evaluation framework. The baseline performances produced by these experimental search components were comparable to the average performance of participants in the Eighth Text REtrieval Conference, TREC-8.
The results can be used to develop integrated techniques that take advantage of the synergy effects discovered by the experiments, or the combinations used in the experiments can be reproduced by a non-experimental IR system.
Based on these results, multiple evidence combination is applicable to the IR problem, and the developed framework is a valuable tool for research in this area. A strategy of exhaustively testing all possible combinations of typical techniques can be applied at a larger scale, including several different IR models, which may provide further interesting results in the future.
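The pairwise combination strategy described above can be illustrated with a minimal sketch. It is not the AFFAIR implementation: the abstract does not state which combination function was used, so summing min-max normalized scores (a CombSUM-style rule) is an assumption here, and the function names and the baseline names and scores are hypothetical.

```python
from itertools import combinations


def min_max_normalize(scores):
    """Scale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every document as equally strong evidence.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_sum(run_a, run_b):
    """Fuse two runs by summing normalized scores (assumed CombSUM-style rule)."""
    a, b = min_max_normalize(run_a), min_max_normalize(run_b)
    fused = {doc: a.get(doc, 0.0) + b.get(doc, 0.0) for doc in set(a) | set(b)}
    # Sort by descending fused score, then by doc id for a deterministic ranking.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


def all_pairwise_combinations(baselines):
    """Yield ((name_a, name_b), fused_ranking) for every pair of baseline runs."""
    for name_a, name_b in combinations(sorted(baselines), 2):
        yield (name_a, name_b), combine_sum(baselines[name_a], baselines[name_b])


# Hypothetical baseline runs for a single query: {doc_id: retrieval score}.
baselines = {
    "tfidf_stemmed": {"d1": 2.0, "d2": 1.0, "d3": 0.0},
    "tfidf_unstemmed": {"d2": 4.0, "d3": 2.0, "d4": 0.0},
    "binary_unstemmed": {"d1": 1.0, "d4": 1.0},
}

for pair, ranking in all_pairwise_combinations(baselines):
    print(pair, [doc for doc, _ in ranking])
```

In an evaluation setting, each fused ranking would then be scored against relevance judgments and compared with the two baselines it was built from. That step is omitted here.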