Textual information retrieval : An approach based on language modeling and neural networks
This thesis covers topics relevant to information organization and retrieval. The main objective of the work is to provide algorithms that can elevate the recall-precision performance of retrieval tasks in a wide range of applications ranging from document organization and retrieval to web-document pre-fetching and finally clustering of documents based on novel encoding techniques.The first part of the thesis deals with the concept of document organization and retrieval using unsupervised neural networks, namely the self-organizing map, and statistical encoding methods for representing the available documents into numerical vectors. The objective of this section is to introduce a set of novel variants of the self-organizing map algorithm that addresses certain shortcomings of the original algorithm.In the second part of the thesis the latencies perceived by users surfing the Internet are shortened with the usage of a novel transparent and speculative pre-fetching algorithm. The proposed algorithm relies on a model of behaviour for the user browsing the Internet and predicts his future actions when surfing the Internet. In modeling the users behaviour the algorithm relies on the contextual statistics of the web pages visited by the user.Finally, the last chapter of the thesis provides preliminary theoretical results along with a general framework on the current and future scientific work. The chapter describes the usage of the Zipf distribution for document organization and the usage of the adaboosting algorithm for the elevation of the performance of pre-fetching algorithms.
Source Type:Doctoral Dissertation
Keywords:SOCIAL SCIENCES; Statistics, computer and systems science; Informatics, computer and systems science; Informatics, computer and systems science; Language modeling; Informatik, data- och systemvetenskap
Date of Publication:01/01/2004