Learning based organizational approaches for peer-to-peer based information retrieval systems

by Zhang, Haizheng

Abstract (Summary)
Over the past few years, peer-to-peer based information retrieval systems have attracted considerable interest from computer scientists. While such applications are promising, the underlying technology is challenging: without complete, up-to-date information about the states of other nodes in the network, it is difficult to direct users' queries to ideal destinations effectively and efficiently. The presence of concurrent search sessions adds another level of complication, because bandwidth and capacity limitations may prevent nodes from promptly forwarding, and performing local searches for, all the queries they receive. This thesis frames the peer-to-peer information retrieval (P2P IR) problem as a multi-agent framework and attacks it from an organizational perspective by exploring various adaptive, self-organizing topological organizations, designing appropriate coordination strategies, and exploiting learning techniques to create more accurate routing policies for large-scale agent organizations. Specifically, two protocols have been designed to create semantic-based, implicitly clustered agent organizations and explicit multi-level hierarchical agent organizations, respectively. Several coordination strategies are also proposed to direct distributed search sessions by taking advantage of agents' degree and similarity information. Furthermore, to handle multiple concurrent search sessions in the system, an agent control mechanism is proposed to engineer the query flow in the entire network based only on agents' local observations of network traffic and agent load, so as to improve the mean effective propagation speed of search queries. The elements of this control mechanism include resource selection, local search scheduling, and feedback-based load control.
In particular, with the feedback-based load control unit, an agent considers not only the capacity of its own communication channels but also the service rates of its neighboring agents, which it acquires dynamically from those neighbors. Based on this novel agent control mechanism, a balanced distributed search algorithm is designed to reduce potential hot spots in the network. In addition, a reinforcement-learning based approach is developed to take advantage of the run-time characteristics of P2P IR systems, including environmental parameters, bandwidth usage, and historical information about past search sessions. In the learning process, agents refine their content routing policies by constructing relatively accurate routing tables based on a Q-learning algorithm. Experimental results show that this learning algorithm considerably improves the performance of distributed search sessions in P2P IR systems.
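As a rough illustration of what a Q-learning based routing table might look like, the Python sketch below keeps one value per (topic, neighbor) pair and applies the textbook Q-learning update after each forwarded query. The abstract does not specify the thesis's actual state, action, or reward definitions, so everything here is an assumption made for illustration: the class name QRoutingTable, the use of a query topic as the state, the reward scale, the idea that a neighbor reports its own best estimate back as downstream_value, and the values of alpha and gamma.

```python
from collections import defaultdict


class QRoutingTable:
    """Hypothetical per-agent routing table: q[(topic, neighbor)] estimates
    how useful it is to forward a query on `topic` to `neighbor`."""

    def __init__(self, neighbors, alpha=0.5, gamma=0.9):
        self.neighbors = list(neighbors)
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.q = defaultdict(float)  # unseen entries start at 0.0

    def best_neighbor(self, topic):
        # Greedy choice; ties resolve to the earliest neighbor in the list.
        return max(self.neighbors, key=lambda n: self.q[(topic, n)])

    def update(self, topic, neighbor, reward, downstream_value=0.0):
        # Textbook Q-learning update:
        # Q <- Q + alpha * (reward + gamma * downstream_value - Q)
        key = (topic, neighbor)
        target = reward + self.gamma * downstream_value
        self.q[key] += self.alpha * (target - self.q[key])
        return self.q[key]


table = QRoutingTable(neighbors=["a", "b", "c"])
table.update("jazz", "b", reward=1.0)  # assumed: "b" returned relevant documents
table.update("jazz", "c", reward=0.2)  # assumed: "c" returned little of use
print(table.best_neighbor("jazz"))
```

In this sketch an agent always forwards to the neighbor with the highest estimate. A practical system would presumably mix in some exploration so that neighbors with stale or missing estimates are still tried occasionally.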
Bibliographical Information:

School: University of Massachusetts Amherst

School Location: USA - Massachusetts

Source Type: Master's Thesis

Date of Publication: 01/01/2006
