
EMAIL AND PHONE NUMBER ENTITY SEARCH AND RANKING

by Hao, Shuang

Abstract (Summary)
Entity search has been proposed as a search method for domain-specific Internet applications. It differs from the classical approach used by search engines, which return a “page-view result”: a list of URLs of web pages containing the desired keywords. Entity search returns more structured results that list the specific information a user seeks, such as an email address or a phone number. It provides not only URL links to the targets but also attributes of the target entities (e.g., email address or phone number). Compared with classical search methods, entity search is a more direct and user-friendly way to search a large volume of web documents. After the user submits a query, the extracted entities are ordered by their relevance to the query. While previous work has proposed various complex formulas for entity ranking, it has not been shown whether such complexity is needed. In this research I explore whether a simpler method can achieve reasonable results. I have designed an entity search and ranking algorithm whose formula simply combines a page’s PageRank with an entity’s distance to the query keywords to produce a metric for ranking discovered entities. My research goal is to determine whether effective entity ranking can be performed by an algorithm that computes matching scores specific to the entity search domain, and what improvements are necessary to refine the results. My approach takes into account the entity’s proximity to the keywords in the query as well as the quality of the page that contains it. I implemented a system based on the algorithm and performed experiments showing that in most cases the results are consistent with the user’s desired outcome.
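
The ranking idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state the thesis's actual formula, so everything specific here is an assumption made for illustration: the regular expressions for emails and US-style phone numbers, the use of character offsets as the distance measure, the score `pagerank / (1 + distance)`, and all function names and sample data are hypothetical.

```python
import re

# Assumed, simplified patterns; real-world extraction needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}")


def find_entities(text):
    """Return (entity, char_offset) pairs for emails and phone numbers."""
    found = []
    for pattern in (EMAIL_RE, PHONE_RE):
        for match in pattern.finditer(text):
            found.append((match.group(), match.start()))
    return found


def keyword_offsets(text, keywords):
    """Return the character offsets of every case-insensitive keyword hit."""
    lowered = text.lower()
    offsets = []
    for keyword in keywords:
        needle = keyword.lower()
        start = lowered.find(needle)
        while start != -1:
            offsets.append(start)
            start = lowered.find(needle, start + 1)
    return offsets


def score(pagerank, distance):
    """Hypothetical combination: page quality discounted by distance."""
    return pagerank / (1.0 + distance)


def rank_entities(pages, keywords):
    """Rank entities found on pages given as (url, pagerank, text) tuples.

    Each entity keeps its best score across all pages; pages that contain
    none of the keywords contribute nothing.
    """
    best = {}
    for url, pagerank, text in pages:
        hits = keyword_offsets(text, keywords)
        if not hits:
            continue
        for entity, position in find_entities(text):
            # Distance to the nearest keyword occurrence, in characters.
            distance = min(abs(position - hit) for hit in hits)
            candidate = score(pagerank, distance)
            if entity not in best or candidate > best[entity][0]:
                best[entity] = (candidate, url)
    ranked = [(entity, s, url) for entity, (s, url) in best.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up sample data; the PageRank values are placeholders, not measurements.
SAMPLE_PAGES = [
    ("http://example.org/a", 0.8,
     "Contact Jane Smith at office@example.org or 785-555-0100."),
    ("http://example.org/b", 0.2,
     "Jane Smith's old address was js@example.net."),
]
```

Calling `rank_entities(SAMPLE_PAGES, ["Jane", "Smith"])` returns `(entity, score, url)` tuples sorted from highest to lowest score. Under this assumed formula, an entity close to the keywords on a high-PageRank page outranks one that is farther away or sits on a lower-ranked page, which is the trade-off the abstract describes.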
Bibliographical Information:

Advisor:

School: Kansas State University

School Location: USA - Kansas

Source Type: Master's Thesis

Keywords: entity search ranking computer science 0984

ISBN:

Date of Publication: 01/01/2008
