Details

Ontology Generation, Information Harvesting and Semantic Annotation for Machine-Generated Web Pages

by Tao, Cui 1975-

Abstract (Summary)
The current World Wide Web is a web of pages. Users must guess keywords that might lead, through search engines, to pages containing the information of interest, and then browse hundreds or even thousands of returned pages to find what they want. This frustrating problem motivates an approach that turns the web of pages into a web of knowledge, so that users can query the information of interest directly. This dissertation takes a step in this direction and offers a way to partially overcome the challenges. Specifically, it shows how to turn machine-generated web pages, such as those on the hidden web, into semantic web pages for the web of knowledge. We design and develop three systems to address this challenge: TISP (Table Interpretation for Sibling Pages), TISP++, and FOCIH (Form-based Ontology Creation and Information Harvesting). TISP automatically interprets hidden-web tables. Given interpreted tables, TISP++ automatically generates ontologies and semantically annotates the information in those tables. In this way, we offer a way to make the hidden information publicly accessible. We also give users a way to generate personalized ontologies. FOCIH provides an interface through which users express their own view by creating a form that specifies the information they want. Based on the form, FOCIH generates a user-specific ontology, and based on patterns in machine-generated pages, FOCIH harvests information and annotates these pages with respect to the generated ontology. Users can then query the annotated information directly. With these contributions, this dissertation serves as a foundational pillar for turning the current web of pages into a web of knowledge.
Bibliographical Information:

Advisor:

School: Brigham Young University

School Location: USA - Utah

Source Type: Master's Thesis

Keywords: web of knowledge, the semantic web, ontology generation, semantic annotation, table interpretation, information harvesting, sibling pages

ISBN:

Date of Publication: 12/15/2008
