Enriching the Digital Library Experience: Innovations with Named Entity Recognition and Geographic Information System Technologies
Digital libraries are seeking innovative ways to share their resources and enhance the user experience, and numerous openly available technologies can be exploited to this end. For this project, named entity recognition (NER) technology was applied to a subset of the Documenting the American South (DocSouth) digital collections. Personal and location names were hand-annotated to create a gold standard, and GATE, a text engineering tool, was run under two conditions: a baseline run with default settings and a test run that included gazetteers built from DocSouth's Colonial and State Records collection. Overall, GATE performance is promising, and numerous strategies for improvement are discussed. Next, the derived location annotations were georeferenced and stored in a geodatabase through automated processes, and a prototype for a web-based map search was developed using the Google Maps API. This project showcases innovations in automated NER coupled with geographic information system (GIS) technologies, and strongly supports further investment in applying these techniques across DocSouth and other digital libraries.
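As an illustration of how tool output can be compared against a hand-annotated gold standard under the two conditions described above, the following is a minimal sketch in Python. It assumes exact-match scoring of (start, end, label) spans and the usual precision, recall, and F1 measures; the abstract does not state which measures the thesis used, and the `Span` type, the `score` function, and all annotation values here are hypothetical, invented purely for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Span(NamedTuple):
    """One annotation: character offsets plus an entity label (hypothetical type)."""
    start: int
    end: int
    label: str


def score(gold: Iterable[Span], predicted: Iterable[Span]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 of predicted spans against gold spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented annotations, not data from the thesis.
gold = [Span(0, 5, "Person"), Span(10, 17, "Location"), Span(30, 36, "Location")]
baseline_run = [Span(0, 5, "Person"), Span(10, 17, "Person")]
gazetteer_run = [Span(0, 5, "Person"), Span(10, 17, "Location"), Span(40, 45, "Location")]

for name, run in [("baseline", baseline_run), ("gazetteer", gazetteer_run)]:
    print(name, score(gold, run))
```

Exact matching is the strictest choice; a more lenient comparison could also credit partially overlapping spans, which may matter for multi-word personal and place names.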
Advisor: Hugh A. Cayless
School: University of North Carolina at Chapel Hill
School Location: USA - North Carolina
Source Type: Master's Thesis
Keywords: data mining, digital libraries, geographic information systems, retrieval, world wide web
Date of Publication: 07/21/2008