AUTOMATIC EXTRACTION OF AUTHOR SELF CONTRIBUTED METADATA FOR ELECTRONIC THESES AND DISSERTATIONS
This paper discusses the design and implementation of an automatic method for extracting metadata from PDF files during submission to Electronic Theses and Dissertations (ETDs) repositories. During submission, each ETD system requires some metadata about the thesis to support metadata search after the thesis is archived. These metadata, such as creator, title, date, abstract, subject, and publisher, comply with the Dublin Core Metadata Initiative. In most existing ETD repositories, students must type in these metadata manually. This discourages submission, especially when errors found in a thesis force a resubmission, because students have to retype all the metadata each time they submit.
By standardizing a method for capturing the metadata from the original documents, our project aims to enable the digital repository that hosts the ETD collection to extract the metadata from the theses automatically, making submission much easier and more convenient for students.
Advisor: Bradley M. Hemminger
School: University of North Carolina at Chapel Hill
School Location: USA - North Carolina
Source Type: Master's Thesis
Keywords: metadata, automatic extraction, author self contributed, digital library, electronic theses and dissertations, ETDs
Date of Publication: 05/04/2004