Automatic Classification of Online News Headlines
This study sought to determine the types of online news headlines most often selected by
news websites as their “Top Stories”. Headlines from four news websites were
downloaded using Really Simple Syndication (RSS) feeds. Supervised learning was
conducted with the downloaded headlines to develop models which could automatically
classify each website’s “Top Story” headlines, whose specific news category was
unknown. “Top Story” headlines were also matched to headlines with known news
categories from the same period to determine which news categories were most often
represented as “Top Stories”. The results show that some news categories’ headlines,
particularly those that had unique terms, were classified correctly based on the text
contained in the headline. Headlines from World and US/UK news categories were most
often represented as Top Story headlines, followed by Business, Politics, and
Entertainment.
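
The two steps described above, training a classifier on headlines with known categories and matching "Top Story" headlines against them, can be illustrated with a minimal sketch. The abstract does not name the learning algorithm or the matching rule, so the sketch assumes a multinomial naive Bayes classifier with add-one smoothing and exact matching on normalized headline text. The function names and the toy headlines are hypothetical and are not taken from the study's data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(headline):
    """Lowercase a headline and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", headline.lower())


def train(labeled):
    """Count words per category from (headline, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for headline, category in labeled:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(headline))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(headline, model):
    """Return the category with the highest add-one-smoothed log score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        # Log prior: share of training headlines in this category.
        score = math.log(doc_counts[category] / total_docs)
        for word in tokenize(headline):
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best


def tally_matches(top_stories, labeled):
    """Count the categories of Top Story headlines that match a labeled headline.

    Assumption: two headlines match when their token sequences are identical.
    """
    lookup = {" ".join(tokenize(h)): c for h, c in labeled}
    keys = (" ".join(tokenize(t)) for t in top_stories)
    return Counter(lookup[k] for k in keys if k in lookup)


# Toy data, invented for illustration only.
labeled = [
    ("Stocks fall as markets tumble", "Business"),
    ("Bank profits rise on trading", "Business"),
    ("Senate votes on election bill", "Politics"),
    ("Governor wins election race", "Politics"),
]
model = train(labeled)
top_stories = [
    "STOCKS FALL AS MARKETS TUMBLE!",
    "Governor wins election race",
    "A story with no labeled counterpart",
]
print(classify("Markets rise on bank profits", model))
print(tally_matches(top_stories, labeled))
```

In this sketch, a headline whose words occur in only one category's training headlines is scored much higher for that category, which is in line with the finding that categories with unique terms were the ones classified correctly.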
Advisor: Stephanie W. Haas
School: University of North Carolina at Chapel Hill
School Location: USA - North Carolina
Source Type: Master's Thesis
Keywords: automatic text classification mining rss feeds news websites online headlines
Date of Publication: 11/10/2007