Automatic Classification of Online News Headlines

by Pope, Mark W

Abstract (Summary)
The rise of online news over the past decade has altered how individuals obtain news. This study sought to determine the types of online news headlines most often selected by news websites as their “Top Stories”. Headlines from four news websites were downloaded using Really Simple Syndication (RSS) feeds. Supervised learning was conducted with the downloaded headlines to develop models that could automatically classify each website’s “Top Story” headlines, whose specific news category was unknown. “Top Story” headlines were also matched to headlines with known news categories from the same period to determine which news categories were most often represented as “Top Stories”. The results show that some news categories’ headlines, particularly those that had unique terms, were classified correctly based on the text contained in the headline. Headlines from World and US/UK news categories were most often represented as Top Story headlines, followed by Business, Politics, and


Bibliographical Information:

Advisor: Stephanie W. Haas

School: University of North Carolina at Chapel Hill

School Location: USA - North Carolina

Source Type: Master's Thesis

Keywords: automatic text classification mining rss feeds news websites online headlines

Date of Publication: 11/10/2007
