Details

SAWTOOTH [electronic resource] : learning from huge amounts of data /

by Orrego, Andrés Sebastián.

Abstract (Summary)
Data scarcity was a long-standing problem in data mining until recently. Now, in the era of the Internet and of tremendous advances in both data storage devices and high-speed computing, databases are filling up at rates never imagined before. The machine learning problems of the past have been joined by an increasingly important one: scalability. Extracting useful information from arbitrarily large data collections or data streams is now of special interest within the data mining community. In this research we find that mining such large datasets may actually be quite simple. We address the scalability issues of widely used batch learning algorithms and of the discretization techniques used to handle continuous values within the data. We then describe an incremental algorithm that addresses the scalability problem of Bayesian classifiers, and propose a Bayesian-compatible on-line discretization technique that handles continuous values. Both follow a “simplicity first” approach and have very low memory (RAM) requirements.
To my family. To Nana.
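The abstract's claim that an incremental Bayesian classifier can learn from a stream with very low memory can be illustrated with a generic sketch. The code below is NOT the thesis's SAWTOOTH algorithm or its discretization method; it is a textbook incremental naive Bayes classifier, with a hypothetical fixed-width binning step (the assumed `bin_width` parameter) standing in for on-line discretization. The class name `IncrementalNaiveBayes` and its methods are illustrative assumptions.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Naive Bayes classifier updated one example at a time.

    Only frequency counts are stored, so memory depends on the number of
    classes, attributes, and distinct (discretized) values, not on how many
    examples have been seen.
    """

    def __init__(self, bin_width=1.0):
        # Hypothetical on-line discretization: each numeric value is mapped
        # to a fixed-width bin, floor(x / bin_width), with no prior pass
        # over the data.
        self.bin_width = bin_width
        self.n = 0
        self.class_counts = defaultdict(int)
        # counts[(class, attribute_index, bin)] -> frequency
        self.counts = defaultdict(int)
        # distinct bins seen per attribute, used for Laplace smoothing
        self.bins_seen = defaultdict(set)

    def _bin(self, x):
        return math.floor(x / self.bin_width)

    def update(self, features, label):
        """Absorb one (features, label) example; the example is not stored."""
        self.n += 1
        self.class_counts[label] += 1
        for i, x in enumerate(features):
            b = self._bin(x)
            self.counts[(label, i, b)] += 1
            self.bins_seen[i].add(b)

    def predict(self, features):
        """Return the class with the highest log posterior."""
        best_label, best_score = None, -math.inf
        for label, c in self.class_counts.items():
            score = math.log(c / self.n)  # log prior
            for i, x in enumerate(features):
                k = len(self.bins_seen[i]) + 1  # +1 reserves mass for unseen bins
                # .get() avoids inserting new keys during prediction
                freq = self.counts.get((label, i, self._bin(x)), 0)
                score += math.log((freq + 1) / (c + k))  # Laplace-smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage: feed a small stream one example at a time, then classify.
model = IncrementalNaiveBayes(bin_width=1.0)
stream = [([1.2, 5.0], "a"), ([1.8, 5.5], "a"), ([7.1, 0.3], "b"), ([6.4, 0.9], "b")]
for x, y in stream:
    model.update(x, y)
print(model.predict([1.5, 5.2]))  # prints "a"
```

The design choice being illustrated is that each example is folded into a table of counts and then discarded, so the learner makes a single pass over the data and its memory does not grow with the length of the stream, only with the number of distinct discretized values.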
Bibliographical Information:

Advisor:

School: West Virginia University

School Location: USA - West Virginia

Source Type: Master's Thesis

Keywords: data mining, machine learning, artificial intelligence, algorithms

ISBN:

Date of Publication:

© 2009 OpenThesis.org. All Rights Reserved.