SAWTOOTH [electronic resource] : learning from huge amounts of data /
Abstract (Summary)
SAWTOOTH: Learning from Huge Amounts of Data
Andrés Sebastián Orrego
Data scarcity was a problem in data mining until recently. Now, in the era
of the Internet, with tremendous advances in both data storage devices and
high-speed computing, databases are filling up at rates never imagined
before. The machine learning problems of the past have been joined by an
increasingly important one: scalability. Extracting useful information from
arbitrarily large data collections or data streams is now of special
interest within the data mining community. In this research we find that
mining such large datasets may actually be quite simple. We address the
scalability issues of earlier, widely used batch learning algorithms and of
the discretization techniques used to handle continuous values within the
data. We then describe an incremental algorithm that addresses the
scalability problem of Bayesian classifiers, and propose a
Bayesian-compatible on-line discretization technique that handles continuous
values. Both follow a “simplicity first” approach and have very low memory
(RAM) requirements.
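As an illustration of why an incremental Bayesian classifier can run in very little memory, the following is a minimal sketch of a generic naive Bayes learner that is updated one instance at a time. It is not the SAWTOOTH algorithm itself, whose details are not given in this abstract. The class name, the Laplace smoothing constant, and the restriction to categorical attributes are all assumptions made for the example; continuous values would first have to be discretized, which is not shown here.

```python
from collections import defaultdict
import math


class IncrementalNaiveBayes:
    """Hypothetical sketch: naive Bayes over categorical attributes,
    trained one instance at a time. Only counts are stored, so memory
    grows with the number of distinct values, not the number of rows."""

    def __init__(self, smoothing=1.0):
        # Laplace smoothing constant (an assumption; must be > 0).
        self.smoothing = smoothing
        self.n = 0                             # instances seen so far
        self.class_counts = defaultdict(int)   # class -> count
        self.value_counts = defaultdict(int)   # (class, attr index, value) -> count
        self.values_seen = defaultdict(set)    # attr index -> distinct values

    def update(self, attributes, label):
        """Fold one labelled instance into the counts; it can then be discarded."""
        self.n += 1
        self.class_counts[label] += 1
        for i, value in enumerate(attributes):
            self.value_counts[(label, i, value)] += 1
            self.values_seen[i].add(value)

    def predict(self, attributes):
        """Return the class with the highest log posterior, or None if untrained."""
        best_label, best_score = None, -math.inf
        k = len(self.class_counts)
        for label, count in self.class_counts.items():
            # Smoothed class prior.
            score = math.log((count + self.smoothing) / (self.n + self.smoothing * k))
            for i, value in enumerate(attributes):
                seen = self.value_counts.get((label, i, value), 0)
                distinct = len(self.values_seen[i]) or 1
                # Smoothed conditional probability of this value given the class.
                score += math.log(
                    (seen + self.smoothing) / (count + self.smoothing * distinct)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In use, each record from a stream would be passed once to `update` and then dropped, and `predict` can be called at any point between updates.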
To my family.
To Nana.
Bibliographical Information:
Advisor:
School: West Virginia University
School Location: USA - West Virginia
Source Type: Master's Thesis
Keywords: data mining, machine learning, artificial intelligence, algorithms
ISBN:
Date of Publication: