Model Based Learning and Reasoning from Partially Observed Data
Abstract (Summary)
Management of data imprecision has become increasingly important, especially with
advances in technology that enable applications to collect and store huge amounts of data from
multiple sources. Data collected in such applications involve a large number of variables
and various types of data imperfections. These data, when used in knowledge discovery
applications, require the following: 1) computationally efficient algorithms that run fast
with limited resources, 2) an effective methodology for modeling data imperfections, and
3) procedures for knowledge discovery and for quantifying and propagating partial or
incomplete knowledge throughout the decision-making process.
Bayesian networks (BNs) provide a convenient framework for modeling these applications
probabilistically, enabling a compact representation of the joint probability distribution
over a large number of variables. BNs also form the foundation for a number of computationally
efficient inference algorithms. The underlying probabilistic approach, however, cannot
adequately handle the wider range of data imperfections that
may appear in many new applications (e.g., medical data). Dempster-Shafer (DS) theory, on the
other hand, provides a strong framework for modeling a broader range of data imperfections.
However, it faces a potentially enormous computational burden, since a mass assignment may
place mass on any subset of the frame of discernment.
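To make the contrast between BN compactness and the DS computational burden concrete, the following Python sketch uses a hypothetical three-variable binary chain A → B → C with illustrative probabilities (not taken from this work). It checks that the factorization P(A)P(B|A)P(C|B) yields a valid joint distribution. It then compares the parameter counts with the number of nonempty subsets that a DS mass function over the same joint frame could, in principle, assign mass to. The function names are hypothetical.

```python
import math
from itertools import product

# Hypothetical binary chain A -> B -> C; the probabilities are illustrative only.
P_A = {0: 0.6, 1: 0.4}                                    # P(A)
P_B_GIVEN_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # P(B | A)
P_C_GIVEN_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # P(C | B)


def joint(a, b, c):
    """BN factorization: P(a, b, c) = P(a) * P(b | a) * P(c | b)."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]


# The three local tables define a complete joint distribution over 2**3 states.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
assert math.isclose(total, 1.0)


def chain_bn_params(n):
    """Free parameters of a binary chain BN: 1 for the root, 2 for each of the n - 1 children."""
    return 1 + 2 * (n - 1)


def full_joint_params(n):
    """Free parameters of an explicit joint table over n binary variables."""
    return 2 ** n - 1


def ds_focal_candidates(n):
    """Nonempty subsets of the joint frame (2**n states) that a DS mass function may use."""
    return 2 ** (2 ** n) - 1
```

For n = 10 binary variables, the chain needs 19 parameters and an explicit joint table needs 1023. By contrast, the number of candidate focal sets, 2^1024 - 1, is astronomically large. This is the gap that a structured mass assignment has to close.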
In this dissertation, we introduce the joint Dirichlet body of evidence (BoE), a particular mass
assignment in the DS theoretic framework that reduces computational complexity while still
allowing one to model many common types of data imperfections. We first use the Dirichlet BoE
model to enhance the performance of the expectation-maximization (EM) algorithm used to learn
BN parameters from data with missing values.
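The abstract does not define the Dirichlet BoE. The sketch below assumes, as in subjective-logic-style models, that a Dirichlet-type BoE places mass only on singletons and on the whole frame Θ. The masses are derived from outcome counts and a hypothetical prior weight W (`prior_weight`). Under that assumption, a BoE over K outcomes has at most K + 1 focal elements instead of up to 2^K - 1, and belief and plausibility reduce to simple sums. The functions `dirichlet_boe`, `belief`, and `plausibility` are illustrative names, not the author's code.

```python
from fractions import Fraction


def dirichlet_boe(counts, prior_weight=2):
    """Assumed Dirichlet-type BoE: mass on each singleton {x} plus mass on the whole frame.

    counts: dict outcome -> nonnegative int count; list every outcome of the frame,
            including those with zero count.
    prior_weight: hypothetical positive int W controlling how much mass stays on the frame.
    Returns (singleton_mass, frame_mass), where frame_mass = m(Theta).
    """
    denom = sum(counts.values()) + prior_weight
    singleton_mass = {x: Fraction(n, denom) for x, n in counts.items()}
    frame_mass = Fraction(prior_weight, denom)
    return singleton_mass, frame_mass


def belief(subset, boe):
    """Bel(A): total mass of focal sets contained in A."""
    singleton_mass, frame_mass = boe
    inside = sum((singleton_mass[x] for x in subset), Fraction(0))
    # The frame itself is contained in A only when A covers the whole frame.
    return inside + frame_mass if set(singleton_mass) <= set(subset) else inside


def plausibility(subset, boe):
    """Pl(A): total mass of focal sets that intersect A."""
    singleton_mass, frame_mass = boe
    if not subset:
        return Fraction(0)
    # Every nonempty A intersects the whole frame, so m(Theta) always counts.
    return sum((singleton_mass[x] for x in subset), Fraction(0)) + frame_mass


boe = dirichlet_boe({"a": 6, "b": 2, "c": 0})
print(belief({"a"}, boe), plausibility({"a"}, boe))  # prints 3/5 4/5
```

With counts {a: 6, b: 2, c: 0} and W = 2, the mass on Θ is 1/5, so Bel({a}) = 3/5 and Pl({a}) = 4/5. With only a single observation, the mass on Θ rises to 2/3, reflecting the greater ignorance left by sparse data.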
To build a framework for reasoning with the Dirichlet BoE, we revisit the DS theoretic notions
of conditionals, independence, and conditional independence. We then use these notions to
develop the DS-BN, a BN-like graphical model in the DS theoretic framework that enables a
compact representation of the joint Dirichlet BoE. We also show how the DS-BN can be used
for different types of reasoning tasks, and we develop a local message-passing scheme for
efficient propagation of evidence in the DS-BN. We further extend the joint Dirichlet BoE
to Markov models and hidden Markov models to address the uncertainty arising from
inadequate training data. Finally, we present the results of experiments on synthetically
generated data sets as well as data sets from medical applications.
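Among the DS notions of conditioning that the abstract refers to, the classical one is Dempster's rule of conditioning. The sketch below implements that rule only. It does not reproduce the conditional notions, the DS-BN, or the message-passing scheme developed in this work. A mass function is assumed to be a Python dict that maps frozensets (focal sets) to `Fraction` masses. Each focal set C contributes its mass to C ∩ B, and the mass that falls on the empty set is removed by renormalization. The name `dempster_condition` is hypothetical.

```python
from fractions import Fraction


def dempster_condition(mass, evidence):
    """Condition a mass function on the set `evidence` using Dempster's rule.

    mass: dict mapping frozenset focal sets to Fraction masses summing to 1.
    evidence: iterable of frame elements known to contain the true value.
    """
    evidence = frozenset(evidence)
    conditioned = {}
    conflict = Fraction(0)  # mass of focal sets disjoint from the evidence
    for focal, m in mass.items():
        inter = focal & evidence
        if inter:
            # Transfer the mass of C to C intersected with the evidence.
            conditioned[inter] = conditioned.get(inter, Fraction(0)) + m
        else:
            conflict += m
    if conflict == 1:
        raise ValueError("evidence is disjoint from every focal set")
    # Renormalize so the conditioned masses sum to 1.
    return {s: m / (1 - conflict) for s, m in conditioned.items()}


m = {
    frozenset({"a"}): Fraction(3, 5),
    frozenset({"b"}): Fraction(1, 5),
    frozenset({"a", "b", "c"}): Fraction(1, 5),
}
conditioned = dempster_condition(m, {"b", "c"})
```

Conditioning the singleton-plus-frame BoE m({a}) = 3/5, m({b}) = 1/5, m(Θ) = 1/5 on B = {b, c} yields m({b} | B) = 1/2 and m({b, c} | B) = 1/2. Under this rule, each singleton intersects B in a singleton or the empty set, and Θ ∩ B = B. As a result, conditioning keeps the singleton-plus-frame structure on the reduced frame.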
Bibliographical Information:
Advisor: Subramanian Ramakrishnan; Kamal Premaratne; Michael Scordilis; Mei-Ling Shyu; Manohar N. Murthi
School: University of Miami
School Location: USA - Florida
Source Type: Master's Thesis
Keywords: electrical and computer engineering
Date of Publication: 06/09/2008