Direct probability assessment in discriminant analysis

by Lauder, Ian James

Abstract (Summary)
Abstract of the thesis entitled "Direct Probability Assessment in Discriminant Analysis" submitted by Ian James Lauder for the degree of Doctor of Philosophy at the University of Hong Kong in July 1985.

Probabilistic assessment in discriminant analysis entails estimation of the unknown conditional density p(t|x) for type t (1 ≤ t ≤ k) given feature vector x. The procedure is usually based on a training set D = {(t_i, x_i): i = 1, ..., n}, with the ith case in D having known type t_i and a complete, precise feature vector x_i. In general the components of x can be continuous, discrete or categorical features. In situations such as medical diagnosis, although the conditional densities p(x|t) (1 ≤ t ≤ k) may be severely modified by selection or truncation, the conditional form p(t|x) remains stable. In such instances density estimation via direct modelling of p(t|x) has theoretical attractions.

The thesis develops the direct approach for modelling p(t|x). Parametric and non-parametric models are developed and applied under various assumptions about the type of the distributions and the nature of the cases in D. When the feature vectors in D are precise, the widely used parametric class of logistic models for p(t|x) is extended by developing latent variable models for p(t|x). The non-parametric approach to direct modelling in these circumstances is achieved by the development of the weighted kernel density model. The performance of the models is evaluated on a variety of real data sets in which precise continuous, discrete and categorical feature vectors are encountered.

The logistic, latent variable and kernel modelling procedures are then extended to a variety of situations that can be encountered in practical application. The modelling procedures are first extended to deal with imprecision affecting the measurements in the feature vectors. The performance of the models and the effect of imprecision are investigated by application to real and simulated data sets. A second extension is made to the situation of missing values in the feature vectors. Both parametric and non-parametric approaches can be developed in these circumstances. The effect on p(t|x) of values missing from x, both at random and not at random, is explored.

Finally, more specialised topics are covered. The modelling process is extended to cope with situations where basic cases can be misclassified or replicated with replication error in typing. The final extension is to the situation where the typing process is subject to uncertainty.
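As a rough illustration of what modelling p(t|x) directly can look like, the sketch below computes type probabilities in two generic ways: a multinomial logistic form and a simple Gaussian-kernel (Nadaraya-Watson style) estimate. Neither is taken from the thesis. The function names, the toy training set, the coefficients and the bandwidth are all assumptions made for illustration, and the kernel estimate is a generic stand-in, not the weighted kernel density model developed in the thesis.

```python
import math

def logistic_posterior(x, weights, biases):
    """Return [p(t|x) for t = 1..k] under a multinomial logistic form.

    weights[t-1] and biases[t-1] are hypothetical coefficients for type t.
    """
    scores = [b + sum(w_j * x_j for w_j, x_j in zip(w, x))
              for w, b in zip(weights, biases)]
    top = max(scores)  # subtract the maximum score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kernel_posterior(x, training_set, k, bandwidth=1.0):
    """Generic kernel estimate of p(t|x) from D = {(t_i, x_i)}.

    Each training case contributes Gaussian kernel mass to its own type;
    the masses are then normalised to sum to one over the k types.
    """
    mass = [0.0] * k
    for t_i, x_i in training_set:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
        mass[t_i - 1] += math.exp(-0.5 * sq_dist / bandwidth ** 2)
    total = sum(mass)
    if total == 0.0:
        # No kernel mass reached x (underflow): fall back to equal probabilities.
        return [1.0 / k] * k
    return [m / total for m in mass]

# Toy training set (assumed data): types 1 and 2, two continuous features.
D = [(1, (0.0, 0.0)), (1, (0.2, -0.1)), (2, (2.0, 2.1)), (2, (1.8, 2.2))]

p_logit = logistic_posterior((1.0, 1.0),
                             weights=[(0.0, 0.0), (1.0, 1.0)],
                             biases=[0.0, -2.0])
p_kernel = kernel_posterior((0.1, 0.0), D, k=2, bandwidth=0.5)
```

In both cases the output is a probability vector over the k types for a single feature vector x, which is the quantity the direct approach targets; the conditional densities p(x|t) are never estimated.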
Bibliographical Information:


School: The University of Hong Kong

School Location: China - Hong Kong SAR

Source Type: Master's Thesis

Keywords: discriminant analysis


Date of Publication: 01/01/1986

© 2009 All Rights Reserved.