Investigation into Regression Analysis of Multivariate Additional Value and Missing Value Data Models Using Artificial Neural Networks and Imputation Techniques
Missing or insufficient data is a major concern in statistical analysis, especially when the problem to be addressed involves prediction. The present study is a quantitative analysis based on a 'regression technique' for the modeling and prediction of three numerical data sets: Boston Housing, Saginaw Bay, and Reportable Outages. The goal is to improve the regression results by using an artificial neural network either to impute missing data or to supplement existing data with a non-linear model. The prediction problem focuses on developing a hybrid model that creates additional data, using a combination of model selection techniques such as cross-validation estimates and regularization theory to reduce the effect of overfitting. The student model is constructed from the knowledge extracted from the teacher model, where that knowledge is the learning obtained by training an Artificial Neural Network on the data set(s). The same learning is also used to build a reverse engineered model that addresses the missing value problem.
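The abstract gives no architecture, data-generation rule, or code, so the following is only a minimal NumPy sketch under stated assumptions. A one-hidden-layer tanh network stands in for the teacher; its L2 penalty (weight decay, one form of regularization) is chosen by k-fold cross-validation; additional rows are produced by perturbing resampled inputs and labeling them with the teacher; and an ordinary least-squares regression is refit on the enlarged set. `impute_column` illustrates one possible reading of the "reverse engineered" model: a network trained on complete rows to predict an input variable from the remaining variables, including the response. All function names, hyperparameters, and the synthetic data are hypothetical and are not taken from the thesis.

```python
import numpy as np


def train_mlp(X, y, lam, hidden=8, lr=0.05, epochs=1500, seed=0):
    """Fit a one-hidden-layer tanh network by full-batch gradient descent on
    0.5 * MSE + lam * (||W1||^2 + ||W2||^2), i.e. with L2 weight decay."""
    rng = np.random.default_rng(seed)
    W1, b1 = rng.normal(0, 0.5, (X.shape[1], hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.5, hidden), 0.0
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)               # hidden activations, shape (n, hidden)
        err = H @ W2 + b2 - y                  # residuals, shape (n,)
        dH = np.outer(err, W2) * (1 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ err / n + 2 * lam * W2)
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dH / n + 2 * lam * W1)
        b1 -= lr * dH.mean(axis=0)
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2


def cv_mse(X, y, lam, k=5, seed=0):
    """k-fold cross-validation estimate of the network's squared prediction error."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    errs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        model = train_mlp(X[train], y[train], lam)
        errs.append(np.mean((model(X[test]) - y[test]) ** 2))
    return float(np.mean(errs))


def augment_and_fit(X, y, lams=(1e-4, 1e-2, 1e-1), n_new=400, noise=0.1, seed=0):
    """Teacher network -> additional data -> least-squares regression on the enlarged set."""
    best_lam = min(lams, key=lambda lam: cv_mse(X, y, lam))  # model selection by CV
    teacher = train_mlp(X, y, best_lam)                       # refit on all rows
    rng = np.random.default_rng(seed)
    # Assumed generation rule: jitter resampled rows and label them with the teacher.
    X_new = X[rng.integers(0, len(y), n_new)] + rng.normal(0, noise, (n_new, X.shape[1]))
    X_aug = np.vstack([X, X_new])
    y_aug = np.concatenate([y, teacher(X_new)])
    A = np.column_stack([np.ones(len(y_aug)), X_aug])         # intercept + inputs
    coef, *_ = np.linalg.lstsq(A, y_aug, rcond=None)
    return best_lam, coef, len(y_aug)


def impute_column(data, j, lam=1e-2):
    """Fill NaNs in column j with a network trained on complete rows to predict
    column j from every other column (assumed complete), including the response."""
    data = data.copy()
    missing = np.isnan(data[:, j])
    others = np.delete(data, j, axis=1)
    model = train_mlp(others[~missing], data[~missing, j], lam)
    data[missing, j] = model(others[missing])
    return data


# Synthetic demonstration data (not from the thesis).
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=150)
best_lam, coef, n_rows = augment_and_fit(X, y)

data = np.column_stack([X, y])
data[rng.choice(150, size=20, replace=False), 0] = np.nan  # hide 20 values of x0
filled = impute_column(data, 0)
```

Full-batch gradient descent keeps the sketch free of dependencies beyond NumPy; a library network with early stopping, or a different augmentation rule, could be substituted without changing the overall teacher, additional data, and regression pipeline.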
In the case of Reportable Outages, the student model obtained is the nonlinear model closest to the teacher model, and it succeeded in improving the regression analysis for the prediction problem. The Boston Housing student model shows a significant amount of correlation among the variables, which requires additional data relational techniques to address. For the Saginaw Bay data, the improvement of linear regression for prediction is limited by nature and its complex processes.
The overall results are encouraging and show promise for developing a model that creates needed data when the data are highly correlated. More data sets need to be investigated using the reverse engineering technique of the Artificial Neural Network for predicting missing values.
School Location: USA - Ohio
Source Type: Master's Thesis
Keywords: artificial neural network, data addition and knowledge extraction, reverse engineered, correlation breakdown
Date of Publication: 01/01/2008