Inference of nonparametric hypothesis testing on high dimensional longitudinal data and its application in DNA copy number variation and micro array data analysis

by Zhang, Ke

Abstract (Summary)
High-throughput screening technologies have generated a huge amount of biological data in the last ten years. With the easy availability of array technology, researchers have started to investigate biological mechanisms using experiments with more sophisticated designs, which pose novel challenges to statistical analysis. We provide theory for robust statistical tests in three flexible models. In the first model, we consider hypothesis testing problems in which a large number of variables are observed repeatedly over time. A potential application is in tumor genomics, where an array comparative genomic hybridization (aCGH) study is used to detect progressive DNA copy number changes during tumor development. In the second model, we consider hypothesis testing theory for a longitudinal microarray study with multiple treatments or experimental conditions. The tests developed can be used to detect treatment effects for a large group of genes and to discover genes that respond to treatment over time. In the third model, we address a hypothesis testing problem that can arise when array data from different sources are to be integrated; here the tests are performed under a nested design. In all models, robust test statistics are constructed by moment methods that allow unbalanced designs and arbitrary heteroscedasticity. The limiting distributions are derived under the nonclassical setting in which the number of probes is large. The test statistics are not targeted at a single probe; instead, they test a selected set of probes simultaneously. Simulation studies were carried out to compare the proposed methods with traditional tests based on linear mixed-effects models and generalized estimating equations. Results obtained with the proposed theory in two cancer genomic studies suggest that the new methods are promising for a wide range of biological applications with longitudinal arrays.
Bibliographical Information:


School: Kansas State University

School Location: USA - Kansas

Source Type: Master's Thesis

Keywords: high dimensional data, longitudinal analysis, nonparametric inference, hypothesis testing, DNA copy number variation, biology, biostatistics (0308), statistics (0463)


Date of Publication: 01/01/2008
