Evaluation of clusterings of gene expression data
Recent literature has investigated the use of different clustering techniques for analysis of gene expression data. For example, self-organizing maps (SOMs) have been used to identify gene clusters of clear biological relevance in human hematopoietic differentiation and the yeast cell cycle (Tamayo et al., 1999). Hierarchical clustering has also been proposed for identifying clusters of genes that share common roles in cellular processes (Eisen et al., 1998; Michaels et al., 1998; Wen et al., 1998). Systematic evaluation of clustering results is as important as generating the clusters. However, this is a difficult task, which is often overlooked in gene expression studies. Several gene expression studies claim success of the clustering algorithm without showing a validation of complete clusterings, for example Ben-Dor and Yakhini (1999) and Törönen et al. (1999).In this dissertation we propose an evaluation approach based on a relative entropy measure that uses additional knowledge about genes (gene annotations) besides the gene expression data. More specifically, we use gene annotations in the form of an enzyme classification hierarchy, to evaluate clusterings. This classification is based on the main chemical reactions that are catalysed by enzymes. Furthermore, we evaluate clusterings with pure statistical measures of cluster validity (compactness and isolation).The experiments include applying two types of clustering methods (SOMs and hierarchical clustering) on a data set for which good annotation is available, so that the results can be partly validated from the viewpoint of biological relevance.The evaluation of the clusters indicates that clusters obtained from hierarchical average linkage clustering have much higher relative entropy values and lower compactness and isolation compared to SOM clusters. Clusters with high relative entropy often contain enzymes that are involved in the same enzymatic activity. On the other hand, the compactness and isolation measures do not seem to be reliable for evaluation of clustering results.
School:Högskolan i Skövde
Source Type:Master's Thesis
Keywords:gene expression analysis evaluation of clusterings self organizing maps average linkage clustering annotations
Date of Publication:01/11/2008