Characterizations of pyramids and their generalizations
Abstract (Summary)
Cluster analysis is a collection of techniques whose goal is to suggest possible internal structures of a data set. It is a subfield of exploratory data analysis in which the goal is to find a starting point for investigating some collection of objects. A clustering technique takes a finite data set E with finitely many attributes, or a collection of measurements called a dissimilarity coefficient, and produces a single classification or a nested sequence of classifications of E. A nested sequence of partitions on the given set is easily visualized as a hierarchy. Pyramids, developed by Diday (12), allow visual representation of output in which clusters have some overlap. It is well known that weakly indexed pyramids are in one-to-one correspondence with definite Robinsonian dissimilarity coefficients. One drawback of pyramidal representations is the requirement that a linear order be imposed on the underlying set to be clustered. It will be shown that, by examining a dissimilarity coefficient, one can determine its compatible linear orders, if any, using the consecutive ones property. A generalization of pyramids, pseudo-pyramids, will be introduced, and the concepts of weakly indexed and indexed pseudo-pyramids are developed. Pyramids and their generalizations will be placed in the ordinal model developed by Janowitz (25). Characterizations of pyramids and their generalizations are given from set-theoretical, graph-theoretical, and lattice-theoretical viewpoints. In particular, a characterization of indexed pseudo-pyramids with respect to a collection of planar lattices will be introduced. Generalizations of dissimilarity coefficients, called pseudo-dissimilarity coefficients, will be given. A bijection can be established between indexed (weakly indexed) pseudo-pyramids and strongly Robinsonian (Robinsonian) pseudo-dissimilarities. This generalization removes the requirement that the minimum value of a dissimilarity be 0. Also, the output of a clustering technique using a pseudo-dissimilarity need not be reflexive at each level. In other words, the classifications need not contain all singleton subsets.
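To make the notion of a compatible linear order concrete, here is a minimal sketch in Python. It assumes the usual Robinsonian condition from the literature: a dissimilarity d is compatible with an order if, whenever i comes before j and j before k, d(i, k) >= max(d(i, j), d(j, k)). The function names `is_robinsonian` and `compatible_orders` are hypothetical, and the search is a brute-force enumeration of permutations suitable only for very small sets. It is not the consecutive ones method of the thesis.

```python
from itertools import permutations


def is_robinsonian(d, order):
    """Check the Robinsonian condition for dissimilarity matrix d under `order`.

    Assumed condition: for positions a < b < c in `order`, with
    i, j, k = order[a], order[b], order[c], d[i][k] >= max(d[i][j], d[j][k]).
    """
    n = len(order)
    for a in range(n):
        for b in range(a + 1, n):
            for c in range(b + 1, n):
                i, j, k = order[a], order[b], order[c]
                if d[i][k] < max(d[i][j], d[j][k]):
                    return False
    return True


def compatible_orders(d):
    """Brute-force list of all linear orders compatible with d (small n only)."""
    return [p for p in permutations(range(len(d))) if is_robinsonian(d, p)]


# Illustrative data (not from the thesis): four points on a line.
positions = [0, 1, 3, 6]
line_d = [[abs(x - y) for y in positions] for x in positions]
print(compatible_orders(line_d))  # [(0, 1, 2, 3), (3, 2, 1, 0)]

# Illustrative data: a "star" with centre 0 at distance 1 from three
# leaves that are pairwise at distance 2; no linear order is compatible.
star_d = [
    [0, 1, 1, 1],
    [1, 0, 2, 2],
    [1, 2, 0, 2],
    [1, 2, 2, 0],
]
print(compatible_orders(star_d))  # []
```

The first example has exactly two compatible orders, the order of the points along the line and its reverse. The second has none, which illustrates why a dissimilarity coefficient may fail to admit a pyramidal representation.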
School Location: USA - Massachusetts
Source Type: Master's Thesis
Date of Publication: 01/01/1998