Algorithms to identify Pareto points in multi-dimensional data sets
Abstract (Summary)
The focus in this research is on developing a fast, efficient hybrid algorithm to identify the Pareto frontier
in multi-dimensional data sets. The hybrid algorithm is a blend of two different base algorithms, the Simple
Cull (SC) algorithm that has a low overhead but is of overall high computational complexity, and the
Divide
&
Conquer (DC) algorithm that has a lower computational complexity but has a high overhead. The
hybrid algorithm employs aspects of each of the two base algorithms, adapting in response to the properties
of the data.
Each of the two base algorithms perform better for different classes of data, with the SC algorithm
performing best for data sets with few nondominated points, high dimensionality, or fewer total numbers of
points, while the DC algorithm performs better otherwise. The general approach to the hybrid algorithm is
to execute the following steps in order:
1. Execute one pass of the SC algorithm through the data if merited
2. Execute the DC algorithm, which recursively splits the data into smaller problem sizes
3. Switch to the SC algorithm for problem sizes below a certain limit
In order to determine whether Step 1 should be executed, and to determine at what problem size the switch
in Step 3 should be made, estimates of both algorithms’ run times as a function of the size of the data set,
the dimension of the data set, and the expected number of nondominated points are needed. These are
developed in the thesis.
To aid in increasing the speed and reducing the computational and storage complexity of the algorithm, and
to enable the ability for the algorithm to adapt to the data, a canonical transformation of the data to a Lattice
Latin Hypercube (LLH) form is developed. The transformation preserves the Pareto property of points but
reduces storage space and algorithm run time.
In order to test the three algorithms, three different methods for creating randomized data sets with arbitrary
dimensionality and numbers of nondominated points are developed. Each of the methods provides insight
into the properties of nondominated sets, along with providing test cases for the algorithms. Additionally, a
spacecraft design problem is developed to serve as a source of real world test data.
iii
Bibliographical Information:
Advisor:
School:Pennsylvania State University
School Location:USA - Pennsylvania
Source Type:Master's Thesis
Keywords:
ISBN:
Date of Publication: