Scalable integration view computation and maintenance with parallel, adaptive and grouping techniques
Abstract (Summary)
Materialized integration views constructed by integrating data from multiple
distributed data sources help to achieve better access, reliable performance,
and high availability for a wide range of applications. In this dissertation,
we propose parallel, adaptive, and grouping techniques to address
scalability challenges in high-performance integration view computation
and maintenance due to increasingly large data sources and high rates of
source updates.
State-of-the-art parallel integration view computation commonly
assumes that maximal pipelined parallelism leads to superior
performance. We instead propose segmented bushy parallel processing
that combines pipelined parallelism with alternate forms of parallelism to
achieve an overall more effective strategy. Experimental studies conducted
over a cluster of high-performance PCs confirm that the proposed strategy
achieves, on average, a 50% improvement in total processing time in
comparison to existing solutions.
Run-time adaptation becomes critical for parallel integration view computation
due to its long-running and memory-intensive nature. We investigate
two types of state-level adaptations, namely, state spill and state relocation,
to address the run-time memory shortage. We propose lazy-disk and
active-disk approaches that integrate both adaptations to maximize run-time
query throughput in a memory-constrained environment. We also propose
global throughput-oriented state adaptation strategies for computation plans
with multiple state-intensive operators. Extensive experiments confirm the
effectiveness of our proposed adaptation solutions.
Once results have been computed and materialized, it is typically more
efficient to maintain them incrementally than to recompute them in full. However,
state-of-the-art incremental view maintenance techniques require O(n^2) maintenance
queries with n being the number of data sources that the view is defined
upon. Moreover, they do not exploit view definitions and data source
processing capabilities to further improve view maintenance performance.
We propose novel grouping maintenance algorithms that dramatically reduce
the number of maintenance queries to O(n). We also propose a cost-based view
maintenance framework that generates optimized maintenance
plans tuned to particular environmental settings. Extensive experimental
studies verify the effectiveness of our maintenance algorithms as
well as the maintenance framework.
Bibliographical Information:
Advisor:
School: Worcester Polytechnic Institute
School Location: USA - Massachusetts
Source Type: Master's Thesis
Keywords: parallel processing electronic computers virtual storage computer science
ISBN:
Date of Publication: