High performance parallel approximate eigensolver for real symmetric matrices
Abstract (Summary)
In the first-principles calculation of electronic structures, one of the most timeconsuming
tasks is that of computing the eigensystem of a large symmetric nonlinear
eigenvalue problem. The standard approach is to use an iterative scheme involving the
solution to a large symmetric linear eigenvalue problem in each iteration. In the early and
intermediate iterations, significant gains in efficiency may result from solving the
eigensystem to reduced accuracy. As the iteration nears convergence, the eigensystem
can be computed to the required accuracy.
Traditional real symmetric eigensolvers compute the eigensystem in three steps: 1)
reduce a dense matrix to a symmetric tridiagonal form using orthogonal transformations;
2) compute eigenpairs of the tridiagonal matrix; 3) back-transform eigenvectors of the
tridiagonal matrix to those of the original matrix. Stable and efficient eigendecomposition
algorithms for symmetric tridiagonal matrix are under constant
investigation, while the performance of orthogonal reduction step remains a bottleneck.
The main contribution of this dissertation is an efficient parallel approximate
eigensolver that computes eigenpairs of a real symmetric matrix to reduced accuracy.
This eigensolver consists of three major parts: 1) a parallel block tridiagonal divide-andconquer
algorithm that computes the approximate eigenpairs of a block tridiagonal matrix
to prescribed accuracy; 2) a parallel block tridiagonalization algorithm that constructs a
block tridiagonal matrix from a sparse matrix or “effectively” sparse matrix – matrix with
many small elements that can be regarded as zeros without affecting the prescribed
accuracy of the eigenvalues; 3) a parallel orthogonal block tridiagonal reduction
algorithm that reduces a dense real symmetric matrix to block tridiagonal form using
similarity transformations with a high ratio of level 3 BLAS operations. The parallel
approximate eigensolver chooses a proper combination of the three algorithms depending
on the structure of the input matrix and computes all the eigenpairs of the input matrix to
prescribed accuracy.
Numerical results show that the parallel block tridiagonal divide-and-conquer
algorithm is very efficient when at least a few off-diagonal blocks have a relatively low
iv
rank. With a very low computational cost, the parallel block tridiagonalization algorithm
constructs a block tridiagonal matrix from a sparse or “effectively” sparse input matrix.
The parallel orthogonal block tridiagonal reduction algorithm achieves high performance
due to high ratio of level 3 BLAS operations. Using a small block size for the parallel
orthogonal block tridiagonal reduction algorithm is a critical factor for competitive
performance when combined with the parallel block tridiagonal divide-and-conquer
algorithm.
Our parallel approximate eigensolver has the limitation that the block tridiagonal
matrices, either as the input matrices or after pre-processing steps, should have offdiagonal
blocks with low rank, say 20 or less, or a very high ratio of deflation to achieve
satisfactory performance. In addition, large variation in deflation rate may lead to
workload imbalance, although such cases appear to be rare. Future work may include a
complete data parallel implementation of the block tridiagonal divide-and-conquer
algorithm and a parallel adaptive eigensolver that detects matrix structure automatically,
adjusts the accuracy requirement when necessary and chooses the proper algorithms to
solve the eigenproblem.
v
Bibliographical Information:
Advisor:
School:The University of Tennessee at Chattanooga
School Location:USA - Tennessee
Source Type:Master's Thesis
Keywords:
ISBN:
Date of Publication: