ChASE is a modern and scalable library, based on a spectral polynomial filter, for solving dense Hermitian (Symmetric) algebraic eigenvalue problems. The library is fully parallelized and is particularly effective for sequences of eigenproblems, as they often arise in electronic structure theory.
Real and Complex: ChASE is templated for real and complex numbers, so it can be used to solve real symmetric eigenproblems as well as complex Hermitian ones.
Eigenspectrum: The ChASE algorithm is designed to solve for an extremal portion of the eigenspectrum. By default it computes the lowest portion of the spectrum, but it can also compute the largest portion by solving for -A. The library is particularly efficient when the sought-after extremal portion amounts to no more than 20% of the eigenspectrum. For larger fractions the subspace iteration algorithm may struggle to be competitive, and convergence could become an issue for fractions close to or larger than 50%.
Type of Problem: ChASE can currently handle only standard eigenvalue problems. Generalized eigenvalue problems of the form A\hat{x} = \lambda B \hat{x}, with B symmetric positive definite (s.p.d.), can be solved by factorizing B = L L^T and transforming the problem into the standard form \tilde{A} \hat{y} = \lambda \hat{y}, where \tilde{A} = L^{-1} A L^{-T} and \hat{y} = L^T \hat{x}.
Sequences: ChASE is particularly efficient when dealing with sequences of eigenvalue problems, where the eigenvectors that solve one problem can be used as input to accelerate the solution of the next one.
Vectors input: Since it is based on subspace iteration, ChASE can receive as input a matrix of vectors V whose number of columns equals the number of desired eigenvalues. ChASE can experience substantial speed-ups when V contains some information about the sought-after eigenvectors.
Degree optimization: For a fixed accuracy level, ChASE can optimize the degree of the Chebyshev polynomial filter so as to minimize the number of FLOPs necessary to reach convergence.
Precision: ChASE is also templated to work in Single Precision (SP) or Double Precision (DP).
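As an illustration of the reduction described under Type of Problem, the following NumPy sketch transforms a generalized problem into standard form and maps the eigenvectors back. It is not part of ChASE: the helper names to_standard_form and back_transform are hypothetical, np.linalg.eigh merely stands in for the eigensolver, and the general-purpose solves used here would normally be replaced by triangular solves.

```python
import numpy as np

def to_standard_form(A, B):
    """Reduce A x = lambda B x (B positive definite) to A_tilde y = lambda y."""
    L = np.linalg.cholesky(B)                    # B = L L^H, L lower triangular
    X = np.linalg.solve(L, A)                    # X = L^{-1} A
    # (L^{-1} X^H)^H = X L^{-H}, so A_tilde = L^{-1} A L^{-H}
    A_tilde = np.linalg.solve(L, X.conj().T).conj().T
    return A_tilde, L

def back_transform(Y, L):
    """Recover generalized eigenvectors from standard ones: x = L^{-H} y."""
    return np.linalg.solve(L.conj().T, Y)

# Small random test problem (illustrative data only).
rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                                # symmetric A
C = rng.standard_normal((n, n))
B = C @ C.T + n * np.eye(n)                      # symmetric positive definite B

A_tilde, L = to_standard_form(A, B)
lam, Y = np.linalg.eigh(A_tilde)                 # stand-in for the standard eigensolver
X = back_transform(Y, L)

# Residual of the original generalized problem, A X - B X diag(lam).
residual = np.linalg.norm(A @ X - (B @ X) * lam)
print(residual)
```

The eigenvalues are unchanged by the transformation; only the eigenvectors need the final back-substitution with L.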
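To make the idea of a spectral polynomial filter inside a subspace iteration more concrete, here is a minimal NumPy sketch. It is an illustration under simplifying assumptions, not ChASE's implementation: the function names are hypothetical, the filter degree is fixed rather than optimized, the upper spectral bound is taken from the matrix 1-norm, and the search space is given a few more columns than the number of wanted eigenpairs.

```python
import numpy as np

def chebyshev_filter(A, V, degree, a, b):
    """Return T_degree((A - c I) / e) @ V, with [a, b] mapped onto [-1, 1].

    Eigencomponents inside [a, b] stay bounded by 1 in magnitude, while those
    below a grow rapidly with the degree. Requires degree >= 1.
    """
    c, e = (a + b) / 2.0, (b - a) / 2.0
    prev, curr = V, (A @ V - c * V) / e           # T_0 V and T_1 V
    for _ in range(2, degree + 1):                # three-term recurrence
        prev, curr = curr, 2.0 * (A @ curr - c * curr) / e - prev
    return curr

def lowest_eigenpairs(A, V, nev, degree=15, tol=1e-10, max_iter=50):
    """Filtered subspace iteration for the nev lowest eigenpairs of Hermitian A.

    V holds the starting vectors; this sketch assumes it has more than nev columns.
    """
    b = np.linalg.norm(A, 1)                      # cheap upper bound on the largest eigenvalue
    Q, _ = np.linalg.qr(V)
    for _ in range(max_iter):
        # Rayleigh-Ritz: project A onto span(Q) and solve the small problem.
        theta, W = np.linalg.eigh(Q.conj().T @ A @ Q)
        X = Q @ W
        res = np.linalg.norm(A @ X[:, :nev] - X[:, :nev] * theta[:nev], axis=0)
        if res.max() <= tol * b:
            break
        # Damp everything between the largest Ritz value and the upper bound.
        Q, _ = np.linalg.qr(chebyshev_filter(A, X, degree, theta[-1], b))
    return theta[:nev], X[:, :nev], res.max()

# Illustrative test matrix with eigenvalues close to 1, 2, ..., n.
rng = np.random.default_rng(0)
n, nev, extra = 100, 4, 4
S = rng.standard_normal((n, n))
A = np.diag(np.arange(1.0, n + 1)) + 0.01 * (S + S.T) / 2
V0 = rng.standard_normal((n, nev + extra))        # random start; a good guess converges faster
vals, vecs, max_res = lowest_eigenpairs(A, V0, nev)
print(vals, max_res)
```

Passing approximate eigenvectors from a previous, similar problem as V0 instead of random vectors is the mechanism behind the speed-ups mentioned under Sequences and Vectors input.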
Currently, the library consists of one major part, labelled ChASE-MPI, for solving dense eigenproblems. A second major part supporting sparse eigenproblems is planned for the near future. ChASE-MPI can be installed with a minimal set of dependencies (BLAS, LAPACK, and MPI).
ChASE-MPI supports different configurations depending on the available hardware resources.
Shared memory build: This is the simplest configuration and should be selected only when ChASE is used on a single computing node or a single CPU. The simplicity of this configuration lies in the way the Matrix-Matrix kernel is implemented, compared with the full MPI build.
MPI+Threads build: On multi-core homogeneous CPU clusters ChASE is best used in its pure MPI build. In this configuration, ChASE is typically used with one MPI rank per NUMA domain and as many threads as there are cores available per NUMA domain.
GPU build: ChASE-MPI can be configured to take advantage of graphics cards on heterogeneous computing clusters. Currently we support one GPU card per MPI rank.
In ChASE-MPI, the MPI ranks are arranged as a 2D grid, and two data distributions are supported for assigning sub-blocks of the dense matrix A to the MPI ranks.
The first is called Block Distribution, in which each MPI rank of the 2D grid is assigned one block of the dense matrix A. The most important kernel of ChASE is the Hermitian Matrix-Matrix Multiplication. This block data distribution results in a matrix-matrix multiplication on each node that is large and contiguous, often yielding performance close to the theoretical peak of the hardware. In addition, this data distribution allows easy offloading of the multiplication to accelerators such as GPUs.
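A small sketch of how such a block layout can be indexed is given below. The function names are hypothetical, and the way a remainder is spread over the first ranks is an assumption of this example rather than a statement about how ChASE splits the matrix.

```python
def block_range(n, parts, idx):
    """Half-open range [start, stop) of block idx when n indices are cut into parts contiguous blocks."""
    base, extra = divmod(n, parts)
    start = idx * base + min(idx, extra)          # the first `extra` blocks get one more index
    stop = start + base + (1 if idx < extra else 0)
    return start, stop

def local_block(n, grid_rows, grid_cols, row, col):
    """Row range and column range of the n x n matrix owned by grid position (row, col)."""
    return block_range(n, grid_rows, row), block_range(n, grid_cols, col)

# A 10 x 10 matrix on a 2 x 3 grid: the rank at grid position (1, 2).
print(local_block(10, 2, 3, 1, 2))  # ((5, 10), (7, 10))
```

Each rank therefore holds a single contiguous sub-matrix, which is what allows its share of the multiplication to be issued as one large matrix-matrix product.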
The second is called 2D Block-Cyclic Distribution. This distribution scheme was introduced for the implementation of dense matrix computations on distributed-memory machines. Compared to the Block Distribution, its main advantage is a better load balance when the amount of work differs between the entries of a matrix, as in QR and LU factorization, where a plain block distribution can lead to load imbalance.
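The load-balance argument can be checked with a toy calculation. The sketch below uses the standard block-cyclic owner formula with the first block placed on grid position (0, 0), and counts how many lower-triangular entries each rank owns as a rough proxy for work that is concentrated in one triangle of the matrix. The function names and the choice of proxy are assumptions made for illustration.

```python
def owner(i, j, block, grid_rows, grid_cols):
    """Grid position owning global entry (i, j) under a 2D block-cyclic layout."""
    return (i // block) % grid_rows, (j // block) % grid_cols

def lower_triangle_load(n, block, grid_rows, grid_cols):
    """Number of entries with i >= j owned by each grid position."""
    load = {(r, c): 0 for r in range(grid_rows) for c in range(grid_cols)}
    for i in range(n):
        for j in range(i + 1):
            load[owner(i, j, block, grid_rows, grid_cols)] += 1
    return load

n, grid = 16, (2, 2)
# A block size of n / 2 gives every rank exactly one block, i.e. the Block Distribution.
block_load = lower_triangle_load(n, n // 2, *grid)
# A block size of 2 deals small blocks out cyclically over the grid.
cyclic_load = lower_triangle_load(n, 2, *grid)
print(block_load)   # {(0, 0): 36, (0, 1): 0, (1, 0): 64, (1, 1): 36}
print(cyclic_load)  # {(0, 0): 36, (0, 1): 24, (1, 0): 40, (1, 1): 36}
```

With one block per rank, one process owns no lower-triangular entries at all, while the cyclic layout spreads them far more evenly.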
Load balance is not a problem for ChASE, since its most important kernel, the Hermitian Matrix-Matrix Multiplication, is already well balanced with the Block Distribution. We nevertheless provide the Block-Cyclic Distribution as an option in ChASE to avoid redistributing data between the two layouts, which some applications might otherwise require, e.g., when solving a generalized eigenproblem with ChASE after a Cholesky factorization. In ChASE, the implementation with the Block-Cyclic Distribution achieves performance similar to that of the implementation with the Block Distribution.