Context
High-fidelity simulations of turbulent compressible flows in aerodynamics usually involve the numerical analysis of three-dimensional flows around complex geometries. As the physical phenomena being modelled grow more complex and industry demands more accurate solutions, the resulting linear systems are often very large, sparse and ill-conditioned. These systems arise in typical Computational Fluid Dynamics (CFD) applications such as fixed-point iterations or adjoint-state methods, which are commonly employed in optimization, linear analysis and data assimilation. Robust and efficient parallel strategies are therefore essential and must be capable of delivering solutions to a prescribed error tolerance for system sizes now approaching one billion unknowns. Given the memory constraints and the spectral properties of the operators considered, iterative inner-outer Krylov solvers have proved to be promising approaches, provided that an adequate preconditioner is used. To reduce CPU and memory costs, several directions are being investigated. One is deflation techniques [1], which recycle spectral information into the next Krylov subspace at each solver restart. Another is preconditioning operators in which local ILU-type preconditioners [2] are selected according to the local subdomain stiffness, and the coupling between subdomains is ensured by the Restricted Additive Schwarz (RAS) method. This last direction lies at the heart of this PhD project.
Objectives
The initial domain is usually partitioned into subdomains, each handled by one MPI process. The number of subdomains is driven above all by the CFD needs and the engineer's expertise, and it fixes the total amount of memory available for the simulation. We aim to introduce a second MPI parallelization paradigm devoted to the solution of linear systems: larger subdomains for local approximate direct solvers might improve the numerical efficiency of the global hybrid direct-iterative approach. In addition, since memory is constrained, adapting the quality of the local preconditioner to the subdomain stiffness might be beneficial. To reduce the factorization costs of direct solvers, a data compression technique will be used, and to handle non-uniform costs per subdomain, load-balancing techniques will be employed. Several further areas will also be studied: a posteriori estimators of subdomain stiffness to adapt the accuracy of the approximate direct solvers, the expected compression ratio as the subdomain size varies, any means of reducing the memory footprint, and the efficiency and robustness of such a global approach.
Key steps
Matrices arising from elliptic Partial Differential Equations (PDEs) have been shown to have a low-rank property, and this property is now efficiently exploited within the direct multifrontal solver MUMPS [3] to substantially reduce its complexity in terms of floating-point operations and memory requirements. Among the possible low-rank formats, MUMPS uses the Block Low-Rank (BLR) format [3]. Preliminary numerical experiments exploiting the BLR multifrontal factorization on each local subdomain have brought promising gains on first CFD applications, where the matrices come from hyperbolic PDEs this time. The first step will consist in using MUMPS with the ONERA CFD code SoNICS to build a uniform BLR-LU solver per subdomain while preserving the initial partitioning of the mesh. The second step will focus on the design and development of a second MPI parallelization strategy through the ParaDiGM library used by SoNICS, in order to deliver a partitioning better suited to linear systems. The last step will look at the definition of load-balancing techniques to address non-uniform costs. Numerical experiments on challenging test cases will be performed at the end of each step to assess the potential and limitations of the current approach.
We are looking for a motivated candidate with a background in applied mathematics or computer science. A keen interest in numerical linear algebra and programming (C++, Fortran, Python) would be welcome.
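As a small illustration of the low-rank property referred to in the key steps, the following Python sketch compresses a single dense block that couples two well-separated sets of points. It is only a toy example under stated assumptions: the kernel 1/|x - y|, the point sets, the tolerance and the helper name compress_block are hypothetical choices made for illustration, and a truncated SVD is used here for simplicity. It does not reproduce the MUMPS implementation or its API.

```python
import numpy as np


def compress_block(block, tol):
    """Hypothetical helper: truncated SVD of one dense block.

    Keeps the singular values larger than tol times the largest one and
    returns factors X (m x r) and Y (r x n) such that block ~= X @ Y.
    """
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    rank = max(1, int(np.sum(s > tol * s[0])))
    return u[:, :rank] * s[:rank], vt[:rank, :]


# Two well-separated point clusters on a line (assumed toy data).
n = 200
x = np.linspace(0.0, 1.0, n)
y = np.linspace(2.0, 3.0, n)

# Smooth interaction kernel between the clusters (assumed for illustration).
block = 1.0 / np.abs(x[:, None] - y[None, :])

tol = 1e-8
X, Y = compress_block(block, tol)

rank = X.shape[1]
rel_err = np.linalg.norm(block - X @ Y) / np.linalg.norm(block)
dense_entries = block.size            # entries stored without compression
lowrank_entries = X.size + Y.size     # entries stored in factored form

print(f"numerical rank at tol={tol:g}: {rank} (block is {n} x {n})")
print(f"relative Frobenius error: {rel_err:.2e}")
print(f"storage ratio (low-rank / dense): {lowrank_entries / dense_entries:.3f}")
```

In a block low-rank format, compression of this kind is applied block by block, and only the blocks that are compressible at the chosen tolerance are stored in factored form. How well this carries over to matrices from hyperbolic problems is precisely one of the questions the project sets out to investigate.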
[1] M. Jadoui, C. Blondeau, E. Martin, F. Renac, F.-X. Roux, Comparative study of inner-outer Krylov solvers for linear systems in structured and high-order unstructured CFD problems, Computers & Fluids, 244, 2022. (arXiv:2404.17870)
[2] Y. Saad, Iterative Methods for Sparse Linear Systems, Second Edition, SIAM, 2003.
[3] P. Amestoy, A. Buttari, J.-Y. L'Excellent, T. Mary, Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures, ACM TOMS, 45 (1), 2019. ⟨hal-01955766v2⟩