With the advent of increasingly complex software architectures for high-performance computing, code complexity is becoming an urgent issue. Two trends emerge in this context. On the one hand, simulations implemented as workflows involving multiple codes would benefit from improved data exchange to overcome the performance bottleneck of file-system-based communication. On the other hand, complex monolithic simulation codes would benefit from improved modularization, both for maintainability and to make it possible to choose the best-suited programming model for each part of the code.
One interesting direction to answer these needs is to design applications as an efficient loose coupling of independent modules. Traditionally, performance-oriented coupling solutions such as scientific workflows and software components have put the emphasis on control-driven coupling, where the connections between interacting modules have to be defined by hand. The complexity of these approaches means that many code couplings still rely on the file system for interactions.
Distributed programming models in the field of data analytics have adopted a different approach, where most interactions are defined by the data themselves. Typically, in the Map-Reduce model, the reducer handling a piece of data is not specified explicitly but implicitly, through a key included in the data. The PDI library developed at MdlS offers a comparable approach, where coupling is achieved by sharing access to a logical data store and reacting to changes in this store. This library is, however, limited to process-local interactions.
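The data-driven pattern described above can be illustrated with a minimal sketch: a toy logical data store where a producer publishes values under a logical name and consumers react to changes, without either side naming the other. This is an illustration of the general idea only, not PDI's actual API; all names here (`DataStore`, `share`, `on_share`) are hypothetical.

```python
from typing import Any, Callable, Dict, List

class DataStore:
    """Toy logical data store: modules couple by sharing named values and
    reacting to changes, rather than through explicit point-to-point links."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._callbacks: Dict[str, List[Callable[[str, Any], None]]] = {}

    def on_share(self, name: str, callback: Callable[[str, Any], None]) -> None:
        """Register interest in a logical name; the eventual producer
        never needs to know who consumes the value."""
        self._callbacks.setdefault(name, []).append(callback)

    def share(self, name: str, value: Any) -> None:
        """Publish a value under a logical name and notify all subscribers."""
        self._data[name] = value
        for cb in self._callbacks.get(name, []):
            cb(name, value)

# Usage: a "solver" module publishes a field; an "analytics" module reacts.
store = DataStore()
results: List[Any] = []
store.on_share("temperature", lambda name, value: results.append((name, value)))
store.share("temperature", [300.0, 301.5, 299.8])
# results now holds [("temperature", [300.0, 301.5, 299.8])]
```

As in the Map-Reduce example, routing is implied by data (here the logical name) rather than by an explicit, hand-written connection between the two modules.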
The goal of the proposed Post-Doc position is to evaluate and implement a solution to support data-driven coupling at the scale of a complete Exascale machine. To achieve this goal, the candidate will have to take into account problems that arise at this scale, including but not limited to the following. Data distributions and re-distributions between modules will have to be handled. Achieving good performance while accounting for data transfer times will require asynchronous solutions. Identifying values uniquely in the presence of asynchronism will require additional metadata, for example to distinguish values coming from different time-steps.
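The re-distribution problem mentioned above can be sketched in miniature: two coupled modules may decompose the same logical array over different numbers of blocks, so data published by one must be re-split for the other. The sketch below is a deliberately simplified, sequential illustration under assumed 1-D block decomposition; a real Exascale solution would of course perform this in parallel and in place.

```python
from typing import List

def redistribute(blocks: List[List[float]], n_out: int) -> List[List[float]]:
    """Toy 1-D re-distribution: gather the producer's blocks and re-split
    them into n_out near-equal consumer blocks."""
    flat = [x for block in blocks for x in block]
    quot, rem = divmod(len(flat), n_out)
    out, start = [], 0
    for i in range(n_out):
        size = quot + (1 if i < rem else 0)  # spread the remainder
        out.append(flat[start:start + size])
        start += size
    return out

# A producer decomposed over 3 blocks feeds a consumer expecting 2 blocks.
print(redistribute([[0, 1], [2, 3], [4, 5]], 2))  # → [[0, 1, 2], [3, 4, 5]]
```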
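The need for additional identifying metadata under asynchronism can also be sketched: when producers and consumers run concurrently, a bare name like "u" is ambiguous, but keying each shared value by a (name, time-step) pair lets a consumer ask for exactly the version it needs. This is a minimal illustration with assumed names (`VersionedStore`, `share`, `get`), not a proposed design.

```python
import threading
from typing import Any, Dict, Tuple

class VersionedStore:
    """Toy store where each shared value is keyed by (name, time_step),
    so asynchronous consumers can tell successive versions apart."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, int], Any] = {}
        self._cond = threading.Condition()

    def share(self, name: str, step: int, value: Any) -> None:
        with self._cond:
            self._data[(name, step)] = value
            self._cond.notify_all()

    def get(self, name: str, step: int, timeout: float = 1.0) -> Any:
        """Block until the requested (name, step) version is available."""
        with self._cond:
            self._cond.wait_for(lambda: (name, step) in self._data, timeout)
            return self._data[(name, step)]

# A producer publishes time-steps out of order; the consumer still
# retrieves exactly the step-2 version of "u".
store = VersionedStore()
producer = threading.Thread(
    target=lambda: (store.share("u", 3, [9.0]), store.share("u", 2, [4.0])))
producer.start()
value = store.get("u", 2)
producer.join()
# value is [4.0], regardless of publication order
```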