Development of high-performance code often necessitates compromise. Lower computational cost comes at the expense of other worthy goals, including the intelligibility and maintainability of programs. We discuss a "ladder" of tools ranging from low- to high-level, aiming to reduce these sacrifices in machine independence, readability, and separation of concerns, all while enabling "near-handwritten" performance to be attained. These tools include PyOpenCL, which offers access to the OpenCL compute abstraction and open-source implementations thereof, including pocl. They include loopy, a polyhedrally based code transformation tool for CPU and GPU code. And they include pytato, which captures data flow graphs of numpy-compatible array computations for transformation and processing via loopy. Moderate-to-large-scale examples illustrate successful uses of these tools.
- Andreas Klöckner (University of Illinois)