Description

Do you work in scientific computing and would like to discover FPGA computing?

Do you use FPGAs as electronic components in a processing chain and would like to discover the latest tools for High-Level Synthesis?

This event, organised by Groupe Calcul and theme C of GdR ISIS, is an opportunity to get feedback from those who have already worked with FPGAs and to get hands-on experience with these devices and with the programming models offered by the two manufacturers: Vitis HLS (AMD-Xilinx) and oneAPI (Intel). An ISIS day will close the event, with two invited talks and a round table in the morning and talks selected from a call for contributions in the afternoon.

Number of seats for hands-on sessions: 25
Number of seats for the ISIS day: 40
All talks will be broadcast as webinars
Lunches are offered by Maison de la Simulation
Travel expenses for GdR ISIS members are funded by GdR ISIS

Speakers

  • Christophe Alias (INRIA / LIP)
  • Stefano Corda (EPFL)
  • Mickaël Dardaillon (INSA Rennes / IETR)
  • Florent De Dinechin (INSA Lyon)
  • Daouda Diakite (L2S - Université Paris Saclay)
  • William Duluc (MVD-Training)
  • Shan Mignot (Laboratoire Lagrange)
  • Maurizio Paolini (Intel)
  • Charles Prouveur (CEA)
  • Antsa Randriamanantena (CNRS - LAB)
  • Olivier Régnault (AMD-Xilinx dedicated FAE)
  • Xin Wu (Paderborn University)

Program

Monday 04/07

11:30 14:00

Welcome - lunch

14:00 15:00

Introduction to FPGA computation and architecture

Florent De Dinechin

15:00 15:30

Neural networks with HLS (FINN)

Antsa Randriamanantena

15:30 16:00

Break

16:00 16:30

Co-design for the SKA project

Shan Mignot

Radio telescopes cover wide frequency bands and make extensive use of interferometry. This produces large volumes of data and calls for considerable processing. Because storing the raw data is not materially feasible, the SKA project has decided to integrate the processing facilities with the telescopes. Two supercomputers, one for each telescope, are hence envisaged to ingest an expected representative flow of 0.77 TB/s and carry out preliminary data reduction tasks, both to reduce the volume of data and to yield science products. Preconstruction work has led to a concept based on a homogeneous set of nodes processing data either on-the-fly or on-demand within a few days. This distinction stems from the need to provide operational feedback and to average out the computing load, which peaks at an estimated 125 PFlops but averages 10 PFlops. Generic COTS systems have hitherto been considered in order to maximise versatility and avoid specialised software development. However, significant risks have been identified concerning procurement and operating costs. I will present the ongoing co-design exercise that aims to mitigate them. In this context, with the advent of high-level synthesis, FPGAs, with their higher resource utilisation and lower operating frequencies, could become an option, notably for on-the-fly tasks, in-network processing or as accelerators for selected calculations, should the risk/benefit ratio prove favourable.
16:30 17:00

FPGA acceleration of 3D CT reconstruction using OpenCL and oneAPI Tools

Daouda Diakite

Many-core processors such as GPUs are currently the preferred technological target for accelerating HPC applications. However, architectures designed on FPGAs can be interesting alternatives to GPUs because they potentially consume less power and have become more accessible thanks to the new high-level synthesis (HLS) tools provided by leading manufacturers such as Intel and Xilinx. Even so, exploiting the full potential of FPGAs via HLS tools requires a deep knowledge of their architecture and a significant effort to match the application to the underlying architecture. In this presentation, I will present the principle of HLS tools as well as a methodology for FPGA acceleration through Intel's OpenCL and oneAPI tools. The 3D back-projection operator, present in iterative tomographic reconstruction algorithms, is considered as a use case for this methodology.
17:00 17:30

Matrix-free conjugate gradient with Maxeler Data Flow Engine technology

Charles Prouveur

This presentation explains the implementation, using Maxeler technology, of a miniapp extracted from a production code in materials science (Metalwalls). It then presents a chip-to-chip comparison between a CPU, a GPU and an FPGA, as well as a scalability study on multiple FPGAs. The core algorithm is a matrix-free conjugate gradient that computes the total electrostatic energy through an Ewald summation at each iteration. The FPGA implementation, using a 40-bit floating-point number representation, outperforms the CPU implementation in both computing power and energy usage, resulting in an energy efficiency more than 14 times better. Compared to the GPU of the same generation, the FPGA reaches 60% of the GPU performance while its performance per watt is still better by a factor of 3. Thanks to its low average power usage, the FPGA bests both the fully loaded CPU and the GPU in terms of number of conjugate gradient iterations per second and per watt.
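For readers unfamiliar with the term, a matrix-free conjugate gradient never stores the matrix; it only needs a function that applies the operator to a vector. The following is a minimal sketch of that idea in plain Python, not the Metalwalls or Maxeler implementation: the operator `apply_laplacian_1d` is a hypothetical stand-in for the Ewald summation, and ordinary double precision is used instead of the 40-bit format mentioned above.

```python
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(apply_a: Callable[[Vector], Vector], b: Vector,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for a symmetric positive-definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with the initial guess x = 0
    p = list(r)              # current search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = apply_a(p)      # the only place where the operator is used
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def apply_laplacian_1d(v: Vector) -> Vector:
    """Hypothetical operator: the (-1, 2, -1) stencil, applied without building a matrix."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


b = [1.0] * 8
x = conjugate_gradient(apply_laplacian_1d, b)
residual = max(abs(bi - ai) for bi, ai in zip(b, apply_laplacian_1d(x)))
```

Because the solver only calls `apply_a`, the operator can be swapped for any routine that computes the matrix-vector product on the fly, which is what makes the approach attractive for streaming architectures.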

Tuesday 05/07

09:00 10:00

AMD-Xilinx System on Chip (SoC) FPGA: an introduction + demo

Olivier Régnault

Olivier Régnault is a senior expert in FPGA and System on Chip (SoC) architectures and works as Field Application Engineer & Product Line Manager for the European semiconductor distributor Avnet Silica. The talk will begin with an introduction to the AMD-Xilinx SoC architecture, with an emphasis on the Zynq UltraScale+ family. It will be followed by a development demonstration with Vivado and Vitis and, at the end, a presentation of the new Versal architecture for accelerated computing platforms.
10:00 10:30

Break

10:30 12:30

Introduction to AMD-Xilinx Vitis HLS

William Duluc

12:30 14:00

Lunch

14:00 15:30

Hands-on AMD-Xilinx Vitis HLS

15:30 16:00

Break

16:00 17:30

Hands-on AMD-Xilinx Vitis HLS

Wednesday 06/07

09:00 09:30

Dataflow code generation for FPGA

Mickaël Dardaillon

High-level synthesis tools for FPGAs, such as Vitis HLS, simplify the development of accelerated applications by using the high-level C language and combining pre-existing kernels. However, the connection of dataflow buffers between these kernels still needs to be specified and optimized manually by the developer. In this presentation, we introduce a new method and associated tool to generate HLS code from a dataflow graph and to automatically compute buffer sizes that reach the highest throughput.
09:30 10:00

Work on the accelerated calculation of electron repulsion integrals on FPGAs using oneAPI

Xin Wu

10:00 10:30

Break

10:30 12:30

Introduction to Intel oneAPI

Maurizio Paolini

12:30 14:00

Lunch

14:00 15:30

Hands-on Intel oneAPI

Maurizio Paolini

15:30 16:00

Break

16:00 17:30

Hands-on Intel oneAPI

Maurizio Paolini

Thursday 07/07

09:00 09:45

Welcome coffee

09:45 10:30

Compiling circuits with polyhedra

Christophe Alias

Hardware accelerators are unavoidable for improving the performance of computers with a bounded energy budget. In particular, FPGAs allow dedicated circuits to be built from a gate-level description, enabling a very advanced level of optimization. High-level synthesis (HLS) tools let the programmer target FPGAs without the constraints linked to hardware, by compiling a C specification into a circuit. Code optimizations in these tools remain rudimentary (loop unrolling, pipelining, etc.) and are most often the responsibility of the programmer. The polyhedral model, born from research on systolic circuits, offers a powerful tool to optimize compute kernels for HPC. In this seminar, I will show a few interconnections between HLS and the polyhedral model, either as a preprocessing (source-to-source) step or as a synthesis tool (optimizing the circuit using a dataflow intermediate representation). In particular, I will present a dataflow formalism that allows reasoning geometrically about circuit synthesis.
10:30 11:15

Reduced-Precision Acceleration of Radio-Astronomical Imaging on Xilinx FPGAs

Stefano Corda

Modern radio telescopes such as the Square Kilometre Array (SKA) produce large volumes of data that need to be processed to obtain high-resolution sky images. This is a complex task that requires computing systems that provide both high performance and high energy efficiency. Hardware accelerators such as GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays) can provide these two features and are thus an appealing option for this application. Most HPC (High-Performance Computing) systems operate in double precision (64-bit) or in single precision (32-bit), and radio-astronomical imaging is no exception. With reduced precision computing, smaller data types (e.g., 16-bit) aim at improving energy efficiency and throughput performance in noise-tolerant applications. We demonstrate that reduced precision can also be used to produce high-quality sky images. To this end, we analyze the gridding component (Image-Domain Gridding) of the widely-used WSClean imaging application. Gridding is typically one of the most time-consuming steps in the imaging process and, therefore, an excellent candidate for acceleration. We identify the minimum required exponent and mantissa bits for a custom floating-point data type. Then, we propose the first custom floating-point accelerator on a Xilinx Alveo U50 FPGA using High-Level Synthesis. Our reduced-precision implementation improves the throughput and energy efficiency by respectively 1.84x and 2.03x compared to the single-precision floating-point baseline on the same FPGA. Our solution is also 2.12x faster and 3.46x more energy-efficient than an Intel i9 9900k CPU (Central Processing Unit) and manages to keep up in throughput with an AMD RX 550 GPU.
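To give a feel for what searching for the minimum number of mantissa bits involves, here is a small sketch in Python that emulates a narrower mantissa by masking the low bits of a float32 value. It is an illustration only and not the method of the talk: the helper name `truncate_mantissa` is hypothetical, rounding is toward zero, and the exponent width is left at the float32 value of 8 bits.

```python
import struct


def truncate_mantissa(value: float, mantissa_bits: int) -> float:
    """Keep only the top `mantissa_bits` of the 23-bit float32 mantissa (round toward zero)."""
    if not 0 <= mantissa_bits <= 23:
        raise ValueError("mantissa_bits must be between 0 and 23")
    # Reinterpret the float32 encoding of `value` as a 32-bit unsigned integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    drop = 23 - mantissa_bits
    # Clear the `drop` least significant mantissa bits; sign and exponent are untouched.
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF
    (result,) = struct.unpack("<f", struct.pack("<I", bits))
    return result


sample = 3.14159
# Relative error introduced for a few candidate mantissa widths.
relative_error = {
    m: abs(truncate_mantissa(sample, m) - sample) / sample
    for m in (4, 7, 10, 16, 23)
}
```

In a precision study of this kind, one would typically run the full kernel with each candidate width and compare the resulting image quality against the single-precision baseline, rather than look at a single value as done here.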
11:15 11:45

Break

11:45 12:45

Round table

12:45 14:00

Lunch

14:00 14:30

Call for contributions 1

14:30 15:00

Call for contributions 2

15:00 15:30

Call for contributions 3

15:30 16:00

Break

16:00 16:30

Call for contributions 4

16:30 17:00

Call for contributions 5

17:00 17:30

Call for contributions 6

Organisation

  • Mickaël Dardaillon (INSA Rennes / IETR)
  • Nicolas Gac (L2S - Université Paris Saclay)
  • Matthieu Haefele (CNRS/UPPA)
  • Shan Mignot (Laboratoire Lagrange)
  • Charles Prouveur (CEA)
  • Antsa Randriamanantena (CNRS/LAB)
  • Bogdan Vulpescu (CNRS/IN2P3/UCA)