Unique AI Framework

UAIF Code Repositories

Machine-Learned Particle-Flow (MLPF)

Machine-Learned Particle-Flow (MLPF) is an algorithm based on Graph Neural Networks (GNNs) that aims to perform efficient, GPU-accelerated particle-flow reconstruction at large particle detector experiments. It takes particle tracks and calorimeter clusters as input and produces higher-level physics objects, such as electrons, hadrons and photons, as output. This repository contains the code needed to train MLPF on one or more GPUs, to run large-scale hyperparameter optimization (HPO) across multiple compute nodes on HPC systems, to evaluate model performance, and to export the model for later use in inference. The model and training are implemented in TensorFlow, while Ray Tune is used for HPO. A public dataset is available at [1]. MLPF was first introduced in [2], and later versions appeared in [3,4].
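The general GNN idea behind this kind of reconstruction (treat tracks and clusters as nodes of a graph, exchange information between nearby nodes, then classify each node) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the MLPF architecture: the element values, the k-nearest-neighbour graph in (eta, phi), the mean aggregation and the class names are all hypothetical choices, and the read-out weights are random and untrained, so the predicted labels are arbitrary.

```python
import math
import random

# Hypothetical detector elements: (kind, eta, phi, energy). Values are illustrative only.
ELEMENTS = [
    ("track", 0.10, 0.20, 12.0),
    ("cluster", 0.12, 0.22, 11.5),
    ("cluster", -1.50, 3.10, 4.0),
    ("track", -1.48, -3.12, 3.8),
]
# Hypothetical output classes, one prediction per input element.
CLASSES = ["none", "charged_hadron", "neutral_hadron", "photon", "electron"]

def delta_r(a, b):
    """Angular distance in (eta, phi), with the phi difference wrapped into [-pi, pi)."""
    dphi = (a[2] - b[2] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(a[1] - b[1], dphi)

def knn_graph(elements, k):
    """For each element, return the indices of its k nearest neighbours."""
    edges = []
    for i, a in enumerate(elements):
        others = sorted((delta_r(a, b), j) for j, b in enumerate(elements) if j != i)
        edges.append([j for _, j in others[:k]])
    return edges

def encode(el):
    """Node features: one-hot element type plus eta, phi and log(1 + energy)."""
    kind, eta, phi, energy = el
    return [1.0 if kind == "track" else 0.0, 1.0 if kind == "cluster" else 0.0,
            eta, phi, math.log1p(energy)]

def message_pass(features, edges):
    """One round: concatenate each node's features with the mean of its neighbours'."""
    out = []
    for i, nbrs in enumerate(edges):
        dim = len(features[i])
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        out.append(features[i] + mean)
    return out

def classify(hidden, weights):
    """Linear read-out followed by an argmax over the classes."""
    preds = []
    for h in hidden:
        scores = [sum(w * x for w, x in zip(row, h)) for row in weights]
        preds.append(CLASSES[max(range(len(scores)), key=scores.__getitem__)])
    return preds

rng = random.Random(0)
edges = knn_graph(ELEMENTS, k=2)
hidden = message_pass([encode(e) for e in ELEMENTS], edges)
# Untrained random weights: one row per class, one column per hidden feature.
weights = [[rng.uniform(-1, 1) for _ in range(len(hidden[0]))] for _ in CLASSES]
predictions = classify(hidden, weights)
print(predictions)
```

The phi wrap-around in `delta_r` matters: elements 2 and 3 sit on opposite sides of the phi = ±pi boundary yet end up as nearest neighbours. In a trained model the read-out (and usually the graph construction itself) would be learned rather than fixed.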

GIT Repository

AI for HPC

AI4HPC, part of CoE RAISE, is an open-source library for training AI models on computational fluid dynamics (CFD) datasets on HPC systems. Within CoE RAISE, innovative AI methods for heterogeneous HPC architectures that can scale towards Exascale are developed and generalized for selected representative simulation codes and data-driven workflows. AI4HPC consists of data manipulation routines tuned to handle CFD datasets, ML models useful for CFD analyses, and optimizations for HPC systems. It also includes a benchmarking suite that tests the limits of any CPU or GPU system towards Exascale, and a hyperparameter optimization (HPO) suite for scalable HPO tasks.
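Scalable HPO rests on the fact that trials are independent and can run concurrently. A minimal sketch of parallel random search is shown below, assuming a hypothetical two-parameter search space and a synthetic `objective` that stands in for a real training run; it does not reflect the AI4HPC HPO suite's actual interface. A thread pool plays the role of the compute nodes; on a cluster each trial would typically be a separate job scheduled by a framework such as Ray Tune (which MLPF uses).

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search space; names and ranges are illustrative.
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-1),  # sampled log-uniformly
    "batch_size": [16, 32, 64, 128],
}

def sample_config(rng):
    """Draw one configuration: log-uniform learning rate, uniform batch size."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    lr = 10 ** rng.uniform(math.log10(lo), math.log10(hi))
    return {"learning_rate": lr, "batch_size": rng.choice(SEARCH_SPACE["batch_size"])}

def objective(config):
    """Stand-in for a training run: returns a synthetic validation loss."""
    lr_term = (math.log10(config["learning_rate"]) + 2.5) ** 2
    bs_term = abs(config["batch_size"] - 64) / 64
    return lr_term + bs_term

def random_search(n_trials, n_workers, seed=0):
    """Evaluate n_trials random configurations concurrently; return the best one."""
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(n_trials)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        losses = list(pool.map(objective, configs))  # map preserves input order
    best = min(range(n_trials), key=losses.__getitem__)
    return configs[best], losses[best]

best_config, best_loss = random_search(n_trials=32, n_workers=4)
print(best_config, best_loss)
```

Sampling all configurations up front from a seeded generator keeps the search reproducible regardless of the order in which trials finish.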

GIT Repository

AI4Sim Model Collection

This repository proposes convolutional neural network (CNN) and graph neural network (GNN) architectures as physical surrogates for various industrial use cases. While CNNs can reach state-of-the-art performance on structured grids, GNNs are natural surrogates for simulations that rely on complex unstructured meshes. Several use cases are presented, comparing training pipelines based on CNNs with those based on GNNs.
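The CNN/GNN contrast can be illustrated with a fixed operator. On a structured grid, a discrete Laplacian is a 5-point stencil (what a fixed 3x3 convolution kernel computes); on a mesh, the same operator is an aggregation over an explicit edge list, which works for any connectivity. The sketch below, a hypothetical example rather than code from the repository, computes both on the same grid and shows they agree at interior nodes. Real CNN and GNN surrogates learn their kernels and aggregation functions instead of fixing them.

```python
def laplacian_grid(u):
    """5-point stencil on an n x m grid; boundary entries are left at 0."""
    n, m = len(u), len(u[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1] - 4 * u[i][j]
    return out

def laplacian_graph(values, edges):
    """Same operator as message passing: each node sums (neighbour - self) over its edges."""
    out = [0.0] * len(values)
    for a, b in edges:
        out[a] += values[b] - values[a]
        out[b] += values[a] - values[b]
    return out

def grid_to_graph(n, m):
    """Flatten an n x m grid into node indices i*m + j with 4-neighbour undirected edges."""
    edges = []
    for i in range(n):
        for j in range(m):
            k = i * m + j
            if i + 1 < n:
                edges.append((k, k + m))
            if j + 1 < m:
                edges.append((k, k + 1))
    return edges

n, m = 4, 5
# u = i^2 + j has a discrete Laplacian of exactly 2 at every interior node.
u = [[float(i * i + j) for j in range(m)] for i in range(n)]
grid_result = laplacian_grid(u)
graph_result = laplacian_graph([v for row in u for v in row], grid_to_graph(n, m))
print(grid_result[1][1], graph_result[1 * m + 1])  # → 2.0 2.0
```

The graph version never uses the (i, j) layout once the edge list exists, which is why it carries over unchanged to unstructured meshes where no regular stencil is available.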

GIT Repository

PhyDLL

GIT Repository

PhyDLL (Physics Deep Learning coupLer) is an open-source coupling library (https://phydll.readthedocs.io). It enables efficient data transfer and processing between massively parallel physical solvers and distributed deep learning inference. PhyDLL offers different coupling schemes to suit the application context and the topology of the data structures. Currently, Fortran and Python interfaces are available for physical solvers and deep learning engines, respectively. Ongoing collaborations within CoE RAISE will add a C/C++ interface, making PhyDLL accessible to an even wider range of users. Towards Exascale, support for inter-GPU communication in multi-node settings is under development within PhyDLL.
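The basic solver/DL coupling loop can be sketched as follows: at each time step the solver sends a field to the deep learning side, blocks until the prediction comes back, and feeds it into its next state. This is a minimal single-process illustration and does not use PhyDLL's API. Two in-process queues stand in for the inter-process, multi-node transfer that a coupling library handles, and the "model" is a hypothetical fixed pointwise map rather than real inference.

```python
import queue
import threading

def dl_engine(inbox, outbox):
    """Stand-in for the deep learning side: receive fields, return predictions."""
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: the solver has finished
            break
        step, field = msg
        # Hypothetical "model": a fixed pointwise map in place of real inference.
        outbox.put((step, [0.5 * x for x in field]))

def solver(n_steps, inbox, outbox):
    """Stand-in for the physical solver: advance, send the field, use the prediction."""
    field = [1.0, 2.0, 3.0]
    history = []
    for step in range(n_steps):
        field = [x + 1.0 for x in field]     # placeholder for one solver time step
        outbox.put((step, field))            # send the field to the DL side
        recv_step, prediction = inbox.get()  # block until inference returns
        assert recv_step == step             # sanity check: replies stay in step order
        field = [x + p for x, p in zip(field, prediction)]  # feed the prediction back
        history.append(field)
    outbox.put(None)  # tell the DL side to stop
    return history

to_dl, to_solver = queue.Queue(), queue.Queue()
worker = threading.Thread(target=dl_engine, args=(to_dl, to_solver))
worker.start()
history = solver(3, inbox=to_solver, outbox=to_dl)
worker.join()
print(history[-1])  # → [10.5, 13.875, 17.25]
```

This synchronous exchange is only one possible scheme; the choice of scheme (for example, how data is partitioned between solver and DL processes) is exactly what a coupling library parameterizes.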

Earth Observation Data Workflows with Apache Airflow

This repository presents a scalable, parallelizable workflow built on Apache Airflow that integrates Machine Learning (ML) and Deep Learning (DL) models with Modular Supercomputing Architecture (MSA) systems [6]. To test the workflow, we considered the production of large-scale Land-Cover (LC) maps as a case study: the workflow generates LC maps from Sentinel-2 data using ML and DL algorithms combined with High-Performance Computing (HPC) technology.

The workflow manager, Airflow, offers scalability, extensibility, and programmable task definition in Python. It allows us to execute different steps of the workflow on different HPC systems, leading to efficient use of the available computing resources. The workflow is demonstrated on the Dynamical Exascale Entry Platform (DEEP) and the Jülich Research on Exascale Cluster Architectures (JURECA) systems hosted at the Jülich Supercomputing Centre (JSC), a setup that spans heterogeneous JSC systems.
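At its core, a workflow manager like Airflow runs tasks in an order consistent with their dependencies (a directed acyclic graph), starting any task as soon as all of its predecessors have finished. The sketch below shows that scheduling logic with Python's standard `graphlib` module (Python 3.9+). It is not Airflow code: the task names and the assignment of tasks to DEEP or JURECA are hypothetical, and where a real deployment would submit a job to the chosen system, this sketch only records the decision.

```python
from graphlib import TopologicalSorter

# Hypothetical land-cover workflow: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "download_sentinel2": set(),
    "preprocess_tiles": {"download_sentinel2"},
    "train_model": {"preprocess_tiles"},
    "predict_tiles": {"train_model", "preprocess_tiles"},
    "mosaic_lc_map": {"predict_tiles"},
}

# Hypothetical placement of tasks on systems, illustrating per-step system choice.
PLACEMENT = {
    "download_sentinel2": "JURECA",
    "preprocess_tiles": "JURECA",
    "train_model": "DEEP",
    "predict_tiles": "DEEP",
    "mosaic_lc_map": "JURECA",
}

def run_workflow(deps, placement):
    """Process tasks in dependency order; return (task, system) pairs in run order."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    log = []
    while ts.is_active():
        ready = ts.get_ready()  # all tasks whose dependencies are done; could run in parallel
        for task in sorted(ready):
            log.append((task, placement[task]))  # a real scheduler would submit a job here
            ts.done(task)
    return log

log = run_workflow(DEPENDENCIES, PLACEMENT)
for task, system in log:
    print(f"{task:>20} -> {system}")
```

Because `get_ready()` returns every task whose inputs are complete, independent branches (for example, prediction on separate tile batches) would surface together and could be dispatched concurrently, which is where the parallelism of such a workflow comes from.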

GIT Repository

References

[1] J. Pata, J. Duarte, J.-R. Vlimant, M. Pierini, and M. Spiropulu, “Simulated particle-level events of ttbar and QCD with PU200 using Pythia8+Delphes3 for machine learned particle flow (MLPF),” Zenodo. https://zenodo.org/record/4559324

[2] J. Pata, J. Duarte, J.-R. Vlimant, M. Pierini, and M. Spiropulu, “MLPF: Efficient machine-learned particle-flow reconstruction using graph neural networks,” Eur. Phys. J. C, vol. 81, no. 5, p. 381, 2021. https://doi.org/10.1140/epjc/s10052-021-09158-w

[3] J. Pata et al., “Machine Learning for Particle Flow Reconstruction at CMS,” https://arxiv.org/abs/2203.00330

[4] E. Wulff et al., “Hyperparameter optimization of data-driven AI models on HPC systems,” https://arxiv.org/abs/2203.01112

[5] P. Baldi, “Autoencoders, Unsupervised Learning, and Deep Architectures,” in Proceedings of the ICML Workshop on Unsupervised and Transfer Learning, Proceedings of Machine Learning Research, vol. 27, pp. 37–49, 2012. http://proceedings.mlr.press/v27/baldi12a/baldi12a.pdf

[6] L. Tian, R. Sedona, A. Mozaffari, E. Kreshpa, C. Paris, M. Riedel, M. G. Schultz, and G. Cavallaro, “End-to-End Process Orchestration of Earth Observation Data Workflows with Apache Airflow on High Performance Computing,” in Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2023 (in press).