Unique AI Framework

- (UAIF) -

CoE RAISE follows the principles of open science and publishes its results open access once they are ready for wider application. All CoE RAISE developments are being integrated into the Unique AI Framework (UAIF), which will contain not only the trained models but also documentation on how to use them on current Petaflop and future Exascale HPC systems, prototypes, and disruptive systems. Development of the UAIF is ongoing. The current code base of the UAIF can be found below.

Machine-Learned Particle-Flow (MLPF)

Machine-Learned Particle-Flow (MLPF) is an algorithm based on Graph Neural Networks (GNNs) that aims to perform efficient, GPU-accelerated particle-flow reconstruction at large particle detector experiments. It takes particle tracks and calorimeter clusters as input and outputs higher-level physics objects such as electrons, hadrons, and photons. This repository contains the code needed to train MLPF on one or more GPUs, to perform large-scale hyperparameter optimization (HPO) across multiple compute nodes on HPC systems, to evaluate model performance, and to export the model for later use in inference. The model and training are implemented in TensorFlow, and Ray Tune is used for HPO. A dataset is publicly available at [1]. MLPF was first introduced in [2]; later versions appeared in [3,4].
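
To illustrate what the HPO step does conceptually, the following is a minimal sketch of random search in plain Python. It is not the repository's code and does not use Ray Tune. The search space, the parameter names (learning_rate, hidden_dim, num_layers), and the toy_validation_loss function are hypothetical stand-ins for a real MLPF training run.

```python
import math
import random

# Hypothetical search space: names and ranges are illustrative, not MLPF's.
SEARCH_SPACE = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-5.0, -2.0),  # log-uniform
    "hidden_dim": lambda rng: rng.choice([64, 128, 256, 512]),
    "num_layers": lambda rng: rng.randint(1, 6),
}


def sample_config(rng):
    """Draw one hyperparameter configuration from the search space."""
    return {name: draw(rng) for name, draw in SEARCH_SPACE.items()}


def toy_validation_loss(config):
    """Made-up stand-in for training a model and returning its validation loss."""
    lr_term = (math.log10(config["learning_rate"]) + 3.0) ** 2
    depth_term = 0.1 * abs(config["num_layers"] - 3)
    width_term = 64.0 / config["hidden_dim"]
    return lr_term + depth_term + width_term


def random_search(num_trials, seed=0):
    """Evaluate num_trials random configurations and return all (loss, config) pairs."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        config = sample_config(rng)
        trials.append((toy_validation_loss(config), config))
    return trials


trials = random_search(num_trials=50)
best_loss, best_config = min(trials, key=lambda t: t[0])
print(f"best loss {best_loss:.3f} with {best_config}")
```

Each trial here is independent of the others, which is why this kind of search can be spread across many compute nodes. In a real setting, every call to the loss function would be a full training run.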

AI for HPC

In CoE RAISE, innovative AI methods on heterogeneous HPC architectures capable of scaling towards Exascale are developed and generalized for selected representative use cases. This repository contains scripts for training Convolutional Autoencoders (CAEs) [5] on actuated Turbulent Boundary Layer (TBL) simulation datasets. CAEs are Neural Network (NN) architectures that map the input to a lower-dimensional representation and then reconstruct the original input from it. In this way, huge datasets can be compressed to much smaller sizes, at the cost of a reconstruction error that training aims to keep small. Good accuracy requires a large training dataset, so training runtimes are long; data-distributed training on HPC systems addresses this. The repository contains a complete guide to running CAE training on the actuated TBL dataset using PyTorch with its data-distributed training package (PyTorch-DDP). It also includes examples of running the code on the JUWELS Booster system, one of the fastest HPC systems in the world.
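
The idea behind data-distributed training can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the repository's code and not the PyTorch-DDP API: a one-parameter linear model stands in for the CAE, the ranks are simulated in a loop instead of running as separate processes, and the helper names (shard_indices, local_gradient, all_reduce_mean) are hypothetical.

```python
import random


def shard_indices(num_samples, world_size, rank):
    """Give each rank every world_size-th sample index (strided split)."""
    return list(range(rank, num_samples, world_size))


def local_gradient(w, xs, ys):
    """Gradient of the mean squared error of the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def all_reduce_mean(values):
    """Stand-in for the all-reduce step: average the per-rank gradients."""
    return sum(values) / len(values)


random.seed(0)
num_samples, world_size = 64, 4
xs = [random.uniform(-1.0, 1.0) for _ in range(num_samples)]
ys = [3.0 * x for x in xs]  # synthetic data with true weight 3.0

w = 0.0
learning_rate = 0.5
for step in range(200):
    grads = []
    # In a real job each rank is its own process; here the ranks are simulated.
    for rank in range(world_size):
        idx = shard_indices(num_samples, world_size, rank)
        grads.append(local_gradient(w, [xs[i] for i in idx], [ys[i] for i in idx]))
    # Every rank applies the same averaged gradient, so all model copies stay identical.
    w -= learning_rate * all_reduce_mean(grads)

print(f"learned weight: {w:.6f}")
```

Because the shards have equal size, the averaged gradient equals the gradient over the full dataset, so the result matches single-process training while each rank only processes a quarter of the data per step.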

References

[1] J. Pata, J. Duarte, J.-R. Vlimant, M. Pierini, and M. Spiropulu, “Simulated particle-level events of ttbar and QCD with PU200 using Pythia8+Delphes3 for machine learned particle flow (MLPF),” https://zenodo.org/record/4559324#.Y1JeOi8Rpqs

[2] J. Pata, J. Duarte, J.-R. Vlimant, M. Pierini, and M. Spiropulu, “MLPF: Efficient machine-learned particle-flow reconstruction using graph neural networks,” Eur. Phys. J. C, vol. 81, no. 5, p. 381, 2021. https://doi.org/10.1140/epjc/s10052-021-09158-w

[3] J. Pata et al., “Machine Learning for Particle Flow Reconstruction at CMS,” https://arxiv.org/abs/2203.00330

[4] E. Wulff et al., “Hyperparameter optimization of data-driven AI models on HPC systems,” https://arxiv.org/abs/2203.01112

[5] P. Baldi, “Autoencoders, Unsupervised Learning, and Deep Architectures,” in Proceedings of ICML Workshop on Unsupervised and Transfer Learning, Proceedings of Machine Learning Research, vol. 27, pp. 37–49, 2012. http://proceedings.mlr.press/v27/baldi12a/baldi12a.pdf