Methodologies for event reconstruction and classification for the CERN HL-LHC use case
The Compact Muon Solenoid (CMS) is a general-purpose detector at the Large Hadron Collider (LHC) at CERN. It has a broad physics programme ranging from studying the Standard Model (including the Higgs boson) to searching for extra dimensions and particles that could make up dark matter. The detector and data acquisition systems are being upgraded in preparation for the High Luminosity LHC (HL-LHC). At the HL-LHC, each experiment will produce exabytes of data per year, resulting in an unprecedented computing challenge.
The enormous amounts of collision data recorded need to be processed and analyzed as efficiently as possible. Optimizing throughput is of utmost importance to CMS, as it allows vast compute resources to be utilized better and new physics to be probed in a shorter amount of time. The overall collection of software used by the CMS experiment for data processing, referred to as CMSSW, is built around a framework, an Event Data Model (EDM), and the services needed by the simulation, calibration, alignment, and reconstruction modules that process event data for analysis. The primary goal of the framework and EDM is to facilitate the development and deployment of reconstruction and analysis software. CMSSW employs Intel Threading Building Blocks (TBB), which allows the creation of very flexible, fine-grained concurrent algorithms that efficiently handle multiple events being reconstructed in flight. Several sub-systems (sub-detectors) of the CMS detector have been ported to CUDA (GPGPUs) to achieve even higher throughput and better applicability on HPC systems. During the reconstruction of every LHC collision, hundreds of different algorithms run concurrently; some are traditional algorithms optimized for particular hardware configurations, while others already include AI-driven methods, e.g., deep neural networks (DNNs) for particle identification. Currently, datasets range from TBs to PBs in size, and these numbers will only grow as new physics ranges are probed. There are three main challenges to be tackled within RAISE:
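The essence of this task-based model is that independent events can be reconstructed concurrently. The following Python sketch illustrates the idea with a thread pool as a stand-in for TBB's task scheduler; the `reconstruct` function and the event format are hypothetical toys, not CMSSW code:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a reconstruction module: each "event" is a
# list of raw hit values, and reconstruction here is just a toy reduction.
def reconstruct(event):
    # A real CMSSW job runs hundreds of algorithms per event; this toy
    # version simply returns the summed "energy" of the hits.
    return sum(event)

def process_events(events, workers=4):
    # Events are independent, so they can be reconstructed concurrently,
    # analogous to CMSSW keeping multiple events in flight with TBB.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reconstruct, events))

events = [[1.0, 2.0], [3.0, 4.0], [5.0]]
print(process_events(events))  # [3.0, 7.0, 5.0]
```

Note that `pool.map` preserves the input order of events even though reconstruction runs concurrently, which keeps bookkeeping simple.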
Employing heterogeneous computing platforms. Today, only a fraction of the algorithms has been ported to utilize GPGPUs. Accelerating this effort yields higher throughput, resulting in better physics reach during the HL-LHC and more efficient utilization of resources on HPC facilities. Furthermore, other platforms such as FPGAs will require thorough investigation.
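A common pattern when porting to heterogeneous platforms is run-time backend dispatch: the same algorithm is registered with per-device implementations, and the framework falls back to the CPU version when an accelerator is unavailable. A minimal Python sketch of that pattern, with a toy `cluster_hits` algorithm and backend names that are purely illustrative:

```python
# Toy CPU implementation: group sorted hit positions differing by <= 1
# into clusters. In CMSSW terms, the same physics algorithm may exist as
# a CPU implementation and a CUDA implementation side by side.
def cluster_hits_cpu(hits):
    clusters, current = [], [hits[0]]
    for h in hits[1:]:
        if h - current[-1] <= 1:
            current.append(h)
        else:
            clusters.append(current)
            current = [h]
    clusters.append(current)
    return clusters

# Backend registry: a real setup would register e.g. a CUDA kernel
# under "gpu"; here only the CPU variant exists.
BACKENDS = {"cpu": cluster_hits_cpu}

def cluster_hits(hits, device="cpu"):
    # Fall back to the CPU implementation if the requested backend is
    # not available, mirroring a CPU/GPU fallback strategy.
    backend = BACKENDS.get(device, cluster_hits_cpu)
    return backend(sorted(hits))

print(cluster_hits([1, 2, 7, 8, 3], device="gpu"))  # [[1, 2, 3], [7, 8]]
```

The design choice here is that callers request a device but never depend on it, so the same job runs unchanged on nodes with or without accelerators.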
Moving towards fully AI/ML-based reconstruction. Currently, only a few selected algorithms, typically those delivering classification capabilities, have been replaced by AI/ML. With the new detector types coming for the HL-LHC, more sophisticated regression approaches, such as calorimeter energy regression using graph neural networks (GNNs) instead of traditional algorithms, help deliver more precise results while avoiding the computational challenges of the traditional approaches.
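The core idea behind such a graph-network regression can be sketched in a few lines: treat calorimeter cells as graph nodes, mix each node's energy with its neighbours' via message passing, and sum a readout to regress a graph-level energy. The pure-Python sketch below uses fixed illustrative weights rather than learned ones; all names and values are hypothetical:

```python
# One round of message passing over a graph of calorimeter cells.
# node_energy: per-cell deposited energies; edges: undirected (i, j) pairs.
def message_pass(node_energy, edges, self_w=0.8, nbr_w=0.2):
    # Build neighbour lists from the edge list.
    nbrs = {i: [] for i in range(len(node_energy))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # Update each node by combining its own value with the aggregated
    # messages from its neighbours (weights would be learned in a real GNN).
    updated = []
    for i, e in enumerate(node_energy):
        msg = sum(node_energy[j] for j in nbrs[i])
        updated.append(self_w * e + nbr_w * msg)
    return updated

def readout(node_energy):
    # Graph-level regression target: here simply the summed node features.
    return sum(node_energy)

cells = [1.0, 2.0, 3.0]                       # three cells in a chain
print(readout(message_pass(cells, edges=[(0, 1), (1, 2)])))
```

A trained model would stack several such rounds and learn the weights and readout, but the data flow, neighbour aggregation followed by a graph-level readout, is the same.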
Modular data delivery and ingestion. CMSSW is at present a monolithic system, i.e., every node of the HPC system runs the same executable with the same algorithms, albeit on different input data. This is efficient when targeting homogeneous systems but proves ineffective when moving towards a Modular Supercomputing Architecture (MSA). It is therefore important for CMSSW to adapt in order to be used efficiently on an MSA.
Figure 1: CMS Experiment at the LHC, CERN
We are looking forward to tackling these challenges in collaboration with the other RAISE partners.
Esplanade des Particules, 1211 Geneva 23, Switzerland
e-mail: maria.girone [@] cern.ch