
Open Data

Particle Flow Reconstruction

Scalable Neural Network Models and Terascale Datasets

One of the main approaches to event reconstruction at the Large Hadron Collider (LHC) is particle flow (PF), which combines hits across subdetectors and uses the full event to reconstruct all stable particles in it. Given the planned High-Luminosity LHC (HL-LHC) program, as well as possible future experimental programs such as the Future Circular Collider (FCC), computationally efficient and physically optimal evolutions of PF-based event reconstruction need to be developed and tested.

Among the candidate approaches, machine learning (ML)-based reconstruction methods, including methods for full-event reconstruction, have attracted considerable interest and development. To support rapid progress, it is beneficial to establish open datasets that are realistic and granular enough to test different types of methods.

In light of this, we describe, and make available, an extensive open dataset of physics events with full GEANT4 simulation, suitable for PF reconstruction and stored in the EDM4HEP [1] format.


Figure 1: 3D visualization of the generator particles (targets) and the calorimeter hits in a single event.

We generate dedicated events with Pythia8 [2] and carry out a full detector simulation with GEANT4 using the Key4HEP framework [3]. In particular, we use the CLIC detector model [4], the Marlin reconstruction software [5], and the Pandora package [6,7,8] as a baseline particle flow implementation. Although the implementation is not specific to the detector model, we chose the CLIC model because, to our knowledge, it is one of the most complete realistic detector models that are publicly available.

Description of files and download

The datasets contain all generator particles (training targets), reconstructed tracks, calorimeter hits and clusters (training inputs), and reconstructed particles from the baseline Pandora algorithm (for comparison), all saved in the EDM4HEP format. All associations between these objects are also saved in the standard format. Overall, the archives listed below total approximately 4.3 TB.
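To check which of these object types (generator particles, tracks, hits, clusters, Pandora particles, associations) a given file actually contains, one option is to group the branch names of its event tree by collection. The sketch below is a minimal, unofficial helper that assumes uproot-style full branch paths of the form "Collection/Collection.field" (one top-level branch per collection); the function name `group_by_collection` and the branch names used to test it are hypothetical, and the real collection names should be read from the files.

```python
def group_by_collection(branch_keys):
    """Group branch paths by their top-level branch (assumed: one per collection).

    Assumes full paths such as "Coll/Coll.field", as produced by uproot's
    TTree.keys(); the part before the first "/" is taken as the collection name.
    """
    groups = {}
    for key in branch_keys:
        collection, _, rest = key.partition("/")
        fields = groups.setdefault(collection, [])
        if rest:  # a sub-branch: keep its full path so it can be read later
            fields.append(key)
    # Sort collections and their fields for a stable, readable listing.
    return {name: sorted(fields) for name, fields in sorted(groups.items())}
```

With uproot installed, a call such as `group_by_collection(uproot.open(path)["events"].keys())` would list the collections of one file, again assuming (not confirmed by this page) that the event tree is named `events`.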


This dataset is being used in studies of the Machine-Learned Particle-Flow (MLPF) algorithm [9,10,11], and new results are being prepared for publication. Any work using this dataset should cite the corresponding paper once it is published.

The dataset consists of simulated collision events as well as particle gun samples and is packaged in 43 tar archives, named <process_name>_<number>.tar for the physics samples and <process_name>.tar for the particle gun samples, where <process_name> is the name of the physics process (or of the particle, for gun samples) and <number> is a running integer. Each tar archive contains ROOT [12] files in which the events are stored in the EDM4HEP format. To process the data for ML tasks, we recommend the Python package uproot [13], which conveniently loads ROOT files into Python and NumPy objects.
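The naming convention and packaging described above can be handled with a few small helpers. This is a minimal, unofficial sketch: `parse_archive_name`, `list_root_files`, `extract_root_file` and `load_collection` are hypothetical names, the tree name `"events"` is an assumption based on the usual podio/EDM4HEP layout, and the collection passed to `load_collection` should be a name found in the file itself. Only the standard `tarfile` module and uproot's `open`, `TTree.keys` and `TBranch.array` are used.

```python
import re
import tarfile
from pathlib import Path

# Naming convention from the text: <process_name>_<number>.tar for physics
# samples and <process_name>.tar for particle-gun samples.
_NUMBERED = re.compile(r"^(?P<process>.+)_(?P<number>\d+)\.tar$")


def parse_archive_name(filename):
    """Return (process_name, number); number is None for particle-gun archives."""
    name = Path(filename).name
    if not name.endswith(".tar"):
        raise ValueError(f"not a .tar archive: {name}")
    match = _NUMBERED.match(name)
    if match:
        return match.group("process"), int(match.group("number"))
    return name[: -len(".tar")], None


def list_root_files(tar_path):
    """List the ROOT files inside a tar archive without extracting it."""
    with tarfile.open(tar_path, "r") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".root")
        )


def extract_root_file(tar_path, member_name, dest_dir):
    """Extract a single member and return its path on disk."""
    # Use the safe "data" extraction filter where this Python version provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tar_path, "r") as archive:
        archive.extract(member_name, path=dest_dir, **kwargs)
    return Path(dest_dir) / member_name


def load_collection(root_path, collection, tree_name="events"):
    """Read the leaf branches of one collection with uproot.

    Assumptions: the EDM4HEP event tree is named "events", and the branches
    of a collection sit under a top-level branch named after it.
    """
    import uproot  # third-party (pip install uproot), imported lazily

    with uproot.open(root_path) as root_file:
        tree = root_file[tree_name]
        leaves = [
            key
            for key in tree.keys()  # full paths such as "Coll/Coll.field"
            if (key == collection or key.startswith(collection + "/"))
            and not tree[key].branches
        ]
        # library="np": jagged branches become object arrays, one array per event.
        return {key: tree[key].array(library="np") for key in leaves}
```

For example, `parse_archive_name("p8_ee_tt_ecm380_PU10_1.tar")` returns `("p8_ee_tt_ecm380_PU10", 1)` and `parse_archive_name("gamma.tar")` returns `("gamma", None)`. Listing the members of an archive reads every tar header, so it can take some time for the larger files.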

The datasets were generated as part of the project "Flexible and scalable data reconstruction and analysis using machine learning", grant PSG864 of the Estonian Research Council, using the KBFI computing cluster.

Dataset download:

All physics samples (files starting with p8_) are generated with Pythia8 at a center-of-mass energy of 380 GeV.

File                           Size     Contents
---------------------------------------------------------------------------------
e+.tar                         71 GB    single positron (e+) particle gun
e-.tar                         71 GB    single electron (e-) particle gun
gamma.tar                      65 GB    single photon particle gun
kaon0L.tar                     43 GB    single K-long (K0L) particle gun
mu+.tar                        9.2 GB   single positive muon (mu+) particle gun
mu-.tar                        9.2 GB   single negative muon (mu-) particle gun
neutron.tar                    72 GB    single neutron particle gun
pi0.tar                        72 GB    single neutral pion (pi0) particle gun
pi+.tar                        44 GB    single positive pion (pi+) particle gun
pi-.tar                        45 GB    single negative pion (pi-) particle gun

p8_ee_qq_ecm380_1.tar          104 GB   e+e- → quark pair (qq)
p8_ee_qq_ecm380_2.tar          104 GB   e+e- → quark pair (qq)
p8_ee_qq_ecm380_3.tar          104 GB   e+e- → quark pair (qq)
p8_ee_qq_ecm380_4.tar          104 GB   e+e- → quark pair (qq)
p8_ee_qq_ecm380_5.tar          104 GB   e+e- → quark pair (qq)
p8_ee_qq_ecm380_6.tar          0.5 GB   e+e- → quark pair (qq)
p8_ee_qq_ecm380_7.tar          104 GB   e+e- → quark pair (qq)
p8_ee_qq_ecm380_8.tar          104 GB   e+e- → quark pair (qq)
p8_ee_qq_ecm380_9.tar          104 GB   e+e- → quark pair (qq)
p8_ee_qq_ecm380_10.tar         104 GB   e+e- → quark pair (qq)
p8_ee_qq_ecm380_11.tar         104 GB   e+e- → quark pair (qq)

p8_ee_tt_ecm380_1.tar          147 GB   e+e- → top quark pair (tt)
p8_ee_tt_ecm380_2.tar          147 GB   e+e- → top quark pair (tt)
p8_ee_tt_ecm380_3.tar          147 GB   e+e- → top quark pair (tt)
p8_ee_tt_ecm380_4.tar          147 GB   e+e- → top quark pair (tt)
p8_ee_tt_ecm380_5.tar          147 GB   e+e- → top quark pair (tt)
p8_ee_tt_ecm380_6.tar          0.8 GB   e+e- → top quark pair (tt)

p8_ee_tt_ecm380_PU10_1.tar     257 GB   e+e- → top quark pair (tt), pile-up of 10
p8_ee_tt_ecm380_PU10_2.tar     257 GB   e+e- → top quark pair (tt), pile-up of 10
p8_ee_tt_ecm380_PU10_3.tar     132 GB   e+e- → top quark pair (tt), pile-up of 10
p8_ee_tt_ecm380_PU10_4.tar     257 GB   e+e- → top quark pair (tt), pile-up of 10

p8_ee_WW_fullhad_ecm380_1.tar  142 GB   e+e- → WW, fully hadronic decays
p8_ee_WW_fullhad_ecm380_2.tar  142 GB   e+e- → WW, fully hadronic decays
p8_ee_WW_fullhad_ecm380_3.tar  142 GB   e+e- → WW, fully hadronic decays
p8_ee_WW_fullhad_ecm380_4.tar  142 GB   e+e- → WW, fully hadronic decays
p8_ee_WW_fullhad_ecm380_5.tar  142 GB   e+e- → WW, fully hadronic decays
p8_ee_WW_fullhad_ecm380_6.tar  0.7 GB   e+e- → WW, fully hadronic decays

p8_ee_ZH_Htautau_ecm380_1.tar  78 GB    e+e- → ZH, H → ττ
p8_ee_ZH_Htautau_ecm380_2.tar  78 GB    e+e- → ZH, H → ττ
p8_ee_ZH_Htautau_ecm380_3.tar  78 GB    e+e- → ZH, H → ττ
p8_ee_ZH_Htautau_ecm380_4.tar  78 GB    e+e- → ZH, H → ττ
p8_ee_ZH_Htautau_ecm380_5.tar  78 GB    e+e- → ZH, H → ττ
p8_ee_ZH_Htautau_ecm380_6.tar  0.4 GB   e+e- → ZH, H → ττ

References

[1]  Gaede, F., Ganis, G., Hegner, B., Helsens, C., Madlener, T., Sailer, A., Stewart, G. A., Völkl, V., & Wang, J. (2021). EDM4hep and podio - The event data model of the Key4hep project and its implementation. EPJ Web of Conferences, 251, 03026. https://doi.org/10.1051/epjconf/202125103026

[2]  Bierlich, C., Chakraborty, S., Desai, N., Gellersen, L., Helenius, I., Ilten, P., Lönnblad, L., Mrenna, S., Prestel, S., Preuss, C. T., Sjöstrand, T., Skands, P., Utheim, M., & Verheyen, R. (2022). A comprehensive guide to the physics and usage of PYTHIA 8.3. SciPost Physics Codebases, 8. https://doi.org/10.21468/SciPostPhysCodeb.8

[3]  Ganis, G., Helsens, C., & Völkl, V. (2022). Key4hep, a framework for future HEP experiments and its use in FCC. The European Physical Journal Plus, 137(1), 149. https://doi.org/10.1140/epjp/s13360-021-02213-1

[4]  CLIC Collaboration. (2017). CLICdet: The post-CDR CLIC detector model. CLICdp note.

[5]  Gaede, F. (2006). Marlin and LCCD—Software tools for the ILC. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 559(1), 177–180. https://doi.org/10.1016/j.nima.2005.11.138

[6]  Marshall, J. S., & Thomson, M. A. (2012). The Pandora Software Development Kit for Particle Flow Calorimetry. Journal of Physics: Conference Series, 396(2), 022034. https://doi.org/10.1088/1742-6596/396/2/022034

[7]  Marshall, J. S., Münnich, A., & Thomson, M. A. (2013). Performance of particle flow calorimetry at CLIC. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 700, 153–162. https://doi.org/10.1016/j.nima.2012.10.03

[8]  Marshall, J. S., & Thomson, M. A. (2015). The Pandora software development kit for pattern recognition. The European Physical Journal C, 75(9), 439. https://doi.org/10.1140/epjc/s10052-015-3659-3

[9]  Pata, J., Duarte, J., Vlimant, J.-R., Pierini, M., & Spiropulu, M. (2021). MLPF: efficient machine-learned particle-flow reconstruction using graph neural networks. The European Physical Journal C, 81(5), 381. https://doi.org/10.1140/epjc/s10052-021-09158-w

[10] Pata, J., Duarte, J., Mokhtar, F., Wulff, E., Yoo, J., Vlimant, J.-R., Pierini, M., & Girone, M. (2023). Machine Learning for Particle Flow Reconstruction at CMS. Journal of Physics: Conference Series, 2438(1), 012100. https://doi.org/10.1088/1742-6596/2438/1/012100

[11] Wulff, E., Girone, M., & Pata, J. (2023). Hyperparameter optimization of data-driven AI models on HPC systems. Journal of Physics: Conference Series, 2438(1), 012092. https://doi.org/10.1088/1742-6596/2438/1/012092

[12] Brun, R., & Rademakers, F. (1997). ROOT — An object oriented data analysis framework. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 389(1–2), 81–86. https://doi.org/10.1016/S0168-9002(97)00048-X

[13] Pivarski, J., Das, P., Burr, C., Smirnov, D., Feickert, M., Gal, T., Kreczko, L., Smith, N., Biederbeck, N., Shadura, O., Proffitt, M., Krikler, B., Dembinski, H., Schreiner, H., et al. (2021). scikit-hep/uproot3: 3.14.4 (3.14.4). Zenodo. https://doi.org/10.5281/zenodo.4537826
