CMS DAS 2025 Generators Short Exercise

1 - Matrix Element Generator

Overview

Teaching: 20 min
Exercises: 40 min
Questions
  • What are Monte Carlo Generators?

  • Why are we using simulated samples in CMS?

  • How are simulated samples created in CMS?

Objectives
  • Use the MadGraph generator in standalone mode and get familiar with the basic syntax

  • Analyze the produced LHE files

Introduction and first steps

Although quite old, the linked material is still a great read for a general overview of Monte Carlo event generators. Monte Carlo event generators are essential components of almost all experimental analyses and are also widely used by theorists and experimentalists to make predictions and prepare for future experiments. It is one of the topics where CMS experimentalists and theorists are most closely connected: theorists provide predictions, and experimentalists test them against data. Although Monte Carlo event generators are extremely important tools in HEP, they are often used as black boxes whose output we treat more or less as “data”. Our aim is to get a minimal understanding of how these tools work and to analyze their output using generator-level information.

Samples used by the CMS experiment go through several simulation steps:

  1. Monte Carlo event generator
  2. Detector simulation
  3. Pileup mixing
  4. Trigger emulation
  5. Object reconstruction

We focus on “1. Monte Carlo event generator” in this tutorial. The Monte Carlo event generator can be further divided into several pieces, since the individual steps factorize and can be handled through separate calculations:

  1. Parton distribution function (PDF)
  2. Hard scattering (matrix element calculation)
  3. Parton shower & hadronization

First of all, the LHC is a proton-proton collider, hence we need information on how partons (quarks and gluons) are distributed in the proton (PDF). The hard scattering is the part that can be treated perturbatively: the interaction of the incoming partons with the largest momentum transfer (usually the physics process we are interested in). Parton shower & hadronization then describe how the particles involved in the hard scattering evolve, working downwards to lower momentum scales, eventually reaching the point where perturbative calculations break down.
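The factorization of PDFs and hard scattering described here is usually summarized in one master formula. The version below uses standard textbook notation; the symbols are not defined elsewhere in this exercise and are introduced only for illustration.

```latex
\sigma_{pp \to X}
  = \sum_{a,b} \int_0^1 dx_1 \int_0^1 dx_2 \;
    f_a(x_1, \mu_F)\, f_b(x_2, \mu_F)\;
    \hat{\sigma}_{ab \to X}\!\left(x_1 x_2 s;\ \mu_R, \mu_F\right)
```

Here f_a and f_b are the PDFs of partons a and b carrying momentum fractions x_1 and x_2 of the two protons, s is the squared proton-proton center-of-mass energy, and the partonic cross section (the hatted sigma) is the matrix-element part computed by a generator such as MadGraph. The factorization and renormalization scales mu_F and mu_R are the scales whose variation is commonly used to estimate the theory uncertainty. Parton shower and hadronization are applied afterwards, event by event, and are not part of this formula.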

Using Standalone Madgraph

In the first part of the exercise, we will use the matrix element generator MadGraph5_aMC@NLO, or MadGraph for short (link). MadGraph can perform calculations for many different physics processes (both SM and BSM) at leading and next-to-leading order (LO & NLO) in QCD. Because of its easy user interface and its flexibility with UFO models, you can test a wide variety of physics models. We will first see how MadGraph runs interactively in standalone mode, using a simple W+ (wplus) process as an example.

We will first use the interactive prompt of MadGraph to generate proton-proton collision events that produce W bosons. First, log in to a new session on the LPC cluster (ssh -Y <USERNAME>@cmslpc-el8.fnal.gov). Make sure you have completed the setup steps! Then, start the interactive prompt of MadGraph:

cd ~/nobackup/cmsdas_2025_gen/MG5_aMC_v3_5_2/
./bin/mg5_aMC

Madgraph is configured and steered through text-based cards. The process definitions can be stored in a card called proc_card.dat. You can look at an example using the following command:

!cat wplustest_4f_LO_proc_card.dat

Note: the exclamation mark is used to execute shell commands within Madgraph, e.g. !cat in the above example.

import model sm-ckm
#switch to diagonal ckm matrix if relevant for speed
#import model sm-lepton_masses

define ell+ = e+ mu+ ta+
define ell- = e- mu- ta-

generate p p > w+, w+ > ell+ vl @0

output wplustest_4f_LO -nojpeg

Copy/paste the commands line-by-line and pay attention to the output.

The two most important lines of this block are the model import (import model sm-ckm) and the instruction on the process to generate (generate p p > w+, w+ > ell+ vl). Within the MG directory you can find a directory called models, which contains different pre-installed models. The most obvious one, which we are using in the example, is sm: the Standard Model at leading order in perturbative QCD. Model parameters can be configured through “restriction cards”; in this example, restrict_ckm.dat is loaded through the syntax sm-ckm. This specific restriction card uses a non-diagonal CKM matrix (otherwise, a diagonal CKM matrix is the default, for simplicity and faster running). One great feature of MadGraph is its flexibility in terms of the physics models it can use. To generate a sample using a new physics model, one can use the UFO interface. A database of models can be found in the FeynRules model database.

In practice, the most relevant part is that MG figures out all the Feynman diagrams contributing to a process. If you are trying to set up a new MC sample, looking at these Feynman diagrams is a great way to check that you actually get the physics you want. To check them out, you can open the individual plots in e.g. wplustest_4f_LO/SubProcesses/P0_qq_wp_wp_lvl/matrix*.ps with gv, display or evince. You can also use the ps2pdf program to convert the PostScript files into PDFs.

Alternatively, remove -nojpeg from the output line and look at the diagrams in jpeg format using display.

Now that MadGraph has figured out the Feynman diagrams, you can start the actual computation within the MG5 prompt with

launch

Hint: if you closed the interactive MG session for some reason you can still launch without rerunning the previous commands with

./bin/mg5_aMC
launch wplustest_4f_LO

Madgraph will ask you a few more questions. Press tab to turn off the timer (otherwise, MadGraph will move on by itself after 60 seconds).

/===========================================================================\
| 1. Choose the shower/hadronization program     shower = Not Avail.        |
| 2. Choose the detector simulation program    detector = Not Avail.        |
| 3. Choose an analysis package (plot/convert) analysis = Not Avail.        |
| 4. Decay onshell particles                    madspin = OFF               |
| 5. Add weights to events for new hypp.       reweight = Not Avail.        |
\===========================================================================/

The first one you can just skip by pressing <RETURN>. Since we did not install any shower, detector simulation, or analysis package, these options are shown as Not Avail.

Do you want to edit a card (press enter to bypass editing)?
/------------------------------------------------------------\
|  1. param : param_card.dat                                 |
|  2. run   : run_card.dat                                   |
\------------------------------------------------------------/
 you can also
   - enter the path to a valid card or banner.
   - use the 'set' command to modify a parameter directly.
     The set option works only for param_card and run_card.
     Type 'help set' for more information on this command.
   - call an external program (ASperGE/MadWidth/...).
     Type 'help' for the list of available command
 [0, done, 1, param, 2, run, enter path][90s to answer] 

Let’s take a look at the param card and see how the values are set, press 1 and ENTER (<RETURN>) to investigate the parameter settings.

###################################
## INFORMATION FOR MASS
###################################
Block mass
    5 4.700000e+00 # MB 
    6 1.730000e+02 # MT 
   15 1.777000e+00 # MTA 
   23 9.118800e+01 # MZ 
   25 1.250000e+02 # MH

...

###################################
## INFORMATION FOR DECAY
###################################
DECAY   6 1.491500e+00 # WT 
DECAY  23 2.441404e+00 # WZ 
DECAY  24 2.047600e+00 # WW 
DECAY  25 6.382339e-03 # WH 

Let’s take a look at the run card and see how the values are set, press 2 and ENTER (<RETURN>) to investigate the run settings.

#*********************************************************************
# Number of events and rnd seed                                      *
# Warning: Do not generate more than 1M events in a single run       *
#*********************************************************************
  10000 = nevents ! Number of unweighted events requested
  0   = iseed   ! rnd seed (0=assigned automatically=default))

...

#*********************************************************************
# Collider type and energy                                           *
# lpp: 0=No PDF, 1=proton, -1=antiproton,                            *
#                2=elastic photon of proton/ion beam                 *
#             +/-3=PDF of electron/positron beam                     *
#             +/-4=PDF of muon/antimuon beam                         *
#*********************************************************************
     1        = lpp1    ! beam 1 type
     1        = lpp2    ! beam 2 type
     6500.0     = ebeam1  ! beam 1 total energy in GeV
     6500.0     = ebeam2  ! beam 2 total energy in GeV

...

#*********************************************************************
# Standard Cuts                                                      *
#*********************************************************************
# Minimum and maximum pt's (for max, -1 means no cut)                *
#*********************************************************************
 10.0  = ptl       ! minimum pt for the charged leptons
 -1.0  = ptlmax    ! maximum pt for the charged leptons
 {} = pt_min_pdg ! pt cut for other particles (use pdg code). Applied on particle and anti-particle
 {}     = pt_max_pdg ! pt cut for other particles (syntax e.g. {6: 100, 25: 50})

...

#*********************************************************************
# Minimum and maximum invariant mass for pairs                       *
#*********************************************************************
 0.0   = mmll    ! min invariant mass of l+l- (same flavour) lepton pair
 -1.0  = mmllmax ! max invariant mass of l+l- (same flavour) lepton pair
 {} = mxx_min_pdg ! min invariant mass of a pair of particles X/X~ (e.g. {6:250})
 {'default': False} = mxx_only_part_antipart ! if True the invariant mass is applied only
                       ! to pairs of particle/antiparticle and not to pairs of the same pdg codes.

...

#*********************************************************************
# maximal pdg code for quark to be considered as a light jet         *
# (otherwise b cuts are applied)                                     *
#*********************************************************************
 4 = maxjetflavor    ! Maximum jet pdg code

Try editing the beam energies (ebeam1 and ebeam2) from 6500 to 6800, since the LHC now runs at a center-of-mass energy of 13.6 TeV (6.8 TeV per beam). When you are done editing, save the changes and exit the text editor.
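The relation between the two beam energies in the run card and the collision energy can be checked with a few lines of Python. This is a minimal sketch that assumes head-on beams and neglects the proton mass; the function name sqrt_s is made up for this illustration.

```python
import math


def sqrt_s(ebeam1: float, ebeam2: float) -> float:
    """Center-of-mass energy in GeV for two head-on beams (proton mass neglected)."""
    return 2.0 * math.sqrt(ebeam1 * ebeam2)


# Default run card values (ebeam1 = ebeam2 = 6500 GeV)
default_energy = sqrt_s(6500.0, 6500.0)
# Values after the edit (ebeam1 = ebeam2 = 6800 GeV)
edited_energy = sqrt_s(6800.0, 6800.0)

print(f"default run card: {default_energy:.0f} GeV")
print(f"edited run card:  {edited_energy:.0f} GeV")
```

For equal beam energies this is simply the sum of the two, so 6800 GeV per beam corresponds to 13600 GeV, i.e. 13.6 TeV.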

MadGraph also allows you to change settings interactively by typing set commands at the prompt, for example:

set run_card nevents 5000

Take a look at the run card again and check that the number of events to generate (nevents) has changed to 5000. Then change it back to 10000 using the same command and check again.

As shown above, there are several phase space cuts set by default (e.g. 10.0 = ptl). There is a handy command that removes all phase space cuts at once (instead of doing set run_card ptl 0, set run_card ptj 0, … one by one by hand).

set no_parton_cut

Take a look at the card again and check that the lepton pt cut (ptl) has changed to 0. Keep in mind that any cuts you set before set no_parton_cut will be removed by this command, so run set no_parton_cut first and then set the cuts you want.
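Instead of scrolling through the card by eye, the settings can also be read back with a small script. The sketch below is not part of MadGraph; it only relies on the "value = name ! comment" layout visible in the run card excerpts above, and the function name parse_run_card is hypothetical. The sample text is copied from those excerpts.

```python
def parse_run_card(text: str) -> dict:
    """Return {parameter name: value string} for 'value = name ! comment' lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop the trailing comment
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip banners, blank lines and comment continuations
        value, name = line.rsplit("=", 1)
        settings[name.strip()] = value.strip()
    return settings


SAMPLE = """\
#*********************************************************************
# Number of events and rnd seed                                      *
#*********************************************************************
  10000 = nevents ! Number of unweighted events requested
  0   = iseed   ! rnd seed (0=assigned automatically=default))
     6500.0     = ebeam1  ! beam 1 total energy in GeV
     6500.0     = ebeam2  ! beam 2 total energy in GeV
 10.0  = ptl       ! minimum pt for the charged leptons
 {'default': False} = mxx_only_part_antipart ! if True the invariant mass is applied only
                       ! to pairs of particle/antiparticle and not to pairs of the same pdg codes.
"""

card = parse_run_card(SAMPLE)
print(card["nevents"], card["ebeam1"], card["ptl"])
```

To use it on a real card, read the file into a string first, for example the run_card.dat inside the Cards directory of the process folder (that location is an assumption about the local setup), and pass the text to parse_run_card.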

Once you are done, please provide the path to the pre-made run_card: wplustest_4f_LO_run_card.dat

What is the cross section determined by Madgraph?

Obtaining the cross section

Define a process (e.g. from the process card above) and launch.

Solution

launch wplustest_4f_LO
=== Results Summary for run: run_01 tag: tag_1 ===

     Cross-section :   2.715e+04 +- 39.45 pb
     Nb of events :  10000

INFO: No version of lhapdf. Can not run systematics computation
store_events
INFO: Storing parton level results
INFO: End Parton
reweight -from_cards
decay_events -from_cards
INFO: storing files of previous run
INFO: Done

The cross section calculated by MG is 2.715e+04 +- 39.45 pb.

The main output that MG produces is the so-called “LHE file”. The LHE file (Les Houches Event file) is a standard file format that stores process and event information from parton-level event generators. The documentation can be found here.

In general, the LHE file contains a header with description of the settings of the generator (e.g. process and run information), and multiple event blocks (one for each event). The LHE file is plain text, so it’s usually a good idea to use some compression algorithm to save space - MG zips the output by default.

Looking at the LHE output

Find the LHE file produced by MG and find the first event block.

Example solution

Exit MG, then do

find . -path './wplustest_4f_LO/*.lhe.gz'
gzip -d ./wplustest_4f_LO/Events/run_01/unweighted_events.lhe.gz
less ./wplustest_4f_LO/Events/run_01/unweighted_events.lhe

The file starts with a header, followed by one block per event:

<LesHouchesEvents version="3.0">
<header>
...
</header>
...
<event>
 5      0 +2.7145900e+04 7.93095700e+01 7.54677100e-03 1.33102200e-01
        2 -1    0    0  501    0 +0.0000000000e+00 +0.0000000000e+00 +4.5829549845e+01 4.5829549845e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
       -1 -1    0    0    0  501 -0.0000000000e+00 -0.0000000000e+00 -3.4311969734e+01 3.4311969734e+01 0.0000000000e+00 0.0000e+00 1.0000e+00
       24  2    1    2    0    0 +0.0000000000e+00 +0.0000000000e+00 +1.1517580112e+01 8.0141519579e+01 7.9309573878e+01 0.0000e+00 0.0000e+00
      -13  1    3    3    0    0 -1.6845086581e+01 +2.2368564620e+01 -2.2614075432e+01 3.5993138689e+01 0.0000000000e+00 0.0000e+00 1.0000e+00
       14  1    3    3    0    0 +1.6845086581e+01 -2.2368564620e+01 +3.4131655543e+01 4.4148380890e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
<mgrwt>
<rscale>  0 0.79309574E+02</rscale>
<asrwt>0</asrwt>
<pdfrwt beam="1">  1        2 0.70507000E-02 0.79309574E+02</pdfrwt>
<pdfrwt beam="2">  1       -1 0.52787646E-02 0.79309574E+02</pdfrwt>
<totfact> 0.15019241E+05</totfact>
</mgrwt>
<rwgt>
<wgt id='1'> +2.3707085e+04 </wgt>
...
</rwgt>
</event>

What does each column mean?

Solution

ID, status, mother1, mother2, color, anticolor, px, py, pz, E, mass, lifetime, and spin

-11  1    3    3    0    0 -2.3393803385e+01 -7.4187481776e+00 -1.5274153214e+02 1.5470062541e+02 0.0000000000e+00 0.0000e+00 1.0000e+00

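This line tells you that a positron (ID) is an outgoing particle (status) whose mother is the third particle of its event (mother1 and mother2 are both 3; in the event this line was taken from, that is a Z with ID=23, while in the W+ event shown above the third particle is the W+ with ID=24), with no color (color and anticolor), …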
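The column assignment can be made explicit by parsing such a line into named fields. This is a minimal sketch using only the Python standard library; the class and function names (LHEParticle, parse_particle) are made up for this illustration, and the field names follow the column list given above.

```python
import math
from typing import NamedTuple


class LHEParticle(NamedTuple):
    pdg_id: int
    status: int
    mother1: int
    mother2: int
    color: int
    anticolor: int
    px: float
    py: float
    pz: float
    energy: float
    mass: float
    lifetime: float
    spin: float


def parse_particle(line: str) -> LHEParticle:
    """Split one particle line of an LHE event block into its 13 columns."""
    tokens = line.split()
    if len(tokens) != 13:
        raise ValueError(f"expected 13 columns, got {len(tokens)}")
    integers = [int(t) for t in tokens[:6]]    # ID, status, mothers, colors
    floats = [float(t) for t in tokens[6:]]    # momentum, energy, mass, lifetime, spin
    return LHEParticle(*integers, *floats)


line = ("-11  1    3    3    0    0 -2.3393803385e+01 -7.4187481776e+00 "
        "-1.5274153214e+02 1.5470062541e+02 0.0000000000e+00 0.0000e+00 1.0000e+00")
particle = parse_particle(line)
pt = math.hypot(particle.px, particle.py)  # transverse momentum in GeV

print(particle.pdg_id, particle.status, particle.mother1, particle.mother2)
print(f"pT = {pt:.2f} GeV")
```

The mother indices are 1-based positions within the same event block, so mother1 = 3 points at the third particle line of that event.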

MadGraph syntax

If you want to add another process, e.g. production of W- in the above example, you can add another process with add process p p > w-, w- > ell- vl~

A detailed introduction to the syntax is given in this documentation. Some very basic things to keep in mind:

generate p p > e+ e- will generate any diagram that is compatible with the used model that produces an electron / positron pair

generate p p > e+ e- / Z will exclude diagrams that contain a Z boson as internal particle

generate p p > e+ e- $ Z will exclude the Z boson from appearing in the s-channel (careful about gauge invariance)

generate p p > Z > e+ e- will always include a Z boson in the s-channel (careful about gauge invariance)

generate p p > w+ QED=3 will allow diagrams with up to three QED couplings and thereby include the higher-order QED contributions; otherwise the QED order is always set to its minimal value

Bonus: Obtaining the cross section of W boson production

The exercise above only contained W+ bosons (only positive charge). Add production of the negatively charged W bosons and calculate the cross section. Before running MG, think about what result you would expect, i.e. by how much do you think the cross section should increase. Compare the results of W+ and W+/-. What do you conclude?

Solution

import model sm-ckm

define ell+ = e+ mu+ ta+
define ell- = e- mu- ta-
generate p p > w+, w+ > ell+ vl @0
add process p p > w-, w- > ell- vl~ @1

output wtest_4f_LO -nojpeg
launch 

When prompted about the run_card, use wplustest_4f_LO_run_card.dat again.

  === Results Summary for run: run_01 tag: tag_1 ===

     Cross-section :   4.667e+04 +- 63.91 pb
     Nb of events :  10000

The cross section calculated by MadGraph is 4.667e+04 +- 63.91 pb. While one would naively expect the cross section to double by including W- bosons, we only get a cross section that is ~70% larger than the 2.715e+04 pb obtained for W+ alone. The simplified explanation is that the initial state protons contain more up valence quarks than down valence quarks, so W+ production (mainly from u dbar) is favored over W- production (mainly from d ubar).
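As a quick arithmetic check, the ratio of the two cross sections can be computed directly. The sketch below uses the values printed in the two result summaries above (2.715e+04 pb for W+ only and 4.667e+04 pb for the combined process) and treats the two Monte Carlo integration uncertainties as uncorrelated, which is an assumption.

```python
import math

# Values copied from the two "Results Summary" blocks, in pb
xsec_wplus, err_wplus = 2.715e4, 39.45   # W+ only
xsec_wboth, err_wboth = 4.667e4, 63.91   # W+ and W- together

ratio = xsec_wboth / xsec_wplus
# Relative uncertainties added in quadrature (assumes independent runs)
ratio_err = ratio * math.hypot(err_wboth / xsec_wboth, err_wplus / xsec_wplus)

# Cross section attributed to W- alone, and its size relative to W+
xsec_wminus = xsec_wboth - xsec_wplus
wminus_over_wplus = xsec_wminus / xsec_wplus

print(f"(W+ and W-) / W+ = {ratio:.3f} +- {ratio_err:.3f}")
print(f"increase from adding W- = {100 * (ratio - 1):.0f}%")
print(f"W- / W+ = {wminus_over_wplus:.3f}")
```

A W-/W+ ratio clearly below one is what the valence quark argument predicts for proton-proton collisions.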

Using the gridpack workflow

As mentioned previously, running MG5 interactively is useful for development and quick tests, but ultimately not practical for large-scale production. To avoid having to use the interactive mode, one can make use of the card structure of MG. A fully automated workflow for running MG and producing gridpacks is maintained in the genproductions repository. A gridpack is simply an archive file that contains all the executable MG5 code needed to produce LHE events for a given process, which can then be executed easily on many different grid workers (hence the name). It has the advantage that once it is created, it is a one-button program to generate events, no thinking required. In this part of the exercise, we will use the same input cards as before to create a gridpack, run it, and compare the results to before.

Gridpacks are generated using the gridpack_generation script, which we will run in local mode, i.e. on the machine we are currently logged in to. Note that scripts are provided to run the gridpacks on other computing infrastructures such as the CERN batch system and CMSConnect, which is useful for more complicated processes.

To create a gridpack, we simply call gridpack_generation.sh and pass the process name and card location to it.

Setting and unsetting the CMSSW environment

You should not have activated a CMSSW environment in this exercise so far. However, if you did so before, you need to unset it in order to not interfere with the genproductions script. You can run the following command to unset the CMSSW environment, or log in to a clean new session.

eval `scram unsetenv -sh`

We will be generating a gridpack with cards similar to the commands we’ve used in the standalone example above. The cards are located in the MG section of the genproductions directory

cd ~/nobackup/cmsdas_2025_gen/genproductions_mg352/bin/MadGraph5_aMCatNLO
time ./gridpack_generation.sh wplustest_4f_LO cards/examples/wplustest_4f_LO local

Naming conventions

There are certain naming conventions for the input cards. For a given process name $NAME, the input cards must be named as $NAME_run_card.dat, $NAME_proc_card.dat, etc…
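Since a misnamed card is an easy mistake to make, it can help to check the card directory before starting the gridpack generation. The helper below is a hypothetical sketch, not part of genproductions; it only checks the two card names mentioned in the text (run_card and proc_card), and the directory used in the example call is the one passed to gridpack_generation.sh above.

```python
from pathlib import Path

# Only the two cards named in the text are checked here
REQUIRED_SUFFIXES = ("run_card.dat", "proc_card.dat")


def missing_cards(card_dir, name):
    """Return the expected card file names that are absent from card_dir."""
    card_dir = Path(card_dir)
    return [
        f"{name}_{suffix}"
        for suffix in REQUIRED_SUFFIXES
        if not (card_dir / f"{name}_{suffix}").is_file()
    ]


# Run from the MadGraph5_aMCatNLO directory of genproductions
missing = missing_cards("cards/examples/wplustest_4f_LO", "wplustest_4f_LO")
print("missing cards:", missing if missing else "none")
```

An empty list means that both cards follow the $NAME_run_card.dat and $NAME_proc_card.dat pattern for the given process name.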

Once the gridpack has been created, unpack it in a fresh working directory and run it:

mkdir work
cd work
tar xf ../wplustest_4f_LO_el8_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz

NEVENTS=10000
RANDOMSEED=12345
NCPU=1
./runcmsgrid.sh $NEVENTS $RANDOMSEED $NCPU

This will produce an output LHE file called cmsgrid_final.lhe.

Comparing the LHE output

There are multiple ways of analyzing an LHE file, each of which has its own advantages and disadvantages. For the purpose of this exercise, we will use a pre-made PyROOT script.

cd ~/nobackup/cmsdas_2025_gen/
cp /eos/uscms/store/user/cmsdas/2025/short_exercises/generators/LHEReader.py .

Convert LHE output to root format

Make sure the Python virtual environment is deactivated

If you are still using the virtual environment, you need to unset it in order to not interfere with the CMSSW environment.

cd CMSSW_12_4_8/src; cmsenv; cd -
python3.9 LHEReader.py --input MG5_aMC_v3_5_2/wplustest_4f_LO/Events/run_01/unweighted_events.lhe --output standalone.root
python3.9 LHEReader.py --input genproductions_mg352/bin/MadGraph5_aMCatNLO/work/cmsgrid_final.lhe --output cmsgrid.root

Feel free to experiment here and plot various quantities. What are the shapes of the lepton pT distributions? What is the shape of the pT distribution of the W system? Are these shapes physical?
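If you want to cross-check individual numbers without ROOT, the event blocks can also be read with plain Python. The following is a minimal sketch, independent of LHEReader.py; the function name iter_events is made up here, and the sample text is the W+ event shown earlier in this exercise. It computes the lepton pT, the pT of the lepton-neutrino system, and its invariant mass.

```python
import math

SAMPLE_LHE = """\
<event>
 5      0 +2.7145900e+04 7.93095700e+01 7.54677100e-03 1.33102200e-01
        2 -1    0    0  501    0 +0.0000000000e+00 +0.0000000000e+00 +4.5829549845e+01 4.5829549845e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
       -1 -1    0    0    0  501 -0.0000000000e+00 -0.0000000000e+00 -3.4311969734e+01 3.4311969734e+01 0.0000000000e+00 0.0000e+00 1.0000e+00
       24  2    1    2    0    0 +0.0000000000e+00 +0.0000000000e+00 +1.1517580112e+01 8.0141519579e+01 7.9309573878e+01 0.0000e+00 0.0000e+00
      -13  1    3    3    0    0 -1.6845086581e+01 +2.2368564620e+01 -2.2614075432e+01 3.5993138689e+01 0.0000000000e+00 0.0000e+00 1.0000e+00
       14  1    3    3    0    0 +1.6845086581e+01 -2.2368564620e+01 +3.4131655543e+01 4.4148380890e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
<mgrwt>
</mgrwt>
</event>
"""


def iter_events(lines):
    """Yield one list of (pdg_id, status, px, py, pz, E) tuples per <event> block."""
    remaining = -1  # -1: outside an event, None: the next line is the event header
    particles = []
    for line in lines:
        tokens = line.split()
        if line.lstrip().startswith("<event"):
            remaining, particles = None, []
        elif remaining is None:
            remaining = int(tokens[0])  # first header entry: number of particles
        elif remaining > 0:
            pdg_id, status = int(tokens[0]), int(tokens[1])
            px, py, pz, energy = (float(t) for t in tokens[6:10])
            particles.append((pdg_id, status, px, py, pz, energy))
            remaining -= 1
            if remaining == 0:
                yield particles
                remaining = -1  # ignore weights etc. until the next <event>


results = []
for event in iter_events(SAMPLE_LHE.splitlines()):
    final = [p for p in event if p[1] == 1]  # status 1: outgoing particle
    lepton = next(p for p in final if abs(p[0]) in (11, 13, 15))
    neutrino = next(p for p in final if abs(p[0]) in (12, 14, 16))
    px, py, pz, energy = (lepton[i] + neutrino[i] for i in range(2, 6))
    mass_squared = energy**2 - px**2 - py**2 - pz**2
    results.append({
        "lepton_pt": math.hypot(lepton[2], lepton[3]),
        "w_pt": math.hypot(px, py),
        "w_mass": math.sqrt(max(mass_squared, 0.0)),
    })

for row in results:
    print({key: round(value, 3) for key, value in row.items()})
```

For a real file, pass an open file handle instead of SAMPLE_LHE.splitlines(), for example open("unweighted_events.lhe") or gzip.open("unweighted_events.lhe.gz", "rt"). In this event the lepton-neutrino system has no transverse momentum at all, which is worth keeping in mind when judging whether the W pT shape is physical.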

Key Points

  • MadGraph is a widely used tool to generate matrix-element predictions for the hard scatter for SM and BSM processes.

  • Standalone MadGraph can run interactively on-the-fly or by importing the predefined text scripts

  • Gridpacks are used for large scale productions with consistency guaranteed

  • LHE level information is not physical and parton shower is needed to describe full physics


2 - Parton Shower Generator

Overview

Teaching: 10 min
Exercises: 20 min
Questions
  • Why do we need to do parton showering?

  • How are simulated samples created in CMS?

Objectives
  • Perform parton shower with LHE file as an input

  • Perform parton shower with gridpack as an input

  • Analyze generator level information using NanoGEN files

Creating particle level samples from LHE files

As discussed earlier, LHE files by themselves are not enough to describe physical distributions. In order to generate physically sensible events, the LHE events need to go through the parton shower. The parton shower, in principle, accounts for higher-order corrections to the hard process. The dominant contributions to these corrections come from collinear or soft emissions. In CMS, one of the most widely used tools for parton showering is Pythia8 (note, however, that Pythia8 is a multipurpose generator that can also calculate the hard process for certain physics processes). In this exercise, instead of compiling Pythia8 and running it in standalone mode as we did for MadGraph, we will use the Pythia8 that is already compiled in the CMSSW environment.

(1) Running Pythia8 interface in CMSSW

Make sure the Python virtual environment is deactivated

If you are still using the virtual environment, you need to unset it in order to not interfere with the CMSSW environment.

Let’s first check which release version of Pythia8 we will be using.

cd ~/nobackup/cmsdas_2025_gen/CMSSW_12_4_8/src
cmsenv
scram tool info pythia8

The output shows that we are using Pythia 8.306, which is already compiled in CMSSW_12_4_8.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Name : pythia8
Version : 306-84a4765a9948f9c1a5e66f80618e2c6d
++++++++++++++++++++

INCLUDE=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d/include
LIB=pythia8
LIBDIR=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d/lib
PYTHIA8DATA=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d/share/Pythia8/xmldoc
PYTHIA8_BASE=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d
ROOT_INCLUDE_PATH=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d/include
SYSTEM_INCLUDE+=1
USE=root_cxxdefaults cxxcompiler hepmc3 hepmc lhapdf

Now we will start building our parton shower fragment in our own directories in order to produce samples by ourselves.

mkdir -p Configuration/GenProduction/python/

Create a new file Configuration/GenProduction/python/wplustest.py:

import FWCore.ParameterSet.Config as cms

import os

externalLHEProducer = cms.EDProducer('ExternalLHEProducer',
    args = cms.vstring(os.getenv("HOME")+ "/nobackup/cmsdas_2025_gen/genproductions_mg352/bin/MadGraph5_aMCatNLO/wplustest_4f_LO_el8_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz"),
    nEvents = cms.untracked.uint32(5000),
    numberOfParameters = cms.uint32(1),
    outputFile = cms.string('cmsgrid_final.lhe'),
    scriptName = cms.FileInPath('GeneratorInterface/LHEInterface/data/run_generic_tarball_cvmfs.sh'),
    generateConcurrently = cms.untracked.bool(False),
)

from Configuration.Generator.Pythia8CommonSettings_cfi import *
from Configuration.Generator.MCTunesRun3ECM13p6TeV.PythiaCP5Settings_cfi import *

generator = cms.EDFilter("Pythia8HadronizerFilter",
    PythiaParameters = cms.PSet(
        pythia8CommonSettingsBlock,
        pythia8CP5SettingsBlock,
        parameterSets = cms.vstring(
            'pythia8CommonSettings',
            'pythia8CP5Settings',
        )
    ),
    comEnergy = cms.double(13600),
    maxEventsToPrint = cms.untracked.int32(1),
    pythiaHepMCVerbosity = cms.untracked.bool(False),
    pythiaPylistVerbosity = cms.untracked.int32(1),
)

Let’s compile.

scram b

The cmsDriver.py executable builds the full configuration file from the parton shower fragment we just wrote, based on the options it is given (data tier, campaign, etc.). We will create NanoGEN files, which are flat ntuples that resemble the NanoAOD data tier but only store branches with generator-level information. This skips the intermediate SIM and RECO steps, which makes it convenient for generator-level studies. For more information, take a look at link.

cmsDriver.py Configuration/GenProduction/python/wplustest.py \
    --python_filename config.py \
    --eventcontent NANOAOD \
    --datatier NANOAOD \
    --fileout file:wplustest.root \
    --conditions auto:mc \
    --step LHE,GEN,NANOGEN \
    --no_exec \
    --mc \
    -n 100

You have just created config.py, which can be executed with the cmsRun command. Take a look at config.py with less to see how it evolved from Configuration/GenProduction/python/wplustest.py through cmsDriver.py. Running the command below will produce LHE files, run the parton shower to make GEN samples, and finally convert them to the NanoGEN format in one go. Note that we will only test with 100 events (-n 100) due to time constraints.

cmsRun config.py

LHE files are first produced using the gridpack we’ve just produced.

   ______________________________________     
         Running Generic Tarball/Gridpack     
   ______________________________________     
gridpack tarball path = /uscms/home/enibigir/nobackup/cmsdas_2025_gen/genproductions_mg352/bin/MadGraph5_aMCatNLO/wplustest_4f_LO_el8_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz
%MSG-MG5 number of events requested = 100
%MSG-MG5 random seed used for the run = 234567
%MSG-MG5 thread count requested = 1
%MSG-MG5 residual/optional arguments =
%MSG-MG5 number of events requested = 100
%MSG-MG5 random seed used for the run = 234567
%MSG-MG5 number of cpus = 1
%MSG-MG5 SCRAM_ARCH version = el8_amd64_gcc10
%MSG-MG5 CMSSW version = CMSSW_12_4_8
Running MG5_aMC for the 1 time
produced_lhe  0 nevt  100 submitting_event  100  remaining_event  100
run.sh 100 2345670
Now generating 100 events with random seed 2345670 and granularity 1

The events are then reweighted with additional PDF sets, which are provided as possible sources of systematic uncertainty.

INFO: #***************************************************************************
#
# original cross-section: 30775.0
#     scale variation: +11.8% -12.7%
#     emission scale variation: + 0% - 0%
#     central scheme variation: +8.43e-09% -20.3%
# PDF variation: +0.918% -0.918%
#
#PDF NNPDF31_nnlo_as_0118_nf_4: 30776.1 +0.916% -0.916%
#PDF NNPDF30_nnlo_nf_4_pdfas: 29939.4 +1.81% -1.81%
#PDF NNPDF40_nnlo_nf_4_pdfas: 31022.5 +0.554% -0.554%
#PDF MSHT20nnlo_nf4: 30286.6 +1.2% -1.56%
#PDF PDF4LHC21_40_pdfas_nf4: 30529 +1.53% -1.53%
#PDF ABMP16_4_nnlo: 30385.1 +0.885% -0.885%
# dynamical scheme # 1 : 28597.7 +13.2% -14.3% # \sum ET
# dynamical scheme # 2 : 28599.7 +13.2% -14.3% # \sum\sqrt{m^2+pt^2}
# dynamical scheme # 3 : 24520.5 +16.6% -17.8% # 0.5 \sum\sqrt{m^2+pt^2}
# dynamical scheme # 4 : 30775 +11.8% -12.7% # \sqrt{\hat s}
# PDF 42930 : 30365.485849169454
#***************************************************************************
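The percentages in this summary are easier to interpret as absolute cross sections. The following sketch converts the scale and PDF variations printed above into bands around the central value; the numbers are copied from the log, and the helper name band is made up for this illustration.

```python
central = 30775.0  # pb, "original cross-section" in the log above


def band(value, up_percent, down_percent):
    """Return (low, high) for asymmetric percentage variations around value."""
    return value * (1 - down_percent / 100), value * (1 + up_percent / 100)


# "scale variation: +11.8% -12.7%"
scale_low, scale_high = band(central, 11.8, 12.7)
# "PDF variation: +0.918% -0.918%"
pdf_low, pdf_high = band(central, 0.918, 0.918)

print(f"scale band: {scale_low:.0f} to {scale_high:.0f} pb")
print(f"PDF band:   {pdf_low:.0f} to {pdf_high:.0f} pb")
```

At leading order the scale variation is by far the larger of the two, as the widths of the two bands show.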

Pythia8 is then launched with the newly created LHE file as input. It first prints the LHE information, as we saw directly in the LHE file.

  --------  PYTHIA Event Listing  (hard process)  -----------------------------------------------------------------------------------

    no         id  name            status     mothers   daughters     colours      p_x        p_y        p_z         e          m
     0         90  (system)           -11     0     0     0     0     0     0      0.000      0.000      0.000  13000.000  13000.000
     1       2212  (p+)               -12     0     0     3     0     0     0      0.000      0.000   6500.000   6500.000      0.938
     2       2212  (p+)               -12     0     0     4     0     0     0      0.000      0.000  -6500.000   6500.000      0.938
     3         -1  (dbar)             -21     1     0     5     0     0   501     -0.000      0.000      0.771      0.771      0.000
     4          2  (u)                -21     2     0     5     0   501     0      0.000     -0.000  -2136.814   2136.814      0.000
     5         24  (W+)               -22     3     4     6     7     0     0      0.000      0.000  -2136.043   2137.585     81.187
     6        -13  mu+                 23     5     0     0     0     0     0     11.363     36.276  -1442.911   1443.412      0.106
     7         14  nu_mu               23     5     0     0     0     0     0    -11.363    -36.276   -693.132    694.173      0.000
                                   Charge sum:  1.000           Momentum sum:      0.000      0.000  -2136.043   2137.585     81.187

 --------  End PYTHIA Event Listing  -----------------------------------------------------------------------------------------------

Pythia8 then starts the parton shower on top of the given LHE event. See how much more information gets printed out. Remember that the parton shower evolves downwards from the hard process until a certain energy threshold is reached (q -> q g -> q g g g -> q q q g g -> ...).

  --------  PYTHIA Event Listing  (complete event)  ---------------------------------------------------------------------------------

    no         id  name            status     mothers   daughters     colours      p_x        p_y        p_z         e          m
     0         90  (system)           -11     0     0     0     0     0     0      0.000      0.000      0.000  13000.000  13000.000
     1       2212  (p+)               -12     0     0    90     0     0     0      0.000      0.000   6500.000   6500.000      0.938
     2       2212  (p+)               -12     0     0    91     0     0     0      0.000      0.000  -6500.000   6500.000      0.938
     3         -1  (dbar)             -21     6     0     5     0     0   501     -0.000      0.000      0.771      0.771      0.000
     4          2  (u)                -21     7     7     5     0   501     0      0.000     -0.000  -2136.814   2136.814      0.000
     5         24  (W+)               -22     3     4     8     8     0     0      0.000      0.000  -2136.043   2137.585     81.187
     6         21  (g)                -41    10     0     9     3   502   501      0.000      0.000      1.719      1.719      0.000
     7          2  (u)                -42    11    11     4     4   501     0     -0.000     -0.000  -2136.814   2136.814      0.000
     8         24  (W+)               -44     5     5    12    12     0     0    -19.744    -26.752  -1604.300   1606.697     81.187
     9          1  (d)                -43     6     0    13    13   502     0     19.744     26.752   -530.795    531.836      0.330
    10         -4  (cbar)             -41    18     0    14     6     0   501      0.000      0.000      2.947      2.947      0.000
    11          2  (u)                -42    19    19     7     7   501     0      0.000      0.000  -2136.814   2136.814      0.000
    12         24  (W+)               -44     8     8    20    20     0     0     -1.064    -19.028  -1221.827   1224.670     81.187
    13          1  (d)                -44     9     9    17    17   502     0     27.853     30.104   -681.041    682.274      0.330
    14         -4  (cbar)             -43    10     0    15    16     0   502    -26.789    -11.076   -231.000    232.816      1.500
    15         -4  (cbar)             -51    14     0    22    22     0   503    -19.016     -8.750   -120.727    122.537      1.500
    16         21  (g)                -51    14     0    23    23   503   502     -5.668     -0.052   -161.724    161.823      0.000
    17          1  (d)                -52    13    13    21    21   502     0     25.748     27.830   -629.590    630.730      0.330
    18         -4  (cbar)             -41    25     0    24    10     0   504      0.000      0.000      3.067      3.067      0.000
    19          2  (u)                -42    26    26    11    11   501     0      0.000      0.000  -2136.814   2136.814      0.000
    20         24  (W+)               -44    12    12    27    27     0     0     -0.639    -17.937  -1205.912   1208.775     81.187
    21          1  (d)                -44    17    17    28    28   502     0     25.919     28.268   -639.604    640.753      0.330

After the information for one event is printed, 100 events are processed and finally the cross section is reported.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Overall cross-section summary
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Process		xsec_before [pb]		passed	nposw	nnegw	tried	nposw	nnegw 	xsec_match [pb]			accepted [%]	 event_eff [%]
0		3.078e+04 +/- 2.327e+02		100	100	0	100	100	0	3.078e+04 +/- 2.327e+02		100.0 +/- 0.0	100.0 +/- 0.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total		3.078e+04 +/- 2.327e+02		100	100	0	100	100	0	3.078e+04 +/- 2.327e+02		100.0 +/- 0.0	100.0 +/- 0.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Before matching: total cross section = 3.078e+04 +- 2.327e+02 pb
After matching: total cross section = 3.078e+04 +- 2.327e+02 pb
Matching efficiency = 1.0 +/- 0.0   [TO BE USED IN MCM]
Filter efficiency (taking into account weights)= (100) / (100) = 1.000e+00 +- 0.000e+00
Filter efficiency (event-level)= (100) / (100) = 1.000e+00 +- 0.000e+00    [TO BE USED IN MCM]

After filter: final cross section = 3.078e+04 +- 2.327e+02 pb
After filter: final fraction of events with negative weights = 0.000e+00 +- 0.000e+00
After filter: final equivalent lumi for 1M events (1/fb) = 3.249e-02 +- 2.478e-04

=============================================
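The "equivalent lumi for 1M events" line in the summary above follows directly from the reported cross section. A minimal sketch of that arithmetic, with the numbers copied by hand from the summary (the variable names are only illustrative):

```python
# Numbers copied from the cross-section summary above.
xsec_pb = 3.078e4      # final cross section in pb
n_events = 1000000     # reference sample size used in the summary (1M events)

# L = N / sigma gives the luminosity in pb^-1; 1 fb^-1 = 1000 pb^-1.
lumi_inv_pb = n_events / xsec_pb
lumi_inv_fb = lumi_inv_pb / 1000.0

print(f"equivalent luminosity: {lumi_inv_fb:.3e} fb^-1")
```

The result reproduces the 3.249e-02 (1/fb) quoted in the log.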

How did the cross section change after parton shower?

MadGraph reported a cross section of 30775 pb (the line # original cross-section: 30775.0). After running the parton shower with Pythia8, the cross section is 30780 pb, i.e. unchanged within the statistical uncertainty. The parton shower adds more and more vertices, so why does the cross section remain unchanged?

Solution

The parton shower is unitary: at every step, the probability to branch (e.g. q -> q g) and the probability not to branch sum to 1. Hence, the cross section is determined by the lowest-order input (the hard process).

Bonus: How did the distribution change?

(2) Jet merging samples

The hard-process (matrix element) calculation is better at modeling hard jets and heavy-particle decays, while the parton shower is better at describing collinear and soft emissions. For a more realistic and reliable modeling of hard jets, for example in W+jet events, MadGraph can be used as below.

generate p p > w+, w+ > ell+ vl @0
add process p p > w+ j, w+ > ell+ vl @1

With this syntax, MadGraph produces the W process with 0 and with 1 hard jet in the event. When this sample goes through the parton shower, the events that already contain a hard jet (denoted with @1) give a better description of W+jet production with a hard jet. However, an @0 event can emit QCD radiation from the initial state, which may form a jet that is hard enough. This region of phase space is therefore double counted, since a “W+jet with hard jet” event can come from both @0 and @1. To remove this double counting, the jet merging technique is used. Jet merging is set up with an artificial threshold called the jet merging scale, which decides whether an event from @0 or @1 is accepted. Only the accepted events from the two processes are merged to form one sample. Very roughly, the jet merging scale can be thought of as a jet momentum threshold. If a jet in the event is above the threshold, the event is rejected if it comes from @0 and accepted if it comes from @1. Conversely, if the jet is below the threshold, the event is accepted if it comes from @0 and rejected if it comes from @1.
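The very rough accept/reject picture described above can be written down as a toy function. This is only a sketch of that simplified picture and not the actual MLM algorithm as implemented in Pythia8; the function name, the event tuples and the 19 GeV threshold (borrowed from the qCut value in the fragment of this lesson) are illustrative assumptions.

```python
def keep_event(process_tag, jet_pt, merging_scale):
    """Toy accept/reject rule for a merged W + 0/1 jet sample.

    process_tag   -- 0 for events from '@0', 1 for events from '@1'
    jet_pt        -- pT in GeV of the hardest jet after showering (toy input)
    merging_scale -- jet merging scale in GeV
    """
    jet_is_hard = jet_pt > merging_scale
    if process_tag == 0:
        # '@0' events are kept only if the shower did not produce a hard jet
        return not jet_is_hard
    # '@1' events are kept only if their jet is above the merging scale
    return jet_is_hard


# Hypothetical events: (process tag, hardest jet pT in GeV)
toy_events = [(0, 8.0), (0, 45.0), (1, 12.0), (1, 60.0)]
merged = [ev for ev in toy_events if keep_event(ev[0], ev[1], merging_scale=19.0)]
print(merged)
```

In this toy, every region of jet pT is populated by exactly one of the two processes, which is the point of the merging procedure.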

How to produce gridpack

How to set the Madgraph (run card) and Pythia (fragment)?

Hint

#*********************************************************************
# Matching - Warning! ickkw > 1 is still beta
#*********************************************************************
 0        = ickkw            ! 0 no matching, 1 MLM, 2 CKKW matching

This flag tells MadGraph that the LHE files we are going to produce will later go through jet merging in order to avoid double counting.

#*********************************************************************
# Jet measure cuts                                                   *
#*********************************************************************
 0   = xqcut   ! minimum kt jet measure between partons

When jet merging is turned on, xqcut needs to be set. It preselects events at the LHE level so that jet merging is efficient. Remember that some of the events will later be discarded and never used, so there is no point in producing events with jets at too low an energy scale at the LHE level, since these will likely be removed.

generator = cms.EDFilter("Pythia8HadronizerFilter",
    PythiaParameters = cms.PSet(
        pythia8CommonSettingsBlock,
        pythia8CP5SettingsBlock,
        processParameters = cms.vstring(
            'JetMatching:setMad = off',
            'JetMatching:scheme = 1',
            'JetMatching:merge = on',
            'JetMatching:jetAlgorithm = 2',
            'JetMatching:etaJetMax = 5.',
            'JetMatching:coneRadius = 1.',
            'JetMatching:slowJetPower = 1',
            'JetMatching:doShowerKt = off', 
            'JetMatching:qCut = 19.',
            'JetMatching:nQmatch = 4',
            'JetMatching:nJetMax = 1',
            'TimeShower:mMaxGamma = 4.0'
        ),
        parameterSets = cms.vstring(
            'pythia8CommonSettings',
            'pythia8CP5Settings',
            'processParameters',
        )
    ),
    # further generator settings are omitted in this hint
)

Bonus: What is the cross section?

Key Points

  • Pythia8 is the main tool used for parton showering in CMS

  • Events are not physical if they have not gone through the parton shower

  • Jet merging is a technique to avoid double counting of jet phase space between the ME and PS calculations


3 - Analysis and systematic uncertainties

Overview

Teaching: 10 min
Exercises: 20 min
Questions
  • How can we use generator information for quick studies?

  • What are systematic uncertainties coming from theory inputs?

Objectives
  • Use NanoGEN for exploratory studies and to gain experience with generator related uncertainties (PDF choice, scale, strong coupling constant)

  • Study some theory related systematic uncertainties

Analysis of NanoGEN

In the following we will explore the content of NanoGEN samples and how they can be used for first studies for a potentially interesting physics analysis. The NanoGEN sample we’ve previously created contains several trees. You can open the file in ROOT to explore the content, e.g.

cd ~/nobackup/cmsdas_2025_gen/CMSSW_12_4_8/src; cmsenv
root -l wplustest.root

Within ROOT, you can use .ls to list the content. To list the branches contained in a certain tree (here Events), do

.ls
Events->GetListOfBranches()->ls()

The NanoGEN sample contains, amongst others, collections of all the generated particles (GenPart), jets with different cone size parameters (GenJet, GenJetAK8), generated missing transverse momentum (GenMET) and generated leptons (GenDressedLepton). In the exercise we will use GenMET and GenDressedLepton to calculate the transverse mass of the lepton+MET system, MT, which is often used in analyses with leptonically decaying W bosons.
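As a minimal sketch, MT can be computed from the lepton and MET transverse momenta and azimuthal angles with the standard formula MT = sqrt(2 pT(lep) pT(MET) (1 - cos(dphi))). The function name is an illustrative choice; in a NanoGEN file the inputs would typically come from branches such as GenDressedLepton_pt, GenDressedLepton_phi, GenMET_pt and GenMET_phi (check the branch list of your file to confirm the names).

```python
import math


def transverse_mass(lep_pt, lep_phi, met_pt, met_phi):
    """Transverse mass of the lepton+MET system (same units as the pT inputs)."""
    dphi = lep_phi - met_phi  # cos() is periodic, so no phi wrapping is needed
    mt2 = 2.0 * lep_pt * met_pt * (1.0 - math.cos(dphi))
    return math.sqrt(max(mt2, 0.0))  # guard against tiny negative rounding


# Toy values: a 40 GeV lepton back-to-back with 40 GeV of MET
mt_back_to_back = transverse_mass(40.0, 0.0, 40.0, math.pi)
print(f"MT = {mt_back_to_back:.1f} GeV")
```

For W -> l nu events the MT distribution has a characteristic edge near the W mass, which is why it is a useful first quantity to plot.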

Where do systematic uncertainties come from?

Several choices of parameters and settings in the event generators have an impact on the predicted overall event yield as well as the acceptance and object kinematics. The most prominent examples are the choice of the parton distribution function (PDF), the chosen values of the renormalization and factorization scales and the value of the strong coupling constant.

Estimating systematic uncertainties

Several different PDF sets are usually stored in samples that are centrally produced in CMS. In order to save space, only one PDF set is stored in NanoAOD and NanoGEN. For the UL campaign, the NNPDF3.1 NNLO PDF set is used (LHAPDF), which contains 103 per-event weights. The stored weights, called LHEPdfWeight, are ratios with respect to the central value, so the weight with index 0 should usually be 1 (if it is not, different PDFs were used for the central ME and for the computed variations, and extra care needs to be taken). Indices 1-100 are different, linearly independent members of the PDF set that can be used to estimate systematic uncertainties, while indices 101 and 102 correspond to the up and down variations of the strong coupling constant.

Hessian and MC replicas

Two different approaches for PDF sets exist: MC replicas and Hessian eigenvectors. The Hessian sets used in (most) CMS UL samples allow the total uncertainty to be estimated by adding the individual variations in quadrature, while the situation is more ambiguous for MC replicas.
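A minimal sketch of the two prescriptions, under stated assumptions: the input is a list of yields (e.g. the content of one histogram bin), where entry i is the sum of LHEPdfWeight[i] over the selected events, with the index layout described above (0 central, 1-100 Hessian members, 101-102 alphaS). The function names are illustrative. For MC replicas, the standard deviation over the replicas is used here as one common choice, not the only possible one.

```python
import math
import statistics


def hessian_pdf_uncertainty(yields):
    """Quadrature sum of the differences of members 1-100 to the central yield."""
    central = yields[0]
    return math.sqrt(sum((y - central) ** 2 for y in yields[1:101]))


def alphas_uncertainty(yields):
    """Half the difference between the two alphaS variations (indices 101, 102)."""
    return 0.5 * abs(yields[101] - yields[102])


def replica_pdf_uncertainty(replica_yields):
    """For MC replica sets: sample standard deviation over the replicas."""
    return statistics.stdev(replica_yields)


# Toy yields: central value 100, each Hessian member shifted by +1
toy_yields = [100.0] + [101.0] * 100 + [102.0, 98.0]
pdf_unc = hessian_pdf_uncertainty(toy_yields)
as_unc = alphas_uncertainty(toy_yields)
print(pdf_unc, as_unc)
```

The variations are applied to yields rather than to single events, because the uncertainty of interest is the one on the predicted distribution or acceptance.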

Similar sets of variations are available for the renormalization and factorization scales, in LHEScaleWeight. You can find the details about the variations in the branch documentation within the ROOT file (see above for how to get the list of branches in ROOT).
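One common convention, assumed here and not prescribed by this lesson, is to take the envelope of the scale variations as the uncertainty. The sketch below takes the nominal yield and a list of varied yields. Which LHEScaleWeight indices to include depends on the sample, so that selection is left to the caller and should be read from the branch documentation.

```python
def scale_envelope(nominal, varied):
    """Return (up, down) as the largest upward and downward shifts of the varied yields."""
    up = max(max(varied) - nominal, 0.0)
    down = max(nominal - min(varied), 0.0)
    return up, down


# Toy yields for a handful of hypothetical scale variations
up, down = scale_envelope(100.0, [90.0, 95.0, 104.0, 110.0])
print(f"+{up} / -{down}")
```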

Optional (MadSpin and BSM UFO model)

Using MadSpin

Why is MadSpin useful at all? The answer lies in NLO QCD calculations and loop-induced processes. Let’s launch the MadGraph prompt shell again.

cd ~/nobackup/cmsdas_2025_gen/MG5_aMC_v3_5_2/
./bin/mg5_aMC

Now try another simple example, top pair production.

import model sm
generate p p > t t~ [QCD]

Notice that [QCD] has been added to the process definition. This flag tells MadGraph that you wish to do the calculation at NLO in QCD.

Before going further, try adding the top decay into a W boson and a b quark to the process definition, similar to what we did for the Z -> ee example.

generate p p > t t~, t > w+ b [QCD]
generate p p > t t~ [QCD], t > w+ b
exit

You will find that neither of these works. Instead, MadGraph complains with an error saying str : Decay processes cannot be perturbed. This means that decay chains cannot be included in the process definition for NLO calculations. This is where MadSpin becomes necessary: resonant particles that cannot be decayed in the process definition can be decayed with MadSpin. Now let’s get back to the working example to see how it works.

import model sm
generate p p > t t~ [QCD]
output TopPair
launch
shower = PYTHIA8
4
0

Two lines are notably added: shower = PYTHIA8 and 4 (which can be replaced with madspin = ON). Again, we are not going to run the parton shower here. The shower still has to be specified, because the “counter term” calculation, which gives rise to negatively weighted events, depends on which parton shower generator is used later.

Negative weighted events

We won’t cover negative weights in detail in this tutorial, but the important things to remember are:

  1. Some portion of the events are negatively weighted so one needs to be careful with the normalization.
  2. LHE files at NLO are even more unphysical than LHE files at LO before parton shower.
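To illustrate the first point, a minimal sketch with toy weights: the normalization has to use the signed sum of weights instead of the number of events, and the statistical power of the sample is reduced. The effective number of events used here, (sum w)^2 / (sum w^2), is a standard estimate; the function names and the toy weights are illustrative assumptions.

```python
def sum_of_weights(weights):
    """Signed sum of weights: the denominator when normalizing to a cross section."""
    return sum(weights)


def effective_events(weights):
    """Effective number of unweighted events: (sum w)^2 / (sum w^2)."""
    sumw = sum(weights)
    sumw2 = sum(w * w for w in weights)
    return sumw * sumw / sumw2


# Toy sample: 100 events with unit weights, 20% of them negative
toy_weights = [1.0] * 80 + [-1.0] * 20
norm = sum_of_weights(toy_weights)
n_eff = effective_events(toy_weights)
print(norm, n_eff)
```

With a negative-weight fraction f and unit weights, this reduces to N (1 - 2f)^2, so a sample with 20% negative weights has the statistical power of only 36% of its events.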

Press tab to turn off timer. MadGraph again asks if you would like to edit the cards now including madspin_card.dat.

/------------------------------------------------------------\
|  1. param   : param_card.dat                               |
|  2. run     : run_card.dat                                 |
|  3. madspin : madspin_card.dat                             |
\------------------------------------------------------------/

If you take a look at run_card.dat, you might notice that its template is quite different from the one we used for DY at LO. The NLO template is available at https://github.com/cms-PdmV/GridpackFiles/blob/master/Campaigns/Run3Summer22/MadGraph5_aMCatNLO/Templates/NLO_run_card.dat and the LO template at https://github.com/cms-PdmV/GridpackFiles/blob/master/Campaigns/Run3Summer22/MadGraph5_aMCatNLO/Templates/LO_run_card.dat. Although MadGraph provides the same user interface for both, LO and NLO calculations run on completely different code in the backend, so an NLO run_card.dat does not work for LO calculations and vice versa.

Now take a look at madspin_card.dat by pressing 3.

# specify the decay for the final state particles
decay t > w+ b, w+ > all all
decay t~ > w- b~, w- > all all
decay w+ > all all
decay w- > all all
decay z > all all

This card lets you define how you want your resonant particles to decay. For example, if you write:

decay t > w+ b, w+ > e+ ve
decay t~ > w- b~, w- > mu- vm~

This forces the top to decay to a final state with a positron and the antitop to a final state with a muon. Remove the unnecessary decay definitions and add these two lines to make a top pair sample with a positron and a muon in the final state. Before moving on, type set run_card nevents 50 to save time by producing only 50 events.

You will see the inclusive top pair production cross section being computed, which includes all possible decays of the top quark.

   --------------------------------------------------------------
      Summary:
      Process p p > t t~ [QCD]
      Run at p-p collider (6500.0 + 6500.0 GeV)
      Number of events generated: 50
      Total cross section: 6.847e+02 +- 4.3e+00 pb
   --------------------------------------------------------------

And then you will see MadSpin doing its job, decaying the top quarks to desired channels.

************************************************************
*                                                          *
*           W E L C O M E  to  M A D S P I N               *
*                                                          *
************************************************************

...

INFO: decay channels for t : ( width = 1.4915 GeV ) 
INFO:        BR                 d1  d2 
INFO:    1.000000e+00            b  w+  
INFO:    
INFO:    
INFO: decay channels for w+ : ( width = 2.04793 GeV ) 
INFO:        BR                 d1  d2 
INFO:    3.333610e-01            d~  u  
INFO:    3.333610e-01            s~  c  
INFO:    1.111195e-01            e+  ve  
INFO:    1.111195e-01            mu+  vm  
INFO:    1.110390e-01            ta+  vt  
INFO:    
INFO:    
INFO: decay channels for t~ : ( width = 1.4915 GeV ) 
INFO:        BR                 d1  d2 
INFO:    1.000000e+00            b~  w-  
INFO:    
INFO:    
INFO: decay channels for w- : ( width = 2.04793 GeV ) 
INFO:        BR                 d1  d2 
INFO:    3.333610e-01            d  u~  
INFO:    3.333610e-01            s  c~  
INFO:    1.111195e-01            e-  ve~  
INFO:    1.111195e-01            mu-  vm~  
INFO:    1.110390e-01            ta-  vt~

...

INFO:    Estimating the maximum weight     
INFO:    *****************************     
INFO:      Probing the first 75 events 
INFO:      with 400 phase space points 
INFO:    
INFO: Event 1/75 :  0.068s   
INFO: Event 6/75 :  0.63s   
INFO: Event 11/75 :  1.2s   
INFO: Event 16/75 :  1.8s   
INFO: Event 21/75 :  2.1s   
INFO: Event 26/75 :  3s   
INFO: Event 31/75 :  3.8s   
INFO: Event 36/75 :  4.6s   
INFO: Event 41/75 :  5.7s   
INFO: Event 46/75 :  6.5s   

What is the cross section?

Inclusive cross section was reported to be 684.7pb as we saw above. When considering the decay channels (e+ and mu- final states), what is the proper cross section? What are the branching ratios for w+ > e+ ve and w- > mu- vm~?

Solution

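About 8.5 pb (from 684.7 pb x 11.1% x 11.1%), using the branching ratios of about 11.1% listed for each of the two W decays in the MadSpin output above.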
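The same arithmetic as a short sketch, with the inclusive cross section and the two branching ratios copied by hand from the MadGraph and MadSpin output above (the variable names are illustrative):

```python
# Inputs copied from the run summary and the MadSpin decay tables above.
xsec_inclusive_pb = 684.7        # p p > t t~ [QCD], all top decays
br_w_to_e_nu = 1.111195e-01      # w+ > e+ ve
br_w_to_mu_nu = 1.111195e-01     # w- > mu- vm~
br_t_to_w_b = 1.0                # t > w+ b (same for t~)

# Cross section for the selected final state: sigma x BR(t) x BR(t~)
xsec_emu_pb = (xsec_inclusive_pb
               * br_t_to_w_b * br_w_to_e_nu
               * br_t_to_w_b * br_w_to_mu_nu)
print(f"{xsec_emu_pb:.2f} pb")
```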

How can we make a sample that yields mu+, vm, and this time, two quark jets (hadronically decaying w-)?

Solution

decay t > w+ b, w+ > mu+ vm
decay t~ > w- b~, w- > j j

Interfacing BSM UFO model files

Let’s take a look at how BSM samples for search-type analyses get produced. We will pick one simple example, a hypothetical heavy gauge boson called the W’.

import model WEff_UFO
display particles
generate p p > wp+, wp+ > e+ ve
add process p p > wp-, wp- > e- ve~
output WprimeToENu

How can we make the syntax simpler using particle containers?

How can we write generate p p > wp+, wp+ > e+ ve and add process p p > wp-, wp- > e- ve~ in a simpler way?

Solution

define wprime = wp+ wp-
define leptons = e+ e- ve ve~
generate p p > wprime, wprime > leptons leptons

This will find all possible Feynman diagrams with given particle combinations.

Since the W boson in the SM has no right-handed interactions, many BSM scenarios predict a W’ boson that is heavier (which is why we have not found it yet) and can interact through right-handed couplings. As we do not know the mass of this particle, we test many different scenarios (BSM parameters), for example different masses, decay channels and coupling strengths. We will now see how such BSM parameters can be set in MadGraph.

launch
0

And press tab to turn off the timer.

Take a look at the parameter card by hitting 1.

You will see a clear difference in the parameter settings compared to the sm model we’ve been using. Here, we focus only on the W’ mass, MWp, and the right-handed coupling strength, kR. Keep in mind that the W’ width, wwp, also has to change depending on how you choose your BSM parameters.

###################################
## INFORMATION FOR MASS
###################################
Block mass
    1 5.040000e-03 # MD
    2 2.550000e-03 # MU
    3 1.010000e-01 # MS
    4 1.270000e+00 # MC
    5 4.700000e+00 # MB
    6 1.720000e+02 # MT
   11 5.110000e-04 # Me
   13 1.056600e-01 # MMU
   15 1.777000e+00 # MTA
   23 9.118760e+01 # MZ
   25 1.250000e+02 # MH
   34 1.000000e+03 # MWp

...

###################################
## INFORMATION FOR WPCOUP
###################################
Block wpcoup
    1 0.000000e+00 # kL
    2 1.000000e+00 # kR

...

###################################
## INFORMATION FOR DECAY
###################################
DECAY   6 1.508336e+00 # WT
DECAY  23 2.495200e+00 # WZ
DECAY  24 2.085000e+00 # WW
DECAY  25 4.070000e-03 # WH
DECAY  34 1.000000e+01 # WWp

You can see that the W’ mass is set to 1000 GeV, the right-handed coupling strength to 1.0, and the W’ width to 10 GeV. You can change the BSM parameters, for example the mass to 2000 GeV and the coupling strength to 0.1, as below.

set param_card mwp 2000
set param_card kr 0.1

However, if you look at the parameter card again, the W’ width wwp is unchanged. You can compute the width interactively with compute_widths wp+. Check the parameter card again and you will see that the width has changed. The card now also lists the branching ratios to the different channels.

#      PDG        Width
DECAY  34   6.672601e-01
#  BR             NDA  ID1    ID2   ...
   2.506959e-01   2    2  -1 # 0.1672793598579319
   2.479126e-01   2    6  -5 # 0.16542221070326676
   2.379169e-01   2    4  -3 # 0.15875247762227632
   8.356529e-02   2    12  -11 # 0.05575978661997229
   8.356529e-02   2    14  -13 # 0.05575978638653866
   8.356519e-02   2    16  -15 # 0.05575972059211703
   1.277883e-02   2    2  -3 # 0.008526805639865994
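The number after each # in the block above is the partial width of that channel, i.e. the branching ratio times the total width. A small consistency check, with all numbers copied by hand from the block (the list layout and variable names are illustrative, and the partial widths are truncated to fewer digits):

```python
total_width = 6.672601e-01  # GeV, from the DECAY 34 line

# (branching ratio, partial width in GeV) for each listed channel
channels = [
    (2.506959e-01, 0.16727936),
    (2.479126e-01, 0.16542221),
    (2.379169e-01, 0.15875248),
    (8.356529e-02, 0.05575979),
    (8.356529e-02, 0.05575979),
    (8.356519e-02, 0.05575972),
    (1.277883e-02, 0.00852681),
]

# Each partial width should equal BR x total width
for br, partial in channels:
    assert abs(br * total_width - partial) / partial < 1e-5

br_sum = sum(br for br, _ in channels)
print(f"sum of branching ratios: {br_sum:.6f}")
```

The listed branching ratios add up to 1 within rounding, so the channels shown account for the full width.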

Instead of computing the width interactively, you can do set param_card wwp auto. MadGraph will then calculate the width on the fly while generating events, and the result will be identical.

Proceed by hitting 0 and see how much cross section it gives you when hypothetically the W’ boson exists and decays to the electron channel, assuming mass 2000GeV with right handed coupling 0.1.

  === Results Summary for run: run_01 tag: tag_1 ===

     Cross-section :   0.001016 +- 1.447e-06 pb
     Nb of events :  10000

How can we check the cross section when mass is 2000GeV with right handed coupling 1.0?

Solution

Repeat the exercise above but this time

set param_card mwp 2000
set param_card kr 1.0

And most importantly, do not forget to compute the width by adding :

set param_card wwp auto

Then you will get the following result.

  === Results Summary for run: run_01 tag: tag_1 ===

    Cross-section :   0.1045 +- 0.0001681 pb
    Nb of events :  10000

How much did the cross section increase compared to the scenario when mass is 2000GeV with right handed coupling 0.1?

How many interactions did the W’ boson get involved in?

Solution

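Two: one vertex when it is produced and another when it decays to the electron channel. Each vertex contributes a factor kR to the amplitude, so the squared amplitude scales as kR^4. In this model (with kL = 0) the W’ width also scales as kR^2, and near the resonance the cross section is inversely proportional to the width. The net effect is a factor (1/0.1)^2 = 100, i.e. a cross section about 100 times larger.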
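The factor of about 100 can be checked directly from the two run summaries above. A minimal sketch with the cross sections copied by hand (the variable names are illustrative):

```python
# Cross sections copied from the two "Results Summary" blocks above (in pb)
xsec_kr_0p1 = 0.001016   # MWp = 2000 GeV, kR = 0.1
xsec_kr_1p0 = 0.1045     # MWp = 2000 GeV, kR = 1.0

observed_ratio = xsec_kr_1p0 / xsec_kr_0p1
expected_ratio = (1.0 / 0.1) ** 2  # coupling ratio squared

print(f"observed ratio: {observed_ratio:.1f}, expected about {expected_ratio:.0f}")
```

The observed ratio is close to, but not exactly, 100. A small deviation is not surprising, since the width of the resonance differs between the two scenarios.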

Key Points

  • PDF uncertainties can be estimated from a set of different per-event weights. The method depends on the type of PDF set that is used (hessian, MC replicas)

  • Scale and alphaS variations are another source of uncertainty in the prediction of a simulated sample and can be used to estimate systematic uncertainties


4 - CMS resources for samples and generators

Overview

Teaching: 5 min
Exercises: 10 min
Questions
  • How do I find centrally produced samples and their status?

  • How do I obtain a cross section to normalize my sample?

Objectives
  • Leverage available tools for efficient analysis work

CMS resources for simulated samples

You can get the configuration of a given sample from McM. For example, if you want the inclusive W+jets sample, start from a DAS query (requires a valid grid certificate / proxy):

dasgoclient -query="/WJetsToLNu*/RunIISummer20UL18*/MINIAODSIM"

Recall

Instructions to set up and verify your grid certificate have been covered in pre-exercises.

Alternatively there’s also a web-based DAS client: https://cmsweb.cern.ch/das/, use dataset=/WJetsToLNu*/RunIISummer20UL18*/MINIAODSIM to perform your search.

/WJetsToLNu_012JetsNLO_34JetsLO_EWNLOcorr_13TeV-sherpa/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v4/MINIAODSIM
/WJetsToLNu_012JetsNLO_34JetsLO_EWNLOcorr_13TeV-sherpa/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1_ext1-v2/MINIAODSIM
/WJetsToLNu_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v2/MINIAODSIM
/WJetsToLNu_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_1J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_1J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_2J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_2J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-100To200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-100To200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-1200To2500_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-1200To2500_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-1200To2500_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-200To400_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-200To400_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-2500ToInf_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-2500ToInf_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-2500ToInf_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-600To800_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-600To800_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-600To800_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-70To100_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-70To100_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-100To250_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-100To250_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-250To400_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-250To400_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-400To600_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-400To600_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-600ToInf_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-600ToInf_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-PUForTRK_TRK_106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-PUForTRKv2_TRKv2_106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Wpt-100to200_BPSFilter_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Wpt-200toInf_BPSFilter_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Wpt-200toInf_BPSFilter_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM

We want the inclusive LO sample with the latest MiniAOD version (MiniAODv2), hence we pick /WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM. Plug this name into “Output Dataset” in McM, then click on the dataset name (WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8). In “Select View” check “Fragment” and click on the expand icon under “Fragment” (rightmost column) for the request with a Summer20UL18wmLHEGS PrepId. You can also filter the results directly by appending ?dataset_name=WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8&prepid=*Summer20UL18wmLHE* to the requests address, https://cms-pdmv.cern.ch/mcm/requests.

Status of samples

GrASP is a tool to conveniently track the status of your samples. Just select the campaigns you’re interested in (e.g. Run2 UL or Run3) and type the sample name. You can also tag the samples of your analysis so that they are easier to find and keep track of.

Cross sections

CMSSW analyzer

In the following, we will use a CMSSW analyzer called GenXSecAnalyzer to compute the cross section of samples. The analyzer takes a list of EDM files as input (i.e., not NanoAOD or NanoGEN). Make sure you are in a CMSSW environment:

cd ~/nobackup/cmsdas_2025_gen/CMSSW_12_4_8/src/
cmsenv

You can then use the prepared configurations to obtain the cross section for a sample of your liking, e.g. /TTWJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-madspin-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM

Open your favorite editor and create a new file xsec_ana.py:

import FWCore.ParameterSet.Config as cms
from FWCore.ParameterSet.VarParsing import VarParsing
options = VarParsing ('analysis')
options.parseArguments()

process = cms.Process('ANA')

# import of standard configurations
process.load('Configuration.StandardSequences.Services_cff')
process.load('SimGeneral.HepPDTESSource.pythiapdt_cfi')
process.load('FWCore.MessageService.MessageLogger_cfi')
process.load('Configuration.EventContent.EventContent_cff')
process.load('SimGeneral.MixingModule.mixNoPU_cfi')
process.load('Configuration.StandardSequences.GeometryRecoDB_cff')
process.load('Configuration.StandardSequences.MagneticField_38T_cff')
process.load('Configuration.StandardSequences.Generator_cff')
process.load('IOMC.EventVertexGenerators.VtxSmearedRealistic8TeVCollision_cfi')
process.load('GeneratorInterface.Core.genFilterSummary_cff')
process.load('Configuration.StandardSequences.EndOfProcess_cff')
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')

from Configuration.AlCa.GlobalTag import GlobalTag
process.GlobalTag = GlobalTag(process.GlobalTag, 'auto:mc', '')
process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(options.maxEvents)
)

process.MessageLogger.cerr.FwkReport.reportEvery = 10000

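# Handle different input options: either a list of ROOT files,
# or a single text file with one ROOT file name per line
inputFiles = []
if len(options.inputFiles) == 1 and (".root" not in options.inputFiles[0]):
    with open(options.inputFiles[0]) as flist:
        # strip the trailing newlines and skip empty lines
        inputFiles = [line.strip() for line in flist if line.strip()]
else:
    inputFiles = options.inputFiles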

process.source = cms.Source(
    "PoolSource",
    fileNames  = cms.untracked.vstring(inputFiles),
    duplicateCheckMode = cms.untracked.string('noDuplicateCheck')
)


process.dummy2 = cms.EDAnalyzer("GenXSecAnalyzer")


# Path and EndPath definitions
process.ana = cms.Path(process.dummy2)
# Schedule definition
process.schedule = cms.Schedule(process.ana)

Run the configuration file with:

dasgoclient -query="file dataset=/TTWJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-madspin-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM | grep file.name" > myfiles.txt
cmsRun xsec_ana.py inputFiles_load=myfiles.txt maxEvents=100000

In this example we restrict the maximum number of events to 100k. This gives a large enough sample for a reliable result without running too long (the full sample has 10.5M events; you can verify this number in DAS with dasgoclient -query="summary dataset=DATASET").
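To get a feeling for why 100k events can be "large enough", the following is a rough sketch of how a purely statistical relative uncertainty shrinks with the number of events. It assumes events carry weights of +1 or -1 only, with a fraction of negative weights as in NLO samples. The function name relative_stat_uncertainty and the negative-weight fraction used below are illustrative assumptions, not values taken from the sample. The uncertainty reported by GenXSecAnalyzer itself is computed differently and will not match this estimate exactly.

```python
import math


def relative_stat_uncertainty(n_events, neg_weight_fraction=0.0):
    """Rough relative statistical uncertainty for n_events with weights of +/-1.

    Assumption: the sum of weights scales as N * (1 - 2f) and its spread as
    sqrt(N), where f is the fraction of negative-weight events.
    """
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    if not 0.0 <= neg_weight_fraction < 0.5:
        raise ValueError("neg_weight_fraction must be in [0, 0.5)")
    dilution = 1.0 - 2.0 * neg_weight_fraction
    return 1.0 / (math.sqrt(n_events) * dilution)


# Compare a few sample sizes; f = 0.25 is a hypothetical value for illustration.
for n in (1_000, 100_000, 10_500_000):
    plain = relative_stat_uncertainty(n)
    diluted = relative_stat_uncertainty(n, neg_weight_fraction=0.25)
    print(f"N = {n:>10d}: {100 * plain:.3f}% (f = 0), {100 * diluted:.3f}% (f = 0.25)")
```

In this simplified picture, going from 100k events to the full sample improves the precision by only about a factor of ten, while the run time grows by roughly a factor of one hundred.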

The inputFiles option takes a range of options:
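Based only on the handling block in xsec_ana.py above, the way the different forms are resolved can be sketched in plain Python, so it can be tried without a CMSSW environment. The helper name resolve_input_files and the /store/mc/hypothetical/ paths are made up for illustration. Note that with inputFiles_load=myfiles.txt, as in the command above, VarParsing reads the list itself and the configuration receives the file names directly.

```python
import os
import tempfile


def resolve_input_files(input_files):
    """Return a flat list of file names, mirroring the branch in xsec_ana.py."""
    input_files = list(input_files)
    if len(input_files) == 1 and ".root" not in input_files[0]:
        # A single non-ROOT argument is treated as a text file with one entry per line.
        with open(input_files[0]) as flist:
            return [line.strip() for line in flist if line.strip()]
    # Otherwise the arguments are already ROOT file names.
    return input_files


# Case 1: a text file listing the inputs (hypothetical paths).
with tempfile.TemporaryDirectory() as tmpdir:
    list_path = os.path.join(tmpdir, "myfiles.txt")
    with open(list_path, "w") as out:
        out.write("/store/mc/hypothetical/a.root\n/store/mc/hypothetical/b.root\n")
    from_list = resolve_input_files([list_path])

# Case 2: ROOT files given directly.
direct = resolve_input_files(["/store/mc/hypothetical/a.root"])

print(from_list)
print(direct)
```

The sketch strips trailing newlines from each entry, which readlines() alone would keep.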

Questions:

xsec DB

A central database, XSDB, is kept with approved cross sections for centrally produced samples.

The CMS Generator group's Cross Section Database Tool (XSDB) is a tool for storing and looking up information related to a specific MC sample within CMS. It is designed to complement DAS and McM, and a direct link from DAS to XSDB is available for a specific sample. Anyone with a CERN login can view XSDB and perform queries for sample information. However, further action is restricted by e-group permissions. There are user, approver, and admin e-groups. XSDB users are CMS members who have permission to insert and modify documents in XSDB. Approvers have the same privileges as users, but are primarily tasked with approving records submitted by users. Admins are responsible for maintaining and improving the tool for future use.

A large amount of information can be stored in the database for each sample. This includes the cross section value, cross section uncertainty, hadronization tool, matrix element generator, sample contact, cuts used, DAS primary dataset name, and MCM prename, among other metadata. This information can then be used to help with an analysis. In this exercise, we will simply search XSDB for a sample, look at some of the information stored there, and get familiar with the tool.

We would like to search for a sample within XSDB. We’ll look for an EXO sample used in the Contact Interaction qqbar to dimuon channel in the search for compositeness. The sample can be found in DAS with the dataset name: /CITo2Mu_M1300_CUETP8M1_Lam10TeVDesRR_13TeV-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM

It is possible to search for a substring of the item you are looking for. Wildcards are supported, but they are not required: as long as the string is contiguous, it will be accepted by the XSDB query. XSDB also supports boolean queries. To query the database for our original sample, we could use the following:

process_name=cito2mu && total_uncertainty=21.42

You can also query for your favorite MC sample. The XSDB twiki can be found here: XSDB twiki.
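To make the idea of substring matching combined with a boolean AND more concrete, here is a small local sketch that mimics the query above on an in-memory list of records. It does not talk to XSDB and is not the XSDB implementation: the helpers parse_query and matches, the record layout, and the second record are hypothetical, and the real service may match fields differently (for example, numerically).

```python
def parse_query(query):
    """Split 'key=value && key=value' into a list of (key, value) pairs."""
    conditions = []
    for term in query.split("&&"):
        key, _, value = term.strip().partition("=")
        conditions.append((key.strip(), value.strip()))
    return conditions


def matches(record, conditions):
    """True if every value is a case-insensitive substring of the record field."""
    return all(
        value.lower() in str(record.get(key, "")).lower()
        for key, value in conditions
    )


# Illustrative records only: the first pairs the DAS primary dataset name with
# the uncertainty used in the query above, the second is entirely made up.
records = [
    {"process_name": "CITo2Mu_M1300_CUETP8M1_Lam10TeVDesRR_13TeV-pythia8",
     "total_uncertainty": "21.42"},
    {"process_name": "HypotheticalOtherSample_13TeV-pythia8",
     "total_uncertainty": "1.00"},
]

conditions = parse_query("process_name=cito2mu && total_uncertainty=21.42")
hits = [record for record in records if matches(record, conditions)]

for record in hits:
    print(record["process_name"])
```

Lower-casing both sides is what lets the contiguous fragment cito2mu find the full process name without any wildcard.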

Key Points

  • DAS can be used to find samples, their files, the number of events in a given sample, etc.

  • McM is used for sample generation management, and can be used by the user to obtain additional information about their samples, e.g. the root gridpack, fragments etc.

  • McM is also a good place to look for example cmsDriver commands

  • Different sources for x-secs exist within CMS: a CMSSW analyzer and a database