1 - Matrix Element Generator
Overview
Teaching: 20 min
Exercises: 40 min
Questions
What are Monte Carlo Generators?
Why are we using simulated samples in CMS?
How are simulated samples created in CMS?
Objectives
Use the MadGraph generator in standalone mode and get familiar with the basic syntax
Analyze the produced LHE files
Introduction and first steps
Although quite old, link is great reading material for a general overview of Monte Carlo event generators. Monte Carlo event generators are essential components of almost all experimental analyses and are also widely used by theorists and experimentalists to make predictions and prepare for future experiments. This is one of the topics where CMS experimentalists and theorists are most closely connected: theorists provide predictions and experimentalists verify them with actual data. Although Monte Carlo event generators are extremely important tools in HEP, they are often used as black boxes whose output we treat more or less as “data”. Our aim is to get a minimal understanding of how these tools work and to analyze their output using generator-level information.
Samples used by the CMS experiment go through several simulation steps:
- Monte Carlo event generator
- Detector simulation
- Pileup mixing
- Trigger emulation
- Object reconstruction
We focus on the first step, “Monte Carlo event generator”, in this tutorial. The Monte Carlo event generator can be further divided into several pieces, since each step factorizes and can be handled by a separate calculation:
- Parton distribution function (PDF)
- Hard scattering (matrix element calculation)
- Parton shower & hadronization
First of all, the LHC is a proton-proton collider, hence we need information on how partons (quarks and gluons) are distributed inside the proton (PDF). Hard scattering is the part that can be treated perturbatively: the interaction of the incoming partons with the largest momentum transfer (usually the physics process we are interested in). Parton shower & hadronization further describe how the particles involved in the hard scattering evolve, working downwards to lower momentum scales, down to the point where perturbative calculations break down.
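The factorization described above is often summarized in a schematic textbook formula. It is not taken from this tutorial, and the symbols $f_a$, $\mu_F$, $\mu_R$ and $\hat{\sigma}_{ab \to X}$ are notation introduced here for illustration:

```latex
\sigma(pp \to X) \;=\; \sum_{a,b} \int_0^1 dx_1 \int_0^1 dx_2 \;
  f_a(x_1, \mu_F)\, f_b(x_2, \mu_F)\;
  \hat{\sigma}_{ab \to X}(x_1, x_2, \mu_F, \mu_R)
```

Here $f_a(x, \mu_F)$ is the PDF for finding parton $a$ with momentum fraction $x$ at the factorization scale $\mu_F$, and $\hat{\sigma}_{ab \to X}$ is the partonic (hard scattering) cross section that the matrix element generator computes perturbatively at the renormalization scale $\mu_R$. The parton shower and hadronization then act on the final state of each generated event.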
Using Standalone Madgraph
In the first part of the exercise, we will use the matrix element generator MadGraph5_aMC@NLO, or MadGraph for short (link).
MadGraph can perform the calculations for many different physics processes (both SM and BSM) at leading and next-to-leading order (LO & NLO) in QCD.
Because of its easy user interface and flexibility with UFO models, you can test a wide variety of physics models.
We will first see how MadGraph runs interactively in standalone mode, using a simple W+ (wplus) process as an example.
We will first use the interactive prompt of MadGraph to generate proton proton collision events that produce W bosons.
First, log in to a new session on the LPC cluster (ssh -Y <USERNAME>@cmslpc-el8.fnal.gov).
Make sure you have completed the setup steps!
Then, start the interactive prompt of Madgraph:
cd ~/nobackup/cmsdas_2025_gen/MG5_aMC_v3_5_2/
./bin/mg5_aMC
Madgraph is configured and steered through text-based cards.
The process definitions can be stored in a card called proc_card.dat.
You can look at an example using the following command:
!cat wplustest_4f_LO_proc_card.dat
Note: the exclamation mark is used to execute shell commands within MadGraph, e.g. !cat in the above example.
import model sm-ckm
#switch to diagonal ckm matrix if relevant for speed
#import model sm-lepton_masses
define ell+ = e+ mu+ ta+
define ell- = e- mu- ta-
generate p p > w+, w+ > ell+ vl @0
output wplustest_4f_LO -nojpeg
Copy/paste the commands line-by-line and pay attention to the output.
The two most important lines of this block are the model import (import model sm-ckm) and the instruction on which process to generate (generate p p > w+, w+ > ell+ vl).
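For scripted use, the same commands can be written to a text file instead of being typed at the prompt. The following is a minimal sketch of that idea: the helper name write_proc_card and the file name my_proc_card.dat are hypothetical, and the card content is simply copied from the example above.

```python
import tempfile
from pathlib import Path

# Commands copied from the example proc card shown above.
PROC_CARD_LINES = [
    "import model sm-ckm",
    "define ell+ = e+ mu+ ta+",
    "define ell- = e- mu- ta-",
    "generate p p > w+, w+ > ell+ vl @0",
    "output wplustest_4f_LO -nojpeg",
]


def write_proc_card(path, lines=PROC_CARD_LINES):
    """Write one MadGraph command per line and return the card text."""
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text)
    return text


if __name__ == "__main__":
    # The card is written to a temporary directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        card_path = Path(tmp) / "my_proc_card.dat"  # hypothetical file name
        write_proc_card(card_path)
        print(card_path.read_text(), end="")
```

A file written this way could then be passed to the MadGraph executable as a command file (for example ./bin/mg5_aMC my_proc_card.dat from the MadGraph directory) instead of typing the lines one by one.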
Within the MG directory you can find a directory called models, which contains different pre-installed models.
The most obvious one, which we are using in the example, is sm, the standard model at leading order in perturbative QCD.
Model parameters can be configured through “restriction cards”; in this example restrict_ckm.dat is loaded through the syntax sm-ckm.
This specific restriction card uses a non-diagonal CKM matrix (a diagonal CKM matrix is otherwise the default, for simplicity and faster running).
One great feature of MadGraph is its flexibility in terms of the physics models it can use.
To generate a sample using a new physics model one can use the UFO interface. A database of models can be found in the feynrules model database.
The practically most relevant part is that MG figures out all relevant Feynman diagrams contributing to a process.
If you are trying to set up a new MC sample, looking at these Feynman diagrams is a great way to check that you actually get the physics you want.
To check them out you can open the individual plots in e.g. wplustest_4f_LO/SubProcesses/P0_qq_wp_wp_lvl/matrix*.ps with gv, display or evince.
You can also use the ps2pdf program to convert the PostScript files into PDFs.
Alternatively, remove -nojpeg from the output line and look at the diagrams in jpeg format using display.
Now that MadGraph has figured out the Feynman diagrams, you can start the actual computation within the MG5 prompt with
launch
Hint: if you closed the interactive MG session for some reason you can still launch without rerunning the previous commands with
./bin/mg5_aMC
launch wplustest_4f_LO
MadGraph will ask you a few more questions. Press tab to turn off the timer (otherwise, MadGraph will move on by itself after 60 seconds).
/===========================================================================\
| 1. Choose the shower/hadronization program shower = Not Avail. |
| 2. Choose the detector simulation program detector = Not Avail. |
| 3. Choose an analysis package (plot/convert) analysis = Not Avail. |
| 4. Decay onshell particles madspin = OFF |
| 5. Add weights to events for new hypp. reweight = Not Avail. |
\===========================================================================/
You can skip this first question by pressing <RETURN>. Since we did not install any shower, detector, or analysis package, these options are shown as Not Avail.
Do you want to edit a card (press enter to bypass editing)?
/------------------------------------------------------------\
| 1. param : param_card.dat |
| 2. run : run_card.dat |
\------------------------------------------------------------/
you can also
- enter the path to a valid card or banner.
- use the 'set' command to modify a parameter directly.
The set option works only for param_card and run_card.
Type 'help set' for more information on this command.
- call an external program (ASperGE/MadWidth/...).
Type 'help' for the list of available command
[0, done, 1, param, 2, run, enter path][90s to answer]
Let’s take a look at the param card and see how the values are set: press 1 and ENTER (<RETURN>) to inspect the parameter settings.
###################################
## INFORMATION FOR MASS
###################################
Block mass
5 4.700000e+00 # MB
6 1.730000e+02 # MT
15 1.777000e+00 # MTA
23 9.118800e+01 # MZ
25 1.250000e+02 # MH
...
###################################
## INFORMATION FOR DECAY
###################################
DECAY 6 1.491500e+00 # WT
DECAY 23 2.441404e+00 # WZ
DECAY 24 2.047600e+00 # WW
DECAY 25 6.382339e-03 # WH
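The param card is plain text, so its values are easy to read programmatically. Below is a small sketch of a parser for the two kinds of entries shown above (the mass block and the DECAY lines). The function name parse_param_card is hypothetical and the parser handles only this subset of the card format, not every block a real card may contain.

```python
# Excerpt copied from the param card listing above.
PARAM_CARD_EXCERPT = """\
Block mass
5 4.700000e+00 # MB
6 1.730000e+02 # MT
15 1.777000e+00 # MTA
23 9.118800e+01 # MZ
25 1.250000e+02 # MH
DECAY 6 1.491500e+00 # WT
DECAY 23 2.441404e+00 # WZ
DECAY 24 2.047600e+00 # WW
DECAY 25 6.382339e-03 # WH
"""


def parse_param_card(text):
    """Return ({pdg_id: mass}, {pdg_id: width}) for the mass block and DECAY lines."""
    masses, widths = {}, {}
    current_block = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        keyword = fields[0].lower()
        if keyword == "block":
            current_block = fields[1].lower()
        elif keyword == "decay":
            current_block = None
            widths[int(fields[1])] = float(fields[2])
        elif current_block == "mass":
            masses[int(fields[0])] = float(fields[1])
    return masses, widths


masses, widths = parse_param_card(PARAM_CARD_EXCERPT)
print(f"m_Z = {masses[23]} GeV, Gamma_W = {widths[24]} GeV")
```

The keys are PDG particle codes (23 for the Z boson, 24 for the W boson), matching the first column of the card.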
Let’s take a look at the run card and see how the values are set: press 2 and ENTER (<RETURN>) to inspect the run settings.
#*********************************************************************
# Number of events and rnd seed *
# Warning: Do not generate more than 1M events in a single run *
#*********************************************************************
10000 = nevents ! Number of unweighted events requested
0 = iseed ! rnd seed (0=assigned automatically=default))
...
#*********************************************************************
# Collider type and energy *
# lpp: 0=No PDF, 1=proton, -1=antiproton, *
# 2=elastic photon of proton/ion beam *
# +/-3=PDF of electron/positron beam *
# +/-4=PDF of muon/antimuon beam *
#*********************************************************************
1 = lpp1 ! beam 1 type
1 = lpp2 ! beam 2 type
6500.0 = ebeam1 ! beam 1 total energy in GeV
6500.0 = ebeam2 ! beam 2 total energy in GeV
...
#*********************************************************************
# Standard Cuts *
#*********************************************************************
# Minimum and maximum pt's (for max, -1 means no cut) *
#*********************************************************************
10.0 = ptl ! minimum pt for the charged leptons
-1.0 = ptlmax ! maximum pt for the charged leptons
{} = pt_min_pdg ! pt cut for other particles (use pdg code). Applied on particle and anti-particle
{} = pt_max_pdg ! pt cut for other particles (syntax e.g. {6: 100, 25: 50})
...
#*********************************************************************
# Minimum and maximum invariant mass for pairs *
#*********************************************************************
0.0 = mmll ! min invariant mass of l+l- (same flavour) lepton pair
-1.0 = mmllmax ! max invariant mass of l+l- (same flavour) lepton pair
{} = mxx_min_pdg ! min invariant mass of a pair of particles X/X~ (e.g. {6:250})
{'default': False} = mxx_only_part_antipart ! if True the invariant mass is applied only
! to pairs of particle/antiparticle and not to pairs of the same pdg codes.
...
#*********************************************************************
# maximal pdg code for quark to be considered as a light jet *
# (otherwise b cuts are applied) *
#*********************************************************************
4 = maxjetflavor ! Maximum jet pdg code
Try editing the beam energies (ebeam1 and ebeam2) from 6500 to 6800, as the LHC is now running at a center-of-mass energy of 13.6 TeV (6.8 TeV per beam).
When you are done editing, save the changes and exit the editor.
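The same kind of edit can also be done on the card text from a script. The sketch below assumes the simple "value = name ! comment" layout seen in the run card above. The helper set_run_card_value is hypothetical and only handles values without spaces, so dictionary-style entries such as {} = pt_min_pdg are out of scope.

```python
import re

# Excerpt copied from the run card listing above.
RUN_CARD_EXCERPT = """\
1 = lpp1 ! beam 1 type
1 = lpp2 ! beam 2 type
6500.0 = ebeam1 ! beam 1 total energy in GeV
6500.0 = ebeam2 ! beam 2 total energy in GeV
"""


def set_run_card_value(text, name, value):
    """Replace the value in the single line of the form 'value = name ! comment'."""
    pattern = re.compile(
        rf"^(?P<indent>[ \t]*)\S+(?P<rest>[ \t]*=[ \t]*{re.escape(name)}\b.*)$",
        re.MULTILINE,
    )
    new_text, n_changed = pattern.subn(
        lambda m: f"{m.group('indent')}{value}{m.group('rest')}", text
    )
    if n_changed != 1:
        raise KeyError(f"expected exactly one '{name}' entry, found {n_changed}")
    return new_text


card = RUN_CARD_EXCERPT
for beam in ("ebeam1", "ebeam2"):
    card = set_run_card_value(card, beam, "6800.0")
print(card, end="")
```

Raising an error when the parameter is missing (or appears more than once) is a deliberate choice here, so that a typo in the parameter name does not silently leave the card unchanged.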
MadGraph also lets you change settings directly from the prompt, for example:
set run_card nevents 5000
Take a look at the run card again and check that the number of events to generate (nevents) has changed to 5000.
Then change it back to 10000 using the same command and check again.
As shown above, several phase space cuts are set by default (e.g. 10.0 = ptl).
There is a handy command that removes all phase space cuts at once (instead of doing set run_card ptl 0, set run_card ptj 0, … one by one by hand).
set no_parton_cut
Take a look at the card again and check that the lepton pt cut (ptl) has changed to 0.
Keep in mind that any cuts you set before running set no_parton_cut will be removed by this command.
So don’t forget to run set no_parton_cut before setting the cuts you want.
Once you are done, please provide the path to the pre-made run_card: wplustest_4f_LO_run_card.dat
What is the cross section determined by Madgraph?
Obtaining the cross section
Define a process (e.g. from the process card above) and launch.
Solution
launch wplustest_4f_LO
=== Results Summary for run: run_01 tag: tag_1 ===
Cross-section : 2.715e+04 +- 39.45 pb
Nb of events : 10000
INFO: No version of lhapdf. Can not run systematics computation
store_events
INFO: Storing parton level results
INFO: End Parton
reweight -from_cards
decay_events -from_cards
INFO: storing files of previous run
INFO: Done
The cross section calculated by MG is 2.715e+04 +- 39.45 pb.
The main output that MG produces is called an “LHE file”. The LHE file (Les Houches Event file) is a standard file format that stores process and event information from parton-level event generators. The documentation can be found here.
In general, the LHE file contains a header with description of the settings of the generator (e.g. process and run information), and multiple event blocks (one for each event). The LHE file is plain text, so it’s usually a good idea to use some compression algorithm to save space - MG zips the output by default.
Looking at the LHE output
Find the LHE file produced by MG and find the first event block.
Example solution
Exit MG, then do
find -path './wplustest_4f_LO/*.lhe.gz'
gzip -d ./wplustest_4f_LO/Events/run_01/unweighted_events.lhe.gz
less ./wplustest_4f_LO/Events/run_01/unweighted_events.lhe
<LesHouchesEvents version="3.0">
<header>
...
</header>
...
<event>
5 0 +2.7145900e+04 7.93095700e+01 7.54677100e-03 1.33102200e-01
2 -1 0 0 501 0 +0.0000000000e+00 +0.0000000000e+00 +4.5829549845e+01 4.5829549845e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
-1 -1 0 0 0 501 -0.0000000000e+00 -0.0000000000e+00 -3.4311969734e+01 3.4311969734e+01 0.0000000000e+00 0.0000e+00 1.0000e+00
24 2 1 2 0 0 +0.0000000000e+00 +0.0000000000e+00 +1.1517580112e+01 8.0141519579e+01 7.9309573878e+01 0.0000e+00 0.0000e+00
-13 1 3 3 0 0 -1.6845086581e+01 +2.2368564620e+01 -2.2614075432e+01 3.5993138689e+01 0.0000000000e+00 0.0000e+00 1.0000e+00
14 1 3 3 0 0 +1.6845086581e+01 -2.2368564620e+01 +3.4131655543e+01 4.4148380890e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
<mgrwt>
<rscale> 0 0.79309574E+02</rscale>
<asrwt>0</asrwt>
<pdfrwt beam="1"> 1 2 0.70507000E-02 0.79309574E+02</pdfrwt>
<pdfrwt beam="2"> 1 -1 0.52787646E-02 0.79309574E+02</pdfrwt>
<totfact> 0.15019241E+05</totfact>
</mgrwt>
<rwgt>
<wgt id='1'> +2.3707085e+04 </wgt>
...
</rwgt>
</event>
What does each column mean?
Solution
ID, status, mother1, mother2, color, anticolor, px, py, pz, E, mass, lifetime, and spin
-11 1 3 3 0 0 -2.3393803385e+01 -7.4187481776e+00 -1.5274153214e+02 1.5470062541e+02 0.0000000000e+00 0.0000e+00 1.0000e+00
This line tells you that a positron (ID = -11) is an outgoing particle (status = 1) whose mother is the third particle of the event (mother1 = mother2 = 3; in the W+ events generated here that is the W+ boson, ID = 24) and which carries no color (color = anticolor = 0), …
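To make the column layout concrete, here is a short sketch that parses the five particle lines of the event block listed in the LHE excerpt above and recomputes the invariant mass of the muon-neutrino pair. The helper names parse_particles and invariant_mass are introduced here for illustration and are not part of any MadGraph or CMSSW tool.

```python
import math

# Particle lines of the example event (ID, status, mother1, mother2,
# color, anticolor, px, py, pz, E, mass, lifetime, spin).
EVENT_PARTICLES = """\
2 -1 0 0 501 0 +0.0000000000e+00 +0.0000000000e+00 +4.5829549845e+01 4.5829549845e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
-1 -1 0 0 0 501 -0.0000000000e+00 -0.0000000000e+00 -3.4311969734e+01 3.4311969734e+01 0.0000000000e+00 0.0000e+00 1.0000e+00
24 2 1 2 0 0 +0.0000000000e+00 +0.0000000000e+00 +1.1517580112e+01 8.0141519579e+01 7.9309573878e+01 0.0000e+00 0.0000e+00
-13 1 3 3 0 0 -1.6845086581e+01 +2.2368564620e+01 -2.2614075432e+01 3.5993138689e+01 0.0000000000e+00 0.0000e+00 1.0000e+00
14 1 3 3 0 0 +1.6845086581e+01 -2.2368564620e+01 +3.4131655543e+01 4.4148380890e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
"""


def parse_particles(text):
    """Turn whitespace-separated particle lines into a list of dictionaries."""
    particles = []
    for line in text.strip().splitlines():
        f = line.split()
        particles.append({
            "id": int(f[0]),
            "status": int(f[1]),
            "mothers": (int(f[2]), int(f[3])),
            "p4": tuple(float(x) for x in f[6:10]),  # (px, py, pz, E) in GeV
            "mass": float(f[10]),
        })
    return particles


def invariant_mass(p4s):
    """Invariant mass of the summed four-momenta, each given as (px, py, pz, E)."""
    px, py, pz, e = (sum(c) for c in zip(*p4s))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


particles = parse_particles(EVENT_PARTICLES)
final_state = [p for p in particles if p["status"] == 1]  # outgoing particles
m_lnu = invariant_mass([p["p4"] for p in final_state])
w_boson = next(p for p in particles if p["id"] == 24)
print(f"m(mu+ nu) = {m_lnu:.4f} GeV, W mass column = {w_boson['mass']:.4f} GeV")
```

The recomputed mass of the lepton pair should agree with the mass column of the intermediate W+ line, which is a quick sanity check that the columns are being read in the right order.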
MadGraph syntax
If you want to add another process, e.g. production of W- in the above example, you can add another process with
add process p p > w-, w- > ell- vl~
A detailed introduction to the syntax is given in this documentation. Some very basic things to keep in mind:
generate p p > e+ e-
will generate any diagram that is compatible with the used model that produces an electron / positron pair
generate p p > e+ e- / Z
will exclude diagrams that contain a Z boson as an internal particle
generate p p > e+ e- $ Z
will exclude the Z boson from appearing in the s-channel (careful about gauge invariance)
generate p p > Z > e+ e-
will always include a Z boson in the s-channel (careful about gauge invariance)
generate p p > w+ QED=3
will include all QED contributions, otherwise the QED order is always set to its minimal value
Bonus: Obtaining the cross section of W boson production
The exercise above only contained W+ bosons (only positive charge). Add production of the negatively charged W bosons and calculate the cross section. Before running MG, think about what result you would expect, i.e. by how much do you think the cross section should increase. Compare the results of W+ and W+/-. What do you conclude?
Solution
import model sm-ckm
define ell+ = e+ mu+ ta+
define ell- = e- mu- ta-
generate p p > w+, w+ > ell+ vl @0
add process p p > w-, w- > ell- vl~ @1
output wtest_4f_LO -nojpeg
launch
When prompted about the run_card, use wplustest_4f_LO_run_card.dat again.
=== Results Summary for run: run_01 tag: tag_1 ===
Cross-section : 4.667e+04 +- 63.91 pb
Nb of events : 10000
The cross section calculated by MadGraph is 4.667e+04 +- 63.91 pb. While one would naively expect the cross section to double when W- bosons are included, it is only about 70% larger. The simplified explanation is that W+ bosons are produced mainly from u dbar annihilation and W- bosons from d ubar annihilation, and the proton contains two valence up quarks but only one valence down quark.
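The comparison can be made quantitative with a few lines of arithmetic. The sketch below uses the numbers printed in the two Results Summary blocks (2.715e+04 +- 39.45 pb for W+ only and 4.667e+04 +- 63.91 pb for W+ and W- together). Combining the errors in quadrature assumes the two runs are statistically independent, which is an assumption made here.

```python
import math

# Values copied from the two "Results Summary" printouts (in pb).
SIGMA_WPLUS, ERR_WPLUS = 2.715e4, 39.45
SIGMA_WBOTH, ERR_WBOTH = 4.667e4, 63.91

# W- alone is the difference of the two runs; the errors are combined in
# quadrature, assuming the runs are statistically independent.
sigma_wminus = SIGMA_WBOTH - SIGMA_WPLUS
err_wminus = math.hypot(ERR_WBOTH, ERR_WPLUS)

increase = SIGMA_WBOTH / SIGMA_WPLUS - 1.0
ratio_minus_over_plus = sigma_wminus / SIGMA_WPLUS

print(f"sigma(W-) ~ {sigma_wminus:.0f} +- {err_wminus:.0f} pb")
print(f"relative increase when adding W-: {increase:.1%}")
print(f"sigma(W-)/sigma(W+) ~ {ratio_minus_over_plus:.2f}")
```

The quoted uncertainties are the Monte Carlo integration errors only; scale and PDF uncertainties are not included in this estimate.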
Using the gridpack workflow
As mentioned previously, running MG5 interactively is useful for development and quick tests, but ultimately not practical for large-scale production. To avoid the interactive mode, one can make use of the card structure of MG. A fully automated workflow for running MG and producing gridpacks is maintained in the genproductions repository. A gridpack is simply an archive file that contains all the executable MG5 code needed to produce LHE events for a given process, which can then be executed easily on many different grid workers (hence the name). It has the advantage that once it is created, it is a one-button program to generate events, no thinking required. In this part of the exercise, we will use the same input cards as before to create a gridpack, run it, and compare the results with the standalone run.
Gridpacks are generated using the gridpack_generation script, which we will run in local mode, i.e. on the machine we are currently logged in to. Note that scripts are provided to run the gridpacks on other computing infrastructures such as the CERN batch system and CMSConnect, which is useful for more complicated processes.
To create a gridpack, we simply call gridpack_generation.sh and pass the process name and card location to it.
Setting and unsetting the CMSSW environment
You should not have activated a CMSSW environment in this exercise so far. However, if you did so before, you need to unset it in order to not interfere with the genproductions script. You can run the following command to unset the CMSSW environment, or log in to a clean new session.
eval `scram unsetenv -sh`
We will be generating a gridpack with cards similar to the commands we’ve used in the standalone example above. The cards are located in the MG section of the genproductions directory
cd ~/nobackup/cmsdas_2025_gen/genproductions_mg352/bin/MadGraph5_aMCatNLO
time ./gridpack_generation.sh wplustest_4f_LO cards/examples/wplustest_4f_LO local
Naming conventions
There are certain naming conventions for the input cards. For a given process name $NAME, the input cards must be named as $NAME_run_card.dat, $NAME_proc_card.dat, etc…
Once the gridpack has been created, unpack it in a new directory and run it:
mkdir work
cd work
tar xf ../wplustest_4f_LO_el8_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz
NEVENTS=10000
RANDOMSEED=12345
NCPU=1
./runcmsgrid.sh $NEVENTS $RANDOMSEED $NCPU
This will produce an output LHE file called cmsgrid_final.lhe.
Comparing the LHE output
There are multiple ways of analyzing an LHE file, each of which has its own advantages and disadvantages. For the purpose of this exercise, we will use a pre-made pyroot script.
cd ~/nobackup/cmsdas_2025_gen/
cp /eos/uscms/store/user/cmsdas/2025/short_exercises/generators/LHEReader.py .
Convert LHE output to root format
Make sure the Python virtual environment is deactivated
If you are still using the virtual environment, you need to unset it in order to not interfere with the CMSSW environment.
cd CMSSW_12_4_8/src; cmsenv; cd -
python3.9 LHEReader.py --input MG5_aMC_v3_5_2/wplustest_4f_LO/Events/run_01/unweighted_events.lhe --output standalone.root
python3.9 LHEReader.py --input genproductions_mg352/bin/MadGraph5_aMCatNLO/work/cmsgrid_final.lhe --output cmsgrid.root
Feel free to experiment here and plot various quantities. What are the shapes of the lepton pT distributions? What is the shape of the pT distribution of the W system? Are these shapes physical?
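As a starting point for these questions, the transverse momentum can be computed directly from the px and py columns of an LHE event. The sketch below uses the muon and neutrino momenta of the example event from the standalone run; the helper name pt is introduced here for illustration.

```python
import math

# (px, py) in GeV for the muon and the neutrino of the example LHE event.
MUON_PXPY = (-1.6845086581e+01, +2.2368564620e+01)
NEUTRINO_PXPY = (+1.6845086581e+01, -2.2368564620e+01)


def pt(px, py):
    """Transverse momentum from the two transverse components."""
    return math.hypot(px, py)


# The W momentum is the sum of its decay products.
w_pxpy = tuple(a + b for a, b in zip(MUON_PXPY, NEUTRINO_PXPY))
print(f"muon pT = {pt(*MUON_PXPY):.2f} GeV")
print(f"W pT    = {pt(*w_pxpy):.2f} GeV")
```

Comparing the W value for this single event with the full distribution in the ROOT files is a useful way to approach the question of whether the LHE-level shapes are physical.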
Key Points
MadGraph is a widely used tool to generate matrix-element predictions for the hard scatter for SM and BSM processes.
Standalone MadGraph can run interactively on-the-fly or by importing the predefined text scripts
Gridpacks are used for large scale productions with consistency guaranteed
LHE level information is not physical and parton shower is needed to describe full physics
2 - Parton Shower Generator
Overview
Teaching: 10 min
Exercises: 20 min
Questions
Why do we need to do parton showering?
How are simulated samples created in CMS?
Objectives
Perform parton shower with LHE file as an input
Perform parton shower with gridpack as an input
Analyze generator level information using NanoGEN files
Creating particle level samples from LHE files
As discussed earlier, LHE files by themselves are not enough to describe physical distributions. To generate physically sensible events, LHE files need to go through the parton shower. The parton shower is, in principle, responsible for higher-order corrections to the hard process; the dominant contributions come from collinear or soft emissions. In CMS, one of the most widely used tools for parton showering is Pythia8 (note, however, that Pythia8 is a multipurpose generator that can also calculate the hard process for certain physics processes). In this exercise, instead of compiling Pythia8 and running it in standalone mode as we did for MadGraph, we will use the Pythia8 that is already compiled in the CMSSW environment.
(1) Running Pythia8 interface in CMSSW
Make sure the Python virtual environment is deactivated
If you are still using the virtual environment, you need to unset it in order to not interfere with the CMSSW environment.
Let’s first check which release version of Pythia8 we will be using.
cd ~/nobackup/cmsdas_2025_gen/CMSSW_12_4_8/src
cmsenv
scram tool info pythia8
You will find that we are using Pythia 8.306, which is already compiled in CMSSW_12_4_8.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Name : pythia8
Version : 306-84a4765a9948f9c1a5e66f80618e2c6d
++++++++++++++++++++
INCLUDE=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d/include
LIB=pythia8
LIBDIR=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d/lib
PYTHIA8DATA=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d/share/Pythia8/xmldoc
PYTHIA8_BASE=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d
ROOT_INCLUDE_PATH=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d/include
SYSTEM_INCLUDE+=1
USE=root_cxxdefaults cxxcompiler hepmc3 hepmc lhapdf
Now we will start building our parton shower fragment in our own directories in order to produce samples by ourselves.
mkdir -p Configuration/GenProduction/python/
Create a new file Configuration/GenProduction/python/wplustest.py
:
import FWCore.ParameterSet.Config as cms
import os
externalLHEProducer = cms.EDProducer('ExternalLHEProducer',
args = cms.vstring(os.getenv("HOME")+ "/nobackup/cmsdas_2025_gen/genproductions_mg352/bin/MadGraph5_aMCatNLO/wplustest_4f_LO_el8_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz"),
nEvents = cms.untracked.uint32(5000),
numberOfParameters = cms.uint32(1),
outputFile = cms.string('cmsgrid_final.lhe'),
scriptName = cms.FileInPath('GeneratorInterface/LHEInterface/data/run_generic_tarball_cvmfs.sh'),
generateConcurrently = cms.untracked.bool(False),
)
from Configuration.Generator.Pythia8CommonSettings_cfi import *
from Configuration.Generator.MCTunesRun3ECM13p6TeV.PythiaCP5Settings_cfi import *
generator = cms.EDFilter("Pythia8HadronizerFilter",
PythiaParameters = cms.PSet(
pythia8CommonSettingsBlock,
pythia8CP5SettingsBlock,
parameterSets = cms.vstring(
'pythia8CommonSettings',
'pythia8CP5Settings',
)
),
comEnergy = cms.double(13600),
maxEventsToPrint = cms.untracked.int32(1),
pythiaHepMCVerbosity = cms.untracked.bool(False),
pythiaPylistVerbosity = cms.untracked.int32(1),
)
Let’s compile.
scram b
The cmsDriver.py executable builds the full configuration file from the parton shower fragment we just wrote and the options it is given (data tier, campaign, etc.).
We will create NanoGEN files, flat ntuples that resemble the NanoAOD data tier but store only branches with generator-level information.
This workflow skips the SIM and RECO steps in the middle, which makes it convenient for generator-level studies.
For more information, take a look at link.
cmsDriver.py Configuration/GenProduction/python/wplustest.py \
--python_filename config.py \
--eventcontent NANOAOD \
--datatier NANOAOD \
--fileout file:wplustest.root \
--conditions auto:mc \
--step LHE,GEN,NANOGEN \
--no_exec \
--mc \
-n 100
You just created config.py, which can be executed with the cmsRun command.
Use less to take a look at config.py and see how it evolved from Configuration/GenProduction/python/wplustest.py through cmsDriver.py.
Running it as shown below will produce LHE files, run the parton shower to make GEN samples, and finally convert them to NanoGEN format in one go.
Note that we will only test with 100 events (-n 100) due to time constraints.
cmsRun config.py
LHE files are first produced using the gridpack we’ve just produced.
______________________________________
Running Generic Tarball/Gridpack
______________________________________
gridpack tarball path = /uscms/home/enibigir/nobackup/cmsdas_2025_gen/genproductions_mg352/bin/MadGraph5_aMCatNLO/wplustest_4f_LO_el8_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz
%MSG-MG5 number of events requested = 100
%MSG-MG5 random seed used for the run = 234567
%MSG-MG5 thread count requested = 1
%MSG-MG5 residual/optional arguments =
%MSG-MG5 number of events requested = 100
%MSG-MG5 random seed used for the run = 234567
%MSG-MG5 number of cpus = 1
%MSG-MG5 SCRAM_ARCH version = el8_amd64_gcc10
%MSG-MG5 CMSSW version = CMSSW_12_4_8
Running MG5_aMC for the 1 time
produced_lhe 0 nevt 100 submitting_event 100 remaining_event 100
run.sh 100 2345670
Now generating 100 events with random seed 2345670 and granularity 1
Reweight with additional PDF sets given for possible systematic sources.
INFO: #***************************************************************************
#
# original cross-section: 30775.0
# scale variation: +11.8% -12.7%
# emission scale variation: + 0% - 0%
# central scheme variation: +8.43e-09% -20.3%
# PDF variation: +0.918% -0.918%
#
#PDF NNPDF31_nnlo_as_0118_nf_4: 30776.1 +0.916% -0.916%
#PDF NNPDF30_nnlo_nf_4_pdfas: 29939.4 +1.81% -1.81%
#PDF NNPDF40_nnlo_nf_4_pdfas: 31022.5 +0.554% -0.554%
#PDF MSHT20nnlo_nf4: 30286.6 +1.2% -1.56%
#PDF PDF4LHC21_40_pdfas_nf4: 30529 +1.53% -1.53%
#PDF ABMP16_4_nnlo: 30385.1 +0.885% -0.885%
# dynamical scheme # 1 : 28597.7 +13.2% -14.3% # \sum ET
# dynamical scheme # 2 : 28599.7 +13.2% -14.3% # \sum\sqrt{m^2+pt^2}
# dynamical scheme # 3 : 24520.5 +16.6% -17.8% # 0.5 \sum\sqrt{m^2+pt^2}
# dynamical scheme # 4 : 30775 +11.8% -12.7% # \sqrt{\hat s}
# PDF 42930 : 30365.485849169454
#***************************************************************************
Pythia8 is then launched with the newly created LHE file as input. It first prints the LHE information, as we saw directly in the LHE file.
-------- PYTHIA Event Listing (hard process) -----------------------------------------------------------------------------------
no id name status mothers daughters colours p_x p_y p_z e m
0 90 (system) -11 0 0 0 0 0 0 0.000 0.000 0.000 13000.000 13000.000
1 2212 (p+) -12 0 0 3 0 0 0 0.000 0.000 6500.000 6500.000 0.938
2 2212 (p+) -12 0 0 4 0 0 0 0.000 0.000 -6500.000 6500.000 0.938
3 -1 (dbar) -21 1 0 5 0 0 501 -0.000 0.000 0.771 0.771 0.000
4 2 (u) -21 2 0 5 0 501 0 0.000 -0.000 -2136.814 2136.814 0.000
5 24 (W+) -22 3 4 6 7 0 0 0.000 0.000 -2136.043 2137.585 81.187
6 -13 mu+ 23 5 0 0 0 0 0 11.363 36.276 -1442.911 1443.412 0.106
7 14 nu_mu 23 5 0 0 0 0 0 -11.363 -36.276 -693.132 694.173 0.000
Charge sum: 1.000 Momentum sum: 0.000 0.000 -2136.043 2137.585 81.187
-------- End PYTHIA Event Listing -----------------------------------------------------------------------------------------------
Pythia8 then starts the parton shower on top of the given LHE event.
See how much more information gets printed out.
Remember that the parton shower evolves from the hard process down to lower and lower scales until a certain energy threshold is reached (q -> q g -> q g g g -> q q q g g -> ...).
-------- PYTHIA Event Listing (complete event) ---------------------------------------------------------------------------------
no id name status mothers daughters colours p_x p_y p_z e m
0 90 (system) -11 0 0 0 0 0 0 0.000 0.000 0.000 13000.000 13000.000
1 2212 (p+) -12 0 0 90 0 0 0 0.000 0.000 6500.000 6500.000 0.938
2 2212 (p+) -12 0 0 91 0 0 0 0.000 0.000 -6500.000 6500.000 0.938
3 -1 (dbar) -21 6 0 5 0 0 501 -0.000 0.000 0.771 0.771 0.000
4 2 (u) -21 7 7 5 0 501 0 0.000 -0.000 -2136.814 2136.814 0.000
5 24 (W+) -22 3 4 8 8 0 0 0.000 0.000 -2136.043 2137.585 81.187
6 21 (g) -41 10 0 9 3 502 501 0.000 0.000 1.719 1.719 0.000
7 2 (u) -42 11 11 4 4 501 0 -0.000 -0.000 -2136.814 2136.814 0.000
8 24 (W+) -44 5 5 12 12 0 0 -19.744 -26.752 -1604.300 1606.697 81.187
9 1 (d) -43 6 0 13 13 502 0 19.744 26.752 -530.795 531.836 0.330
10 -4 (cbar) -41 18 0 14 6 0 501 0.000 0.000 2.947 2.947 0.000
11 2 (u) -42 19 19 7 7 501 0 0.000 0.000 -2136.814 2136.814 0.000
12 24 (W+) -44 8 8 20 20 0 0 -1.064 -19.028 -1221.827 1224.670 81.187
13 1 (d) -44 9 9 17 17 502 0 27.853 30.104 -681.041 682.274 0.330
14 -4 (cbar) -43 10 0 15 16 0 502 -26.789 -11.076 -231.000 232.816 1.500
15 -4 (cbar) -51 14 0 22 22 0 503 -19.016 -8.750 -120.727 122.537 1.500
16 21 (g) -51 14 0 23 23 503 502 -5.668 -0.052 -161.724 161.823 0.000
17 1 (d) -52 13 13 21 21 502 0 25.748 27.830 -629.590 630.730 0.330
18 -4 (cbar) -41 25 0 24 10 0 504 0.000 0.000 3.067 3.067 0.000
19 2 (u) -42 26 26 11 11 501 0 0.000 0.000 -2136.814 2136.814 0.000
20 24 (W+) -44 12 12 27 27 0 0 -0.639 -17.937 -1205.912 1208.775 81.187
21 1 (d) -44 17 17 28 28 502 0 25.919 28.268 -639.604 640.753 0.330
After the information for one event is printed, 100 events are processed and finally the cross section is reported.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Overall cross-section summary
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Process xsec_before [pb] passed nposw nnegw tried nposw nnegw xsec_match [pb] accepted [%] event_eff [%]
0 3.078e+04 +/- 2.327e+02 100 100 0 100 100 0 3.078e+04 +/- 2.327e+02 100.0 +/- 0.0 100.0 +/- 0.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 3.078e+04 +/- 2.327e+02 100 100 0 100 100 0 3.078e+04 +/- 2.327e+02 100.0 +/- 0.0 100.0 +/- 0.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Before matching: total cross section = 3.078e+04 +- 2.327e+02 pb
After matching: total cross section = 3.078e+04 +- 2.327e+02 pb
Matching efficiency = 1.0 +/- 0.0 [TO BE USED IN MCM]
Filter efficiency (taking into account weights)= (100) / (100) = 1.000e+00 +- 0.000e+00
Filter efficiency (event-level)= (100) / (100) = 1.000e+00 +- 0.000e+00 [TO BE USED IN MCM]
After filter: final cross section = 3.078e+04 +- 2.327e+02 pb
After filter: final fraction of events with negative weights = 0.000e+00 +- 0.000e+00
After filter: final equivalent lumi for 1M events (1/fb) = 3.249e-02 +- 2.478e-04
=============================================
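The "equivalent lumi" line of the summary is simply the number of events divided by the cross section. A short sketch to reproduce the central value follows; the function name equivalent_lumi_fb is introduced here for illustration.

```python
PB_PER_FB = 1000.0  # 1 fb^-1 corresponds to 1000 pb^-1


def equivalent_lumi_fb(n_events, xsec_pb):
    """Integrated luminosity (in fb^-1) that n_events correspond to."""
    return n_events / xsec_pb / PB_PER_FB


# 1M events at the final cross section reported in the summary above.
lumi = equivalent_lumi_fb(1_000_000, 3.078e4)
print(f"{lumi:.3e} fb^-1")
```

This is a handy number when deciding how many events to request: the simulated sample should correspond to a luminosity comfortably larger than that of the data it will be compared with.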
How did the cross section change after parton shower?
MadGraph reported
# original cross-section: 30775.0
i.e. 30775 pb. After running the parton shower with Pythia8, the same cross section (about 30780 pb) is kept. The parton shower adds more and more vertices, so why does the cross section remain unchanged?
Solution
The parton shower is unitary: the probabilities to branch (e.g. q -> q g) and not to branch sum to 1. Hence the cross section is determined by the lowest-order input (the hard process).
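Unitarity can be illustrated with a deliberately simple toy. In the sketch below every event either "branches" or does not, with an arbitrary probability of 0.3 chosen for illustration. A real shower uses Sudakov form factors and many successive emissions, but the bookkeeping is the same: events are redistributed between final states and no weight is created or lost.

```python
import random


def toy_shower(event_weights, branch_probability, rng):
    """Split events into 'branched' and 'unbranched' without changing their weights."""
    branched, unbranched = [], []
    for w in event_weights:
        (branched if rng.random() < branch_probability else unbranched).append(w)
    return branched, unbranched


rng = random.Random(12345)
weights = [30775.0 / 1000] * 1000  # 1000 equal-weight events summing to 30775 pb
branched, unbranched = toy_shower(weights, branch_probability=0.3, rng=rng)

total_before = sum(weights)
total_after = sum(branched) + sum(unbranched)
print(f"before: {total_before:.1f} pb, after: {total_after:.1f} pb")
print(f"fraction with an emission: {len(branched) / len(weights):.2f}")
```

Changing branch_probability changes how many events contain an emission, and therefore the shapes of distributions, but the summed weight stays the same.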
Bonus: How did the distribution change?
(2) Jet merging samples
Hard-process calculations are better at modeling hard jets and heavy particle decays, while the parton shower is well suited to describing collinear and soft emissions. For more realistic and reliable modeling of hard jets, for example in W+jet events, MadGraph can be used as below.
generate p p > w+, w+ > ell+ vl @0
add process p p > w+ j, w+ > ell+ vl @1
With this syntax, MadGraph produces the W+jets process with 0 and 1 hard jets in the event.
When this sample goes through the parton shower, the portion of events denoted with @1 already contains a hard jet, so it describes W+jet events with a hard jet better.
However, an event from @0 can emit QCD radiation from the initial state that forms a jet which is itself hard enough.
This region of phase space is therefore double counted: a “W+jet with hard jet” event can come from both @0 and @1.
To mitigate this issue and remove the double counting of phase space contributions, a jet merging technique is used.
Jet merging is set up with an artificial cut threshold called the jet merging scale.
This scale decides whether an event from @0 or @1 is accepted or not.
Finally, only the accepted events from the two processes are merged to form one sample.
Very roughly, the jet merging scale can be thought of as a threshold on the momentum of a jet.
If a jet in the event is hard enough to be above the threshold, events from @0 are rejected and only events from @1 are accepted.
On the other hand, if the jet in the event is softer than the threshold, only events from @0 are accepted and events from @1 are rejected.
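The accept/reject rule described above can be written down as a small toy function. This is only a sketch of the rough picture given in the text, not the actual merging algorithm run by MadGraph and Pythia8 (which clusters partons and showered jets and matches them to each other). The function name, its arguments and the example threshold of 19 GeV are illustrative assumptions.

```python
def keep_event(sample_label, hardest_jet_pt, merging_scale):
    """Toy jet-merging decision for a sample built from @0 and @1.

    sample_label   : 0 for events generated with @0, 1 for @1
    hardest_jet_pt : pT (GeV) of the hardest jet after the parton shower
    merging_scale  : merging threshold (GeV); a hypothetical input here
    """
    jet_is_hard = hardest_jet_pt > merging_scale
    if sample_label == 0:
        # @0 may only populate the region without a hard jet
        return not jet_is_hard
    if sample_label == 1:
        # @1 is responsible for the region with a hard jet
        return jet_is_hard
    raise ValueError("this toy only knows the labels 0 and 1")


# Each region of phase space is filled by exactly one of the two processes
for label, pt in [(0, 10.0), (0, 50.0), (1, 10.0), (1, 50.0)]:
    print(label, pt, keep_event(label, pt, merging_scale=19.0))
```

With this rule, a given jet momentum is never accepted from both processes, which is the point of removing the double counting.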
How to produce gridpack
How to set the MadGraph (run card) and Pythia (fragment)?
Hint
#*********************************************************************
# Matching - Warning! ickkw > 1 is still beta
#*********************************************************************
0 = ickkw ! 0 no matching, 1 MLM, 2 CKKW matching
This flag tells MadGraph that the LHE files we are going to produce will later go through jet merging, in order to avoid double counting.
#*********************************************************************
# Jet measure cuts *
#*********************************************************************
0 = xqcut ! minimum kt jet measure between partons
When jet merging is turned on, xqcut needs to be set; it preselects the events for efficient jet merging. Remember that some portion of the events will later be discarded and never used, so there is no point in producing events with jets at too low an energy scale at the LHE level, since these will likely be removed.
generator = cms.EDFilter("Pythia8HadronizerFilter",
    PythiaParameters = cms.PSet(
        pythia8CommonSettingsBlock,
        pythia8CP5SettingsBlock,
        processParameters = cms.vstring(
            'JetMatching:setMad = off',
            'JetMatching:scheme = 1',
            'JetMatching:merge = on',
            'JetMatching:jetAlgorithm = 2',
            'JetMatching:etaJetMax = 5.',
            'JetMatching:coneRadius = 1.',
            'JetMatching:slowJetPower = 1',
            'JetMatching:doShowerKt = off',
            'JetMatching:qCut = 19.',
            'JetMatching:nQmatch = 4',
            'JetMatching:nJetMax = 1',
            'TimeShower:mMaxGamma = 4.0'
        ),
        parameterSets = cms.vstring(
            'pythia8CommonSettings',
            'pythia8CP5Settings',
            'processParameters',
        )
    ),
)
Bonus: What is the cross section?
Key Points
Pythia8 is the main tool used for parton showering in CMS
Events are not physical until they have gone through the parton shower
Jet merging is a technique to avoid double counting of jet phase space between the ME and PS calculations
3 - Analysis and systematic uncertainties
Overview
Teaching: 10 min
Exercises: 20 min
Questions
How can we use generator information for quick studies?
What are systematic uncertainties coming from theory inputs?
Objectives
Use NanoGEN for exploratory studies and to gain experience with generator related uncertainties (PDF choice, scale, strong coupling constant)
Study some theory related systematic uncertainties
Analysis of NanoGEN
In the following we will explore a bit the content of NanoGEN samples, and how they can be used for doing first studies for a potentially interesting physics analysis. The NanoGEN sample we’ve previously created contains several trees. You can open the file in root to explore the content, e.g.
cd ~/nobackup/cmsdas_2025_gen/CMSSW_12_4_8/src; cmsenv
root -l wplustest.root
Within ROOT, you can use .ls to list the content.
If you’re interested in the branches contained in a certain tree, you can run:
.ls
Events->GetListOfBranches()->ls()
The NanoGEN sample contains, amongst others, collections of all the generated particles (GenPart), jets with different cone size parameters (GenJet, GenJetAK8), generated missing transverse momentum (GenMET) and generated leptons (GenDressedLepton). In the exercise we will be using GenMET and GenDressedLepton to calculate the transverse mass of the lepton+MET system, MT, which is often used in analysis with leptonically decaying W bosons.
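As a reference for the exercise, the transverse mass of the lepton+MET system is commonly defined as MT = sqrt(2 pT(lep) pT(miss) (1 - cos Δφ)). A minimal sketch of this formula in plain Python is given below. The function name is illustrative, and the inputs are assumed to be the pt and phi of one GenDressedLepton and of GenMET, read from the tree by whatever means you prefer.

```python
import math


def transverse_mass(lep_pt, lep_phi, met_pt, met_phi):
    """Transverse mass (GeV) of the lepton+MET system.

    Uses MT = sqrt(2 * pT_lep * pT_miss * (1 - cos(dphi))),
    which neglects the lepton mass.
    """
    dphi = lep_phi - met_phi
    return math.sqrt(2.0 * lep_pt * met_pt * (1.0 - math.cos(dphi)))


# Lepton and MET back to back with 40 GeV each: MT reaches its maximum, 80 GeV
print(transverse_mass(40.0, 0.0, 40.0, math.pi))
```

For W events the MT distribution has a characteristic edge near the W mass, which is why the variable is useful for leptonically decaying W bosons.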
Where do systematic uncertainties come from?
Several choices of parameters and settings in the event generators have an impact on the predicted overall event yield as well as the acceptance and object kinematics. The most prominent examples are the choice of the parton distribution function (PDF), the chosen values of the renormalization and factorization scales and the value of the strong coupling constant.
Estimating systematic uncertainties
Several different PDF sets are usually stored in samples that are centrally produced in CMS.
In order to save space, only one PDF set is stored in NanoAOD and NanoGEN.
For the UL campaign, the NNPDF3.1 NNLO PDF set is used (LHAPDF), which contains 103 per-event weights.
The stored weights, called LHEPdfWeight, are ratios with respect to the central value; therefore, the weight with index 0 should usually be 1 (if not, different PDFs were used for the central ME and for the computed variations, and extra care needs to be taken).
Indices 1-100 are different, linearly independent PDF variations that can be used to estimate systematic uncertainties, while indices 101 and 102 correspond to the up and down variations of the strong coupling constant.
Hessian and MC replicas
Two different approaches for PDF sets exist: MC replicas and Hessian eigenvectors. The Hessian sets used in (most) CMS UL samples allow the total uncertainty to be estimated by adding the individual variations in quadrature, whereas the situation is more ambiguous for MC replicas.
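For a Hessian set with the layout described above (index 0 central, 1-100 eigenvector variations, 101 and 102 the strong coupling variations), the quadrature sum could be sketched as follows. The input is assumed to be a list of 103 yields of one histogram bin, each obtained by filling the bin with the corresponding LHEPdfWeight entry; the function names are illustrative, and the symmetrized alphaS uncertainty is one common convention, not the only one.

```python
import math


def hessian_pdf_uncertainty(yields):
    """Quadrature sum of the 100 eigenvector variations around index 0."""
    central = yields[0]
    return math.sqrt(sum((y - central) ** 2 for y in yields[1:101]))


def alphas_uncertainty(yields):
    """Half the difference between the two alphaS variations (101, 102)."""
    return 0.5 * abs(yields[101] - yields[102])


# Hypothetical bin: central yield 100, every eigenvector shifts it by +1
example = [100.0] + [101.0] * 100 + [99.0, 102.0]
print(hessian_pdf_uncertainty(example), alphas_uncertainty(example))
```

The variations are evaluated on the final observable (a bin yield or an acceptance) rather than event by event, since the weights of different events are correlated.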
Similar sets of variations are available for the renormalization and factorization scales, stored as LHEScaleWeight.
You can find the details about the variations in the branch documentation within the ROOT file (see above for how to get the list of branches in ROOT).
Optional (MadSpin and BSM UFO model)
Using MadSpin
Why is MadSpin in any case useful? The answer lies in NLO calculations in QCD or loop-induced processes. Let’s launch MadGraph prompt shell again.
cd ~/nobackup/cmsdas_2025_gen/MG5_aMC_v3_5_2/
./bin/mg5_aMC
Now try another simple example, top pair production.
import model sm
generate p p > t t~ [QCD]
You will easily notice that [QCD] has been added to the process definition.
This is a flag which tells MadGraph that you wish to do the calculation at NLO in QCD.
Before going further, try adding top decays into a W boson and a b quark, similar to what we did for the Z -> ee example.
generate p p > t t~, t > w+ b [QCD]
generate p p > t t~ [QCD], t > w+ b
exit
You will find that neither of these works; instead, MadGraph complains with an error saying str : Decay processes cannot be perturbed.
This means that decays cannot be included in the process definition for NLO calculations.
This is where MadSpin becomes necessary: resonant particles that cannot be decayed in the process definition can be decayed with MadSpin.
Now let's get back to the working example to see how it works.
import model sm
generate p p > t t~ [QCD]
output TopPair
launch
shower = PYTHIA8
4
0
Two lines are noticeably added: shower = PYTHIA8 and 4 (which can be replaced with madspin = ON).
We are again not going to run the parton shower here.
The shower generator still has to be specified, because the “counter term” calculation, which gives rise to negatively weighted events, depends on which parton shower generator is used later.
Negative weighted events
We won’t cover what they are in this tutorial, but the important things to remember are that
- Some portion of the events are negatively weighted so one needs to be careful with the normalization.
- LHE files at NLO are even more unphysical than LHE files at LO before parton shower.
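To illustrate why the normalization needs care, here is a minimal sketch assuming unweighted NLO events whose weights are +1 or -1. The sample is then normalized with the sum of the signs rather than the number of events, and its statistical power is reduced by a factor (1 - 2f)^2, where f is the fraction of negative weights. The function names and the example numbers are illustrative.

```python
def normalization_factor(xsec_pb, lumi_inv_pb, signs):
    """Per-event scale factor; each event then enters with sign * factor."""
    return xsec_pb * lumi_inv_pb / sum(signs)


def effective_statistics_fraction(signs):
    """(sum w)^2 / (N * sum w^2) for +-1 weights, i.e. (1 - 2f)^2."""
    n_total = len(signs)
    f_negative = sum(1 for s in signs if s < 0) / n_total
    return (1.0 - 2.0 * f_negative) ** 2


# Hypothetical sample: 50 events, 10 of them with negative weight
signs = [1] * 40 + [-1] * 10
print(normalization_factor(684.7, 1000.0, signs))
print(effective_statistics_fraction(signs))
```

Dividing by the number of generated events instead of the sum of signs would underestimate the predicted yield in this example by a factor 30/50.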
Press tab to turn off the timer.
MadGraph again asks if you would like to edit the cards, now including madspin_card.dat.
/------------------------------------------------------------\
| 1. param : param_card.dat |
| 2. run : run_card.dat |
| 3. madspin : madspin_card.dat |
\------------------------------------------------------------/
If you take a look at run_card.dat, you might notice that its template is quite different from the one we used for DY at LO.
The NLO template is available at https://github.com/cms-PdmV/GridpackFiles/blob/master/Campaigns/Run3Summer22/MadGraph5_aMCatNLO/Templates/NLO_run_card.dat and the LO template at https://github.com/cms-PdmV/GridpackFiles/blob/master/Campaigns/Run3Summer22/MadGraph5_aMCatNLO/Templates/LO_run_card.dat.
Although MadGraph shares the same user interface, LO and NLO calculations run on totally different code in the backend.
So an NLO-type run_card.dat does not work for LO calculations and vice versa.
Now take a look at madspin_card.dat by pressing 3.
# specify the decay for the final state particles
decay t > w+ b, w+ > all all
decay t~ > w- b~, w- > all all
decay w+ > all all
decay w- > all all
decay z > all all
This card lets you define how you want your resonant particles to decay. For example, you can write:
decay t > w+ b, w+ > e+ ve
decay t~ > w- b~, w- > mu- vm~
This forces the top quark to decay into a positron and the antitop quark to decay into a muon.
Remove the unnecessary decay definitions and add these two lines to make a top pair sample with a positron and a muon in the final state.
Before moving on, do set run_card nevents 50 to save time, producing only 50 events.
You will see the inclusive top pair production cross section being computed, which includes all possible decays of the top quark.
--------------------------------------------------------------
Summary:
Process p p > t t~ [QCD]
Run at p-p collider (6500.0 + 6500.0 GeV)
Number of events generated: 50
Total cross section: 6.847e+02 +- 4.3e+00 pb
--------------------------------------------------------------
And then you will see MadSpin doing its job, decaying the top quarks to desired channels.
************************************************************
* *
* W E L C O M E to M A D S P I N *
* *
************************************************************
...
INFO: decay channels for t : ( width = 1.4915 GeV )
INFO: BR d1 d2
INFO: 1.000000e+00 b w+
INFO:
INFO:
INFO: decay channels for w+ : ( width = 2.04793 GeV )
INFO: BR d1 d2
INFO: 3.333610e-01 d~ u
INFO: 3.333610e-01 s~ c
INFO: 1.111195e-01 e+ ve
INFO: 1.111195e-01 mu+ vm
INFO: 1.110390e-01 ta+ vt
INFO:
INFO:
INFO: decay channels for t~ : ( width = 1.4915 GeV )
INFO: BR d1 d2
INFO: 1.000000e+00 b~ w-
INFO:
INFO:
INFO: decay channels for w- : ( width = 2.04793 GeV )
INFO: BR d1 d2
INFO: 3.333610e-01 d u~
INFO: 3.333610e-01 s c~
INFO: 1.111195e-01 e- ve~
INFO: 1.111195e-01 mu- vm~
INFO: 1.110390e-01 ta- vt~
...
INFO: Estimating the maximum weight
INFO: *****************************
INFO: Probing the first 75 events
INFO: with 400 phase space points
INFO:
INFO: Event 1/75 : 0.068s
INFO: Event 6/75 : 0.63s
INFO: Event 11/75 : 1.2s
INFO: Event 16/75 : 1.8s
INFO: Event 21/75 : 2.1s
INFO: Event 26/75 : 3s
INFO: Event 31/75 : 3.8s
INFO: Event 36/75 : 4.6s
INFO: Event 41/75 : 5.7s
INFO: Event 46/75 : 6.5s
What is the cross section?
The inclusive cross section was reported to be 684.7 pb, as we saw above. When considering the decay channels (e+ and mu- final states), what is the proper cross section? What are the branching ratios for w+ > e+ ve and w- > mu- vm~?
Solution
8.5 pb (from 684.7 pb x 11% x 11%)
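The arithmetic behind this solution can be checked with the numbers printed above: the inclusive cross section from the MadGraph summary and the two leptonic W branching ratios from the MadSpin log. The variable names are illustrative.

```python
# Inclusive p p > t t~ cross section reported by MadGraph (pb)
inclusive_xsec_pb = 684.7

# Branching ratios reported by MadSpin; BR(t > w+ b) is 1 in this setup
br_wplus_to_e_nu = 1.111195e-01
br_wminus_to_mu_nu = 1.111195e-01

# Cross section of the e+ mu- final state
xsec_pb = inclusive_xsec_pb * br_wplus_to_e_nu * br_wminus_to_mu_nu
print(f"sigma x BR = {xsec_pb:.2f} pb")
```

The result agrees with the rounded 8.5 pb quoted in the solution.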
How can we make a sample that yields mu+, vm, and this time two quark jets (a hadronically decaying w-)?
Solution
decay t > w+ b, w+ > mu+ vm
decay t~ > w- b~, w- > j j
Interfacing BSM UFO model files
Let’s take a look at how BSM samples for search-type analyses get produced. We will pick one simple example, a hypothetical heavy gauge boson called the W’ particle.
import model WEff_UFO
display particles
generate p p > wp+, wp+ > e+ ve
add process p p > wp-, wp- > e- ve~
output WprimeToENu
How can we make the syntax simpler using particle containers?
How can we write generate p p > wp+, wp+ > e+ ve and add process p p > wp-, wp- > e- ve~ in a simpler way?
Solution
define wprime = wp+ wp-
define leptons = e+ e- ve ve~
generate p p > wprime, wprime > leptons leptons
This will find all possible Feynman diagrams with the given particle combinations.
As right-handed interactions of W bosons are missing in the SM, many BSM scenarios predict a W’ boson that is heavier (thus, we have not found it yet) and can interact through right-handed couplings. As we do not know how large the particle’s mass is, we test many different scenarios (BSM parameters), for example different masses, decay channels and coupling strengths. We will now see how such BSM parameters can be set in MadGraph.
launch
0
And press tab to turn off the timer.
Take a look at the parameter card by hitting 1.
You will see a clear difference in the parameter settings compared to the sm model file we have been using.
Here, we will only focus on the mass of the W’, MWp, and the right-handed coupling strength, kR.
In addition, keep in mind that the width of the W’, wwp, should change according to your choice of BSM parameters.
###################################
## INFORMATION FOR MASS
###################################
Block mass
1 5.040000e-03 # MD
2 2.550000e-03 # MU
3 1.010000e-01 # MS
4 1.270000e+00 # MC
5 4.700000e+00 # MB
6 1.720000e+02 # MT
11 5.110000e-04 # Me
13 1.056600e-01 # MMU
15 1.777000e+00 # MTA
23 9.118760e+01 # MZ
25 1.250000e+02 # MH
34 1.000000e+03 # MWp
...
###################################
## INFORMATION FOR WPCOUP
###################################
Block wpcoup
1 0.000000e+00 # kL
2 1.000000e+00 # kR
...
###################################
## INFORMATION FOR DECAY
###################################
DECAY 6 1.508336e+00 # WT
DECAY 23 2.495200e+00 # WZ
DECAY 24 2.085000e+00 # WW
DECAY 25 4.070000e-03 # WH
DECAY 34 1.000000e+01 # WWp
You can see that the mass of the W’ is now set to 1000 GeV, the right-handed coupling strength is set to 1.0, and the width of the W’ is set to 10 GeV. You can change the BSM parameters, for example the mass to 2000 GeV and the coupling strength to 0.1, as below.
set param_card mwp 2000
set param_card kr 0.1
However, if you take another look at the parameter card, the width of the W’, wwp, is unchanged.
You can interactively see how the width gets computed by doing compute_widths wp+.
Check the parameter card again: the width has changed, and the card now also lists the branching ratios to the different channels.
# PDG Width
DECAY 34 6.672601e-01
# BR NDA ID1 ID2 ...
2.506959e-01 2 2 -1 # 0.1672793598579319
2.479126e-01 2 6 -5 # 0.16542221070326676
2.379169e-01 2 4 -3 # 0.15875247762227632
8.356529e-02 2 12 -11 # 0.05575978661997229
8.356529e-02 2 14 -13 # 0.05575978638653866
8.356519e-02 2 16 -15 # 0.05575972059211703
1.277883e-02 2 2 -3 # 0.008526805639865994
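The numbers in the comment column of this table are the partial widths, i.e. the branching ratio times the total width. A short check using only the values printed above (the variable names are illustrative):

```python
# Total W' width from the DECAY 34 line (GeV)
total_width = 6.672601e-01

# (branching ratio, PDG IDs of the two decay products) as listed in the card
channels = [
    (2.506959e-01, (2, -1)),
    (2.479126e-01, (6, -5)),
    (2.379169e-01, (4, -3)),
    (8.356529e-02, (12, -11)),
    (8.356529e-02, (14, -13)),
    (8.356519e-02, (16, -15)),
    (1.277883e-02, (2, -3)),
]

# Partial width of each channel = BR * total width
partial_widths = [br * total_width for br, _ in channels]
br_sum = sum(br for br, _ in channels)

for (br, ids), width in zip(channels, partial_widths):
    print(ids, f"BR = {br:.4f}", f"partial width = {width:.6f} GeV")
print("sum of BRs =", br_sum)
```

The branching ratios add up to 1, and the first partial width reproduces the 0.167 GeV shown in the comment of the first channel.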
Instead of computing the width interactively, you can do set param_card wwp auto.
MadGraph will then calculate the width on the fly while generating events (the results will be identical).
Proceed by hitting 0 and see what cross section you get if the W’ boson hypothetically exists and decays to the electron channel, assuming a mass of 2000 GeV and a right-handed coupling of 0.1.
=== Results Summary for run: run_01 tag: tag_1 ===
Cross-section : 0.001016 +- 1.447e-06 pb
Nb of events : 10000
How can we check the cross section when the mass is 2000 GeV and the right-handed coupling is 1.0?
Solution
Repeat the exercise above, but this time
set param_card mwp 2000
set param_card kr 1.0
And most importantly, do not forget to compute the width by adding:
set param_card wwp auto
Then you will get the following result.
=== Results Summary for run: run_01 tag: tag_1 ===
Cross-section : 0.1045 +- 0.0001681 pb
Nb of events : 10000
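How much did the cross section increase compared to the scenario with a mass of 2000 GeV and a right-handed coupling of 0.1?
How many interactions did the W’ boson get involved in?
Solution
The W’ boson is involved in two interactions: one vertex when it is produced and another vertex when it decays to the electron channel. The coupling kR enters at both vertices, so the squared amplitude scales as kR^4; the W’ width in the resonant propagator scales as kR^2, so the cross section around the resonance scales as kR^2. Increasing the coupling by a factor (1.0/0.1) = 10 therefore results in a roughly 10^2 = 100 times larger cross section.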
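The expected factor can be compared with the two cross sections printed above. The variable names are illustrative; the small difference from exactly 100 is not analyzed here.

```python
# Cross sections reported by MadGraph for MWp = 2000 GeV (pb)
xsec_kr_0p1 = 0.001016
xsec_kr_1p0 = 0.1045

# Ratio observed in the two runs
observed_ratio = xsec_kr_1p0 / xsec_kr_0p1

# Expected scaling from the coupling ratio squared
expected_ratio = (1.0 / 0.1) ** 2

print(f"observed: {observed_ratio:.1f}, expected: {expected_ratio:.0f}")
```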
Key Points
PDF uncertainties can be estimated from a set of different per-event weights. The method depends on the type of PDF set that is used (hessian, MC replicas)
Scale and alphaS variations are another source of uncertainty in the prediction of a simulated sample and can be used to estimate systematic uncertainties
4 - CMS resources for samples and generators
Overview
Teaching: 5 min
Exercises: 10 min
Questions
How do I find centrally produced samples and their status?
How do I obtain a cross section to normalize my sample?
Objectives
Leverage available tools for efficient analysis work
CMS resources for simulated samples
How to find samples and related information
You can get the configuration of a given sample from McM. For example, if you want the inclusive W+jets sample, start from a DAS query (requires a valid grid certificate / proxy):
dasgoclient -query="/WJetsToLNu*/RunIISummer20UL18*/MINIAODSIM"
Recall
Instructions to set up and verify your grid certificate have been covered in pre-exercises.
Alternatively, there’s also a web-based DAS client: https://cmsweb.cern.ch/das/. Use dataset=/WJetsToLNu*/RunIISummer20UL18*/MINIAODSIM to perform your search.
/WJetsToLNu_012JetsNLO_34JetsLO_EWNLOcorr_13TeV-sherpa/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v4/MINIAODSIM
/WJetsToLNu_012JetsNLO_34JetsLO_EWNLOcorr_13TeV-sherpa/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1_ext1-v2/MINIAODSIM
/WJetsToLNu_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v2/MINIAODSIM
/WJetsToLNu_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_1J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_1J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_2J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_2J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-100To200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-100To200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-1200To2500_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-1200To2500_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-1200To2500_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-200To400_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-200To400_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-2500ToInf_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-2500ToInf_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-2500ToInf_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-600To800_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-600To800_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-600To800_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-70To100_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-70To100_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-100To250_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-100To250_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-250To400_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-250To400_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-400To600_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-400To600_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-600ToInf_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-600ToInf_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-PUForTRK_TRK_106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-PUForTRKv2_TRKv2_106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Wpt-100to200_BPSFilter_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Wpt-200toInf_BPSFilter_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Wpt-200toInf_BPSFilter_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
We want the inclusive LO sample with the latest MiniAOD version (MiniAODv2), hence we pick /WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM.
Plug this name into “Output Dataset” in McM, then click on the dataset name (WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8).
In “Select View”, check “Fragment” and click on the expand icon under “Fragment” (rightmost column) for the request with a Summer20UL18wmLHEGS PrepId.
You can also filter the results directly by appending ?dataset_name=WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8&prepid=*Summer20UL18wmLHE* to the requests address, https://cms-pdmv.cern.ch/mcm/requests.
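If you look up many samples, the filtered address can be assembled programmatically. The sketch below only builds the URL string from the two query parameters quoted above, using the Python standard library; the helper name is illustrative, and opening the page still requires a browser with CERN login.

```python
from urllib.parse import urlencode

MCM_REQUESTS = "https://cms-pdmv.cern.ch/mcm/requests"


def mcm_requests_url(dataset_name, prepid_pattern):
    """Build the McM requests address filtered by dataset name and PrepId."""
    query = urlencode(
        {"dataset_name": dataset_name, "prepid": prepid_pattern},
        safe="*",  # keep the wildcard characters readable in the URL
    )
    return f"{MCM_REQUESTS}?{query}"


print(mcm_requests_url(
    "WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8",
    "*Summer20UL18wmLHE*",
))
```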
Status of samples
GrASP is a tool to conveniently track the status of your samples. Just select the campaigns you’re interested in (e.g. Run2 UL or Run3) and type the sample name. You can also tag the samples of your analysis so that they are easier to find and keep track of.
Cross sections
CMSSW analyzer
In the following, we will use a CMSSW analyzer called GenXSecAnalyzer to compute the cross section of samples. The analyzer takes a list of EDM files as input (i.e., not NanoAOD or NanoGEN). Make sure you are in a CMSSW environment:
cd ~/nobackup/cmsdas_2025_gen/CMSSW_12_4_8/src/
cmsenv
You can then use the prepared configuration to obtain the cross section for a sample of your liking, e.g. /TTWJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-madspin-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
Open your favorite editor and create a new file xsec_ana.py:
import FWCore.ParameterSet.Config as cms
from FWCore.ParameterSet.VarParsing import VarParsing
options = VarParsing ('analysis')
options.parseArguments()
process = cms.Process('ANA')
# import of standard configurations
process.load('Configuration.StandardSequences.Services_cff')
process.load('SimGeneral.HepPDTESSource.pythiapdt_cfi')
process.load('FWCore.MessageService.MessageLogger_cfi')
process.load('Configuration.EventContent.EventContent_cff')
process.load('SimGeneral.MixingModule.mixNoPU_cfi')
process.load('Configuration.StandardSequences.GeometryRecoDB_cff')
process.load('Configuration.StandardSequences.MagneticField_38T_cff')
process.load('Configuration.StandardSequences.Generator_cff')
process.load('IOMC.EventVertexGenerators.VtxSmearedRealistic8TeVCollision_cfi')
process.load('GeneratorInterface.Core.genFilterSummary_cff')
process.load('Configuration.StandardSequences.EndOfProcess_cff')
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')
from Configuration.AlCa.GlobalTag import GlobalTag
process.GlobalTag = GlobalTag(process.GlobalTag, 'auto:mc', '')
process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(options.maxEvents)
)
process.MessageLogger.cerr.FwkReport.reportEvery = 10000
# Handle different input options
inputFiles = []
if len(options.inputFiles) == 1 and (".root" not in options.inputFiles[0]):
    # A single non-ROOT argument is treated as a text file with one path per line
    with open(options.inputFiles[0]) as flist:
        # Strip the trailing newlines and skip empty lines
        inputFiles = [line.strip() for line in flist if line.strip()]
else:
    inputFiles = options.inputFiles
process.source = cms.Source(
    "PoolSource",
    fileNames = cms.untracked.vstring(inputFiles),
    duplicateCheckMode = cms.untracked.string('noDuplicateCheck')
)
process.dummy2 = cms.EDAnalyzer("GenXSecAnalyzer")
# Path and EndPath definitions
process.ana = cms.Path(process.dummy2)
# Schedule definition
process.schedule = cms.Schedule(process.ana)
Run the configuration file with:
dasgoclient -query="file dataset=/TTWJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-madspin-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM | grep file.name" > myfiles.txt
cmsRun xsec_ana.py inputFiles_load=myfiles.txt maxEvents=100000
In this example we restrict the maximum number of events to 100k.
This gives a large enough sample for a reliable result without running too long (the sample has 10.5M events; you can use DAS to verify this number with dasgoclient -query="summary dataset=DATASET").
The inputFiles option accepts several forms:
- A single file in your local area:
inputFiles="file:mylocalfile.root"
- A single published file:
inputFiles="/store/mc/..."
- Multiple files:
inputFiles="/store/mc/file1,/store/mc/file2"
- A text file containing one filepath per line:
inputFiles_load="myfilelist.txt"
Questions:
- What is the total cross section for your chosen sample? What is the relative uncertainty in this cross section that you obtained?
- Are there different processes listed in the summary? What could those different processes be?
- Does your sample have negative weights? If yes, what is the fraction of events with negative weights?
- The printout also mentions the equivalent luminosity. Do you understand what is meant by that?
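As a hint for the last question, the relation between event count, cross section and luminosity, N = sigma x L, can be checked against the printout shown earlier in this tutorial (final cross section 3.078e+04 pb, equivalent lumi for 1M events 3.249e-02 /fb). The function name is illustrative.

```python
def equivalent_lumi_inv_fb(n_events, xsec_pb):
    """Integrated luminosity (1/fb) at which n_events would be expected."""
    lumi_inv_pb = n_events / xsec_pb  # N = sigma * L  ->  L = N / sigma
    return lumi_inv_pb / 1000.0       # 1/fb = 1000 * 1/pb


# Numbers taken from the earlier printout
print(f"{equivalent_lumi_inv_fb(1_000_000, 3.078e4):.3e}")
```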
xsec DB
A central database, XSDB, is kept with approved cross sections for centrally produced samples.
The CMS Generator group’s Cross Section Database Tool (XSDB) is a tool for storing and looking up information related to a specific MC sample within CMS. It is designed to complement DAS and McM, with a direct link from DAS to a specific sample being available. Anyone with a CERN login can view XSDB and perform queries for sample information. However, further actions are restricted by e-group permissions. There are user, approver and admin e-groups. XSDB users are CMS members who have permission to insert and modify documents in XSDB. Approvers have the same privileges as users, but are primarily tasked with approving records submitted by users. Admins are responsible for maintaining and improving the tool for future use.
A large amount of information can be stored in the database for each sample. This includes the cross section value, cross section uncertainty, hadronization tool, matrix element generator, sample contact, cuts used, DAS primary dataset name, and MCM prename, among other metadata. This information can then be used to help with an analysis. In this exercise, we will simply search XSDB for a sample, look at some of the information stored there, and get familiar with XSDB.
We would like to search for a sample within XSDB. We’ll look for an EXO sample used in the Contact Interaction qqbar to dimuon channel in the search for compositeness.
The sample can be found in DAS with the dataset name: /CITo2Mu_M1300_CUETP8M1_Lam10TeVDesRR_13TeV-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- Query XSDB by typing DAS=CITo2Mu_M1300_CUETP8M1_Lam10TeVDesRR_13TeV-pythia8 in the query field and either hitting the enter key or clicking “Search”
- Take a minute to explore the items stored for the sample
- You can also choose which metadata are displayed by checking or unchecking the appropriate boxes between the search bar and the displayed results
- If we would like to see all of the Contact Interaction samples available, we can search: process_name=cito2mu*
- Take some time to look through the samples and the pagination at the bottom of the results page
- Repeat this exercise for process_name=ttbar. This will show a typical search for SM background samples.
It is possible to search for a substring of the item that one would like to look for.
Note that wildcards are supported; however, as long as the search string is contiguous, it will be accepted by the XSDB query.
XSDB also supports boolean queries.
If we want to query the database for our original sample we could use the following: process_name=cito2mu && total_uncertainty=21.42
You can also query for your favorite MC sample.
The XSDB twiki can be found here: XSDB twiki.
Key Points
DAS can be used to find samples and their files, the number of events in a certain sample, etc.
McM is used for sample generation management, and can be used to obtain additional information about samples, e.g. the root gridpack, fragments, etc.
McM is also a good place to look for example cmsDriver commands
Different sources for cross sections exist within CMS: a CMSSW analyzer and a database