1 - Matrix Element Generator
Overview
Teaching: 20 min
Exercises: 40 minQuestions
What are Monte Carlo Generators?
Why are we using simulated samples in CMS?
How are simulated samples created in CMS?
Objectives
Use the MadGraph generator in standalone mode and get familiar with the basic syntax
Analyze the produced LHE files
Introduction and first steps
Although quite old, link is a great reading material to get a general overview of Monte Carlo event generators. Monte Carlo event generators are essential components of almost all experimental analyses and are also widely used by theorists and experiments to make predictions and preparations for future experiments. It is one of the topics where we CMS experimentalists and theorists have the closest connections to, theorists give us predictions and experimentalists verify them with the actual data. Although Monte Carlo event generators are extremely important tools in HEP, they are often used as black boxes which we more or less treat them as “data”. Our aim is to get the minimal background of how these tools are working and analyze them using the generator level information.
Samples that are used by CMS experiments go through several steps of simulation :
- Monte Carlo event generator
- Detector simulation
- Pileup mixing
- Trigger emulation
- Object econstruction
We focus on “1. Monte Carlo event generator” in this tutorial. Monte Carlo event generator can be further divided into several subpieces as each steps can be factorized and can be handled through separate calculations :
- Parton distribution function (PDF)
- Hard scattering (matrix element calculation)
- Parton shower & hadronization First of all, LHC is a proton-proton collider, hence we need information on how partons (quarks and gluons) are distributed in the proton (PDF). Hard scattering is the part where calculations can be treated perturbatively, interactions of incoming partons with the largest momentum transfer (usually the physics process we are interested in). Parton shower & hadronization further describes how the particles involed in the hard scattering evolve, working downwards to lower momentum scales even to a point where perturbative calculations break down.
Using Standalone Madgraph
In the first part of the exercise, we will use the matrix element generator MadGraph5 _aMC@NLO, or in short MadGraph link.
MadGraph can perform the calculations for many different physics processes (both SM and BSM) at leading and next-to-leading order (LO & NLO) in QCD.
Because of its easy user interface and flexibility with UFO models, you can test wide variety of physics modeling.
We will now first see how MadGraph runs interactively in standalone mode using simple W+
(wplus) process as an example.
We will first use the interactive prompt of MadGraph to generate proton proton collision events that produce W bosons.
First, log in to a new session on the LPC cluster (ssh -Y <USERNAME>@cmslpc-el8.fnal.gov
).
Make sure you have completed the setup steps!
Then, start the interactive prompt of Madgraph:
cd ~/nobackup/cmsdas_2025_gen/MG5_aMC_v3_5_2/
./bin/mg5_aMC
Madgraph is configured and steered through text-based cards.
The process definitions can be stored in a card called proc_card.dat
.
You can look at an example using the following command:
!cat wplustest_4f_LO_proc_card.dat
Note: the exclamation mark is used to execute shell commands within Madgraph, e.g. !cat
in the above example.
import model sm-ckm
#switch to diagonal ckm matrix if relevant for speed
#import model sm-lepton_masses
define ell+ = e+ mu+ ta+
define ell- = e- mu- ta-
generate p p > w+, w+ > ell+ vl @0
output wplustest_4f_LO -nojpeg
Copy/paste the commands line-by-line and pay attention to the output.
The two most important lines of this block are the model import (import model sm-ckm
) and the instructions on the process to generate (generate p p > w+, w+ > ell+ vl
).
Within the MG directory you can find a directory models
, that contains different pre-installed models.
The most obvious one that we are using in the example is sm
- the standard model at leading order in perturbative QCD.
Model parameters can be configured through ‘‘restriction cards’’, in this example restrict_ckm.dat
loaded through the syntax sm-ckm
.
This specific restriction card uses a non-diagonal CKM matrix (diagonal CKM is the default otherwise for simplification and faster running).
One great feature of MadGraph is it’s flexibility in terms of physics models to use.
To generate a sample using a new physics model one can use the UFO interface. A database of models can be found in the feynrules model database.
The practically most relevant part is that MG figures out all relevant Feynman diagrams contributing to a process.
If you are trying to set up a new MC sample, looking at these Feynman diagrams is a great way to check that you actually get the physics you want.
To check them out you can open the individual plots in e.g. wplustest_4f_LO/SubProcesses/P0_qq_wp_wp_lvl/matrix*.ps
with gv
, display
or evince
.
You can also use the ps2pdf
program to convert the post script files into PDFs.
Alternatively, remove -nojpeg
from the output line and look at the diagrams in jpeg format using display
.
Now that Madgraph has figured out the feynman diagrams you can start the actual computation within the MG5 prompt with
launch
Hint: if you closed the interactive MG session for some reason you can still launch without rerunning the previous commands with
./bin/mg5_aMC
launch wplustest_4f_LO
Madgraph will ask you a few more questions. Press tab
to turn off the timer (otherwise, MadGraph will move on by itself after 60 seconds).
/===========================================================================\
| 1. Choose the shower/hadronization program shower = Not Avail. |
| 2. Choose the detector simulation program detector = Not Avail. |
| 3. Choose an analysis package (plot/convert) analysis = Not Avail. |
| 4. Decay onshell particles madspin = OFF |
| 5. Add weights to events for new hypp. reweight = Not Avail. |
\===========================================================================/
The first one you can just skip by pressing <RETURN>. As we did not install any other shower
, detector
, analysis package
, they are in Not Avail.
state.
Do you want to edit a card (press enter to bypass editing)?
/------------------------------------------------------------\
| 1. param : param_card.dat |
| 2. run : run_card.dat |
\------------------------------------------------------------/
you can also
- enter the path to a valid card or banner.
- use the 'set' command to modify a parameter directly.
The set option works only for param_card and run_card.
Type 'help set' for more information on this command.
- call an external program (ASperGE/MadWidth/...).
Type 'help' for the list of available command
[0, done, 1, param, 2, run, enter path][90s to answer]
Let’s take a look at the param card
and see how the values are set, press 1
and ENTER
(<RETURN>) to investigate the parameter settings.
###################################
## INFORMATION FOR MASS
###################################
Block mass
5 4.700000e+00 # MB
6 1.730000e+02 # MT
15 1.777000e+00 # MTA
23 9.118800e+01 # MZ
25 1.250000e+02 # MH
...
###################################
## INFORMATION FOR DECAY
###################################
DECAY 6 1.491500e+00 # WT
DECAY 23 2.441404e+00 # WZ
DECAY 24 2.047600e+00 # WW
DECAY 25 6.382339e-03 # WH
Let’s take a look at the run card
and see how the values are set, press 2
and ENTER
(<RETURN>) to investigate the run settings.
#*********************************************************************
# Number of events and rnd seed *
# Warning: Do not generate more than 1M events in a single run *
#*********************************************************************
10000 = nevents ! Number of unweighted events requested
0 = iseed ! rnd seed (0=assigned automatically=default))
...
#*********************************************************************
# Collider type and energy *
# lpp: 0=No PDF, 1=proton, -1=antiproton, *
# 2=elastic photon of proton/ion beam *
# +/-3=PDF of electron/positron beam *
# +/-4=PDF of muon/antimuon beam *
#*********************************************************************
1 = lpp1 ! beam 1 type
1 = lpp2 ! beam 2 type
6500.0 = ebeam1 ! beam 1 total energy in GeV
6500.0 = ebeam2 ! beam 2 total energy in GeV
...
#*********************************************************************
# Standard Cuts *
#*********************************************************************
# Minimum and maximum pt's (for max, -1 means no cut) *
#*********************************************************************
10.0 = ptl ! minimum pt for the charged leptons
-1.0 = ptlmax ! maximum pt for the charged leptons
{} = pt_min_pdg ! pt cut for other particles (use pdg code). Applied on particle and anti-particle
{} = pt_max_pdg ! pt cut for other particles (syntax e.g. {6: 100, 25: 50})
...
#*********************************************************************
# Minimum and maximum invariant mass for pairs *
#*********************************************************************
0.0 = mmll ! min invariant mass of l+l- (same flavour) lepton pair
-1.0 = mmllmax ! max invariant mass of l+l- (same flavour) lepton pair
{} = mxx_min_pdg ! min invariant mass of a pair of particles X/X~ (e.g. {6:250})
{'default': False} = mxx_only_part_antipart ! if True the invariant mass is applied only
! to pairs of particle/antiparticle and not to pairs of the same pdg codes.
...
#*********************************************************************
# maximal pdg code for quark to be considered as a light jet *
# (otherwise b cuts are applied) *
#*********************************************************************
4 = maxjetflavor ! Maximum jet pdg code
Try editting the beam energy (ebeam1
and ebeam2
) 6500
to 6800
as we are now running at 13.6TeV beam energy.
When done with editting, escape after saving the changes in the text file.
MadGraph allows you to change settings by interactively typing in below as well.
set run_card nevents 5000
Take a look at the run card again and see if number of events to generate (nevents
) is changed to 5000
.
And change it back to 10000
using same command and check again.
As shown above, there are several phase space cuts set by default (e.g. 10.0 = ptl
).
There is a handy command that removes all phase space cuts at once (instead of doing set run_card ptl 0
, set run_card ptj 0
, … one by one by hand).
set no_parton_cut
Take a look at the card again and see if lepton pt cut (ptl
) is changed to 0
.
Keep in mind that the cuts you give before doing set no_parton_cut
will be removed by this command.
So don’t forget to do set no_parton_cut
before giving the cuts you wish to give.
Once you are done, please provide the path to the pre-made run_card: wplustest_4f_LO_run_card.dat
What is the cross section determined by Madgraph?
Obtaining the cross section
Define a process (e.g. from the process card above) and launch.
Solution
launch wplustest_4f_LO
=== Results Summary for run: run_01 tag: tag_1 === Cross-section : 2.715e+04 +- 39.45 pb Nb of events : 10000 INFO: No version of lhapdf. Can not run systematics computation store_events INFO: Storing parton level results INFO: End Parton reweight -from_cards decay_events -from_cards INFO: storing files of previous run INFO: Done
The cross section calculated by MG is
2.715e+04 +- 39.45 pb
.
The main output that MG produces is called ‘‘LHE file’’. The LHE file (Les Houches Event file) is a standard file format that stores process and event information from parton-level event generators. The documentation can be found here.
In general, the LHE file contains a header with description of the settings of the generator (e.g. process and run information), and multiple event blocks (one for each event). The LHE file is plain text, so it’s usually a good idea to use some compression algorithm to save space - MG zips the output by default.
Looking at the LHE output
Find the LHE file produced by MG and find the first event block.
Example solution
Exit MG, then do
find -path './wplustest_4f_LO/*.lhe.gz' gzip -d ./wplustest_4f_LO/Events/run_01/unweighted_events.lhe.gz less ./wplustest_4f_LO/Events/run_01/unweighted_events.lhe
<LesHouchesEvents version="3.0"> <header> ... </header> ... <event> 5 0 +2.7145900e+04 7.93095700e+01 7.54677100e-03 1.33102200e-01 2 -1 0 0 501 0 +0.0000000000e+00 +0.0000000000e+00 +4.5829549845e+01 4.5829549845e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00 -1 -1 0 0 0 501 -0.0000000000e+00 -0.0000000000e+00 -3.4311969734e+01 3.4311969734e+01 0.0000000000e+00 0.0000e+00 1.0000e+00 24 2 1 2 0 0 +0.0000000000e+00 +0.0000000000e+00 +1.1517580112e+01 8.0141519579e+01 7.9309573878e+01 0.0000e+00 0.0000e+00 -13 1 3 3 0 0 -1.6845086581e+01 +2.2368564620e+01 -2.2614075432e+01 3.5993138689e+01 0.0000000000e+00 0.0000e+00 1.0000e+00 14 1 3 3 0 0 +1.6845086581e+01 -2.2368564620e+01 +3.4131655543e+01 4.4148380890e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00 <mgrwt> <rscale> 0 0.79309574E+02</rscale> <asrwt>0</asrwt> <pdfrwt beam="1"> 1 2 0.70507000E-02 0.79309574E+02</pdfrwt> <pdfrwt beam="2"> 1 -1 0.52787646E-02 0.79309574E+02</pdfrwt> <totfact> 0.15019241E+05</totfact> </mgrwt> <rwgt> <wgt id='1'> +2.3707085e+04 </wgt> ... </rwgt> </event>
What does each column mean?
Solution
ID
,status
,mother1
,mother2
,color
,anticolor
,px
,py
,pz
,E
,mass
,life time
, andspin
-11 1 3 3 0 0 -2.3393803385e+01 -7.4187481776e+00 -1.5274153214e+02 1.5470062541e+02 0.0000000000e+00 0.0000e+00 1.0000e+00
This line tells you that a positron (
ID
) is an outgoing particle (status
) with Z as its mother (mother1
andmother2
: 3rd particle is Z which isID=23
) with no color (color
andanticolor
), …
MadGraph syntax
If you want to add another process, e.g. production of W- in the above example, you can add another process with
add process p p > w-, w- > ell- vl~
A detailed introduction to the syntax is given in this documentation. Some very basic things to keep in mind:
generate p p > e+ e-
will generate any diagram that is compatible with the used model that produces an electron / positron pair
generate p p > e+ e- / Z
will exclude diagrams that contain a Z boson as internal paricle
generate p p > e+ e- $ Z
will exclude the Z boson from appearing in the s-channel (careful about gauge invariance)
generate p p > Z > e+ e-
will always include a Z boson in the s-channel (careful about gauge invariance)
generate p p > w+ QED=3
will include all QED contributions, otherwise the QED order is always set to its minimal value
Bonus: Obtaining the cross section of W boson production
The exercise above only contained W+ bosons (only positive charge). Add production of the negatively charged W bosons and calculate the cross section. Before running MG, think about what result you would expect, i.e. by how much do you think the cross section should increase. Compare the results of W+ and W+/-. What do you conclude?
Solution
import model sm-ckm define ell+ = e+ mu+ ta+ define ell- = e- mu- ta- generate p p > w+, w+ > ell+ vl @0 add process p p > w-, w- > ell- vl~ @1 output wtest_4f_LO -nojpeg launch
When prompted about the run_card, use
wplustest_4f_LO_run_card.dat
again.=== Results Summary for run: run_01 tag: tag_1 === Cross-section : 4.667e+04 +- 63.91 pb Nb of events : 10000
The cross section calculated by Madgraph is
4.732e+04 +- 57.08 pb
. While one would naiively expect the cross section to double by including W- bosons we only get a cross section that is ~40% larger. The simplified explanation is that the initial state protons contain more up valence quarks than down valence quarks.
Using the gridpack workflow
As mentioned previously, interactive running of MG5 is useful for developments and quick tests, but ultimately not practical for large-scale production. To avoid having to use the interactive mode, one can make use the card structure of MG. A fully automated workflow for running MG and producing gridpacks is maintained in the genproductions repository. A gridpack is simply an archive file that contains all the executable MG5 code needed to produce LHE events for a given process, which can then be executed easily on many different grid workers (hence the name). It has the advantage that once it is created, it is a one-button program to generate events, no thinking required. In this part of the exercise, we will use the same input cards as before to create a gridpack, run it, and compare the results to before.
Gridpacks are generated using the gridpack_generation script, which we will run in local mode, i.e. on the machine we are currently logged in to. Note that scripts are provided to run the gridpacks on other computing infrastructures such as the CERN batch system and CMSConnect, which is useful for more complicated processes.
To create a gridpack, we simply call gridpack_generation.sh and pass the process name and card location to it.
Setting and unsetting the CMSSW environment
You should not have activated a CMSSW environment in this exercise so far. However, if you did so before, you need to unset it in order to not interfere with the genproductions script. You can run the following command to unset the CMSSW environment, or log in to a clean new session.
eval `scram unsetenv -sh`
We will be generating a gridpack with cards similar to the commands we’ve used in the standalone example above. The cards are located in the MG section of the genproductions directory
cd ~/nobackup/cmsdas_2025_gen/genproductions_mg352/bin/MadGraph5_aMCatNLO
time ./gridpack_generation.sh wplustest_4f_LO cards/examples/wplustest_4f_LO local
Naming conventions
There are certain naming conventions for the input cards. For a given process name $NAME, the input cards must be named as $NAME_run_card.dat, $NAME_proc_card.dat, etc…
mkdir work
cd work
tar xf ../wplustest_4f_LO_el8_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz
NEVENTS=10000
RANDOMSEED=12345
NCPU=1
./runcmsgrid.sh $NEVENTS $RANDOMSEED $NCPU
This will produce an output LHE file called cmsgrid_final.lhe
.
Comparing the LHE output
There are multiple ways of analyzing an LHE file, each of which has its own advantages and disadvantages. For the purpose of this exercise, we will use a pre-made pyroot script.
cd ~/nobackup/cmsdas_2025_gen/
cp /eos/uscms/store/user/cmsdas/2025/short_exercises/generators/LHEReader.py .
Convert LHE output to root format
Make sure the Python virtual environment is deactivated
If you are still using the virtual environment, you need to unset it in order to not interfere with the CMSSW environment.
cd CMSSW_12_4_8/src; cmsenv; cd -
python3.9 LHEReader.py --input MG5_aMC_v3_5_2/wplustest_4f_LO/Events/run_01/unweighted_events.lhe --output standalone.root
python3.9 LHEReader.py --input genproductions_mg352/bin/MadGraph5_aMCatNLO/work/cmsgrid_final.lhe --output cmsgrid.root
Feel free to experiment here and plot various quantities. What are the shapes of the lepton pT distributions? What is the shape of the pT distribution of the W system? Are these shapes physical?
Key Points
MadGraph is a widely used tool to generate matrix-element predictions for the hard scatter for SM and BSM processes.
Standalone MadGraph can run interactively on-the-fly or by importing the predefined text scripts
Gridpacks are used for large scale productions with consistency guaranteed
LHE level information is not physical and parton shower is needed to describe full physics