2 - Parton Shower Generator

Overview

Teaching: 10 min
Exercises: 20 min
Questions
  • Why do we need to do parton showering?

  • How are simulated samples created in CMS?

Objectives
  • Perform parton shower with LHE file as an input

  • Perform parton shower with gridpack as an input

  • Analyze generator level information using NanoGEN files

Creating particle level samples from LHE files

As discussed earlier, LHE files itself are not enough to describe physical distributions. In order to generate physics-wise sensible events, LHE files need to go through the parton shower. Parton shower, in principle, is responsible for higher order corrections to the hard process. Dominant contributions of such correction happen with collinear or soft emissions. In CMS, one of the most widely used tool for parton shower is Pythia8 (however, do note that Pythia8 is a multipurpose generator that is able to calculate hard process for certain physics processes). In this exercise, instead of compiling Pythia8 and running it in standalone mode as we did for MadGraph, we will take Pythia8 that is already compiled under CMSSW environment.

(1) Running Pythia8 interface in CMSSW

Make sure the Python virtual environment is deactivated

If you are still using the virtual environment, you need to unset it in order to not interfere with the CMSSW environment.

Let’s first check which release version of Pythia8 we will be using.

cd ~/nobackup/cmsdas_2025_gen/CMSSW_12_4_8/src
cmsenv
scram tool info pythia8

You can find out that we are now using Pythia8.306 version that is already compiled in CMSSW_12_4_8.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Name : pythia8
Version : 306-84a4765a9948f9c1a5e66f80618e2c6d
++++++++++++++++++++

INCLUDE=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d/include
LIB=pythia8
LIBDIR=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d/lib
PYTHIA8DATA=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d/share/Pythia8/xmldoc
PYTHIA8_BASE=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d
ROOT_INCLUDE_PATH=/cvmfs/cms.cern.ch/el8_amd64_gcc10/external/pythia8/306-84a4765a9948f9c1a5e66f80618e2c6d/include
SYSTEM_INCLUDE+=1
USE=root_cxxdefaults cxxcompiler hepmc3 hepmc lhapdf

Now we will start building our parton shower fragment in our own directories in order to produce samples by ourselves.

mkdir -p Configuration/GenProduction/python/

Create a new file Configuration/GenProduction/python/wplustest.py:

import FWCore.ParameterSet.Config as cms

import os

externalLHEProducer = cms.EDProducer('ExternalLHEProducer',
    args = cms.vstring(os.getenv("HOME")+ "/nobackup/cmsdas_2025_gen/genproductions_mg352/bin/MadGraph5_aMCatNLO/wplustest_4f_LO_el8_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz"),
    nEvents = cms.untracked.uint32(5000),
    numberOfParameters = cms.uint32(1),
    outputFile = cms.string('cmsgrid_final.lhe'),
    scriptName = cms.FileInPath('GeneratorInterface/LHEInterface/data/run_generic_tarball_cvmfs.sh'),
    generateConcurrently = cms.untracked.bool(False),
)

from Configuration.Generator.Pythia8CommonSettings_cfi import *
from Configuration.Generator.MCTunesRun3ECM13p6TeV.PythiaCP5Settings_cfi import *

generator = cms.EDFilter("Pythia8HadronizerFilter",
    PythiaParameters = cms.PSet(
        pythia8CommonSettingsBlock,
        pythia8CP5SettingsBlock,
        parameterSets = cms.vstring(
            'pythia8CommonSettings',
            'pythia8CP5Settings',
        )
    ),
    comEnergy = cms.double(13600),
    maxEventsToPrint = cms.untracked.int32(1),
    pythiaHepMCVerbosity = cms.untracked.bool(False),
    pythiaPylistVerbosity = cms.untracked.int32(1),
)

Let’s compile.

scram b

cmsDriver.py executable makes the full configuration file based on the optional arguments it is given with (data tier, campaign, etc.) using the parton shower fragment that is built. We will create NanoGEN files that are flat ntuples that resembles the NanoAOD data tier but only stored with generator-level information related branches. It skips the SIM and RECO steps in the middle which makes it convenient to do generator-level studies. For more information, take a look at link.

cmsDriver.py Configuration/GenProduction/python/wplustest.py \
    --python_filename config.py \
    --eventcontent NANOAOD \
    --datatier NANOAOD \
    --fileout file:wplustest.root \
    --conditions auto:mc \
    --step LHE,GEN,NANOGEN \
    --no_exec \
    --mc \
    -n 100

You just created config.py that can be executed with cmsRun command. Take a look at config.py with less, how it evolved from Configuration/GenProduction/python/wplustest.py through cmsDriver.py. It will produce LHE files, run parton shower to make GEN samples, and then finally convert it to NanoGEN format in one go by doing below. Note that we will only test with 100 events (-n 100) due to time constraints.

cmsRun config.py

LHE files are first produced using the gridpack we’ve just produced.

   ______________________________________     
         Running Generic Tarball/Gridpack     
   ______________________________________     
gridpack tarball path = /uscms/home/enibigir/nobackup/cmsdas_2025_gen/genproductions_mg352/bin/MadGraph5_aMCatNLO/wplustest_4f_LO_el8_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz
%MSG-MG5 number of events requested = 100
%MSG-MG5 random seed used for the run = 234567
%MSG-MG5 thread count requested = 1
%MSG-MG5 residual/optional arguments =
%MSG-MG5 number of events requested = 100
%MSG-MG5 random seed used for the run = 234567
%MSG-MG5 number of cpus = 1
%MSG-MG5 SCRAM_ARCH version = el8_amd64_gcc10
%MSG-MG5 CMSSW version = CMSSW_12_4_8
Running MG5_aMC for the 1 time
produced_lhe  0 nevt  100 submitting_event  100  remaining_event  100
run.sh 100 2345670
Now generating 100 events with random seed 2345670 and granularity 1

Reweight with additional PDF sets given for possible systematic sources.

INFO: #***************************************************************************
#
# original cross-section: 30775.0
#     scale variation: +11.8% -12.7%
#     emission scale variation: + 0% - 0%
#     central scheme variation: +8.43e-09% -20.3%
# PDF variation: +0.918% -0.918%
#
#PDF NNPDF31_nnlo_as_0118_nf_4: 30776.1 +0.916% -0.916%
#PDF NNPDF30_nnlo_nf_4_pdfas: 29939.4 +1.81% -1.81%
#PDF NNPDF40_nnlo_nf_4_pdfas: 31022.5 +0.554% -0.554%
#PDF MSHT20nnlo_nf4: 30286.6 +1.2% -1.56%
#PDF PDF4LHC21_40_pdfas_nf4: 30529 +1.53% -1.53%
#PDF ABMP16_4_nnlo: 30385.1 +0.885% -0.885%
# dynamical scheme # 1 : 28597.7 +13.2% -14.3% # \sum ET
# dynamical scheme # 2 : 28599.7 +13.2% -14.3% # \sum\sqrt{m^2+pt^2}
# dynamical scheme # 3 : 24520.5 +16.6% -17.8% # 0.5 \sum\sqrt{m^2+pt^2}
# dynamical scheme # 4 : 30775 +11.8% -12.7% # \sqrt{\hat s}
# PDF 42930 : 30365.485849169454
#***************************************************************************

And then Pythia8 is launched with the LHE file created given as an input. It first prints out the LHE information as we saw directly in the LHE file.

  --------  PYTHIA Event Listing  (hard process)  -----------------------------------------------------------------------------------

    no         id  name            status     mothers   daughters     colours      p_x        p_y        p_z         e          m
     0         90  (system)           -11     0     0     0     0     0     0      0.000      0.000      0.000  13000.000  13000.000
     1       2212  (p+)               -12     0     0     3     0     0     0      0.000      0.000   6500.000   6500.000      0.938
     2       2212  (p+)               -12     0     0     4     0     0     0      0.000      0.000  -6500.000   6500.000      0.938
     3         -1  (dbar)             -21     1     0     5     0     0   501     -0.000      0.000      0.771      0.771      0.000
     4          2  (u)                -21     2     0     5     0   501     0      0.000     -0.000  -2136.814   2136.814      0.000
     5         24  (W+)               -22     3     4     6     7     0     0      0.000      0.000  -2136.043   2137.585     81.187
     6        -13  mu+                 23     5     0     0     0     0     0     11.363     36.276  -1442.911   1443.412      0.106
     7         14  nu_mu               23     5     0     0     0     0     0    -11.363    -36.276   -693.132    694.173      0.000
                                   Charge sum:  1.000           Momentum sum:      0.000      0.000  -2136.043   2137.585     81.187

 --------  End PYTHIA Event Listing  -----------------------------------------------------------------------------------------------

Starts the parton shower on top of the given LHE event. See how much more information gets printed out. Remember that parton shower goes lower and lower from the hard process until certain energy threshold (q -> q g -> q g g g -> q q q g g -> ...).

  --------  PYTHIA Event Listing  (complete event)  ---------------------------------------------------------------------------------

    no         id  name            status     mothers   daughters     colours      p_x        p_y        p_z         e          m
     0         90  (system)           -11     0     0     0     0     0     0      0.000      0.000      0.000  13000.000  13000.000
     1       2212  (p+)               -12     0     0    90     0     0     0      0.000      0.000   6500.000   6500.000      0.938
     2       2212  (p+)               -12     0     0    91     0     0     0      0.000      0.000  -6500.000   6500.000      0.938
     3         -1  (dbar)             -21     6     0     5     0     0   501     -0.000      0.000      0.771      0.771      0.000
     4          2  (u)                -21     7     7     5     0   501     0      0.000     -0.000  -2136.814   2136.814      0.000
     5         24  (W+)               -22     3     4     8     8     0     0      0.000      0.000  -2136.043   2137.585     81.187
     6         21  (g)                -41    10     0     9     3   502   501      0.000      0.000      1.719      1.719      0.000
     7          2  (u)                -42    11    11     4     4   501     0     -0.000     -0.000  -2136.814   2136.814      0.000
     8         24  (W+)               -44     5     5    12    12     0     0    -19.744    -26.752  -1604.300   1606.697     81.187
     9          1  (d)                -43     6     0    13    13   502     0     19.744     26.752   -530.795    531.836      0.330
    10         -4  (cbar)             -41    18     0    14     6     0   501      0.000      0.000      2.947      2.947      0.000
    11          2  (u)                -42    19    19     7     7   501     0      0.000      0.000  -2136.814   2136.814      0.000
    12         24  (W+)               -44     8     8    20    20     0     0     -1.064    -19.028  -1221.827   1224.670     81.187
    13          1  (d)                -44     9     9    17    17   502     0     27.853     30.104   -681.041    682.274      0.330
    14         -4  (cbar)             -43    10     0    15    16     0   502    -26.789    -11.076   -231.000    232.816      1.500
    15         -4  (cbar)             -51    14     0    22    22     0   503    -19.016     -8.750   -120.727    122.537      1.500
    16         21  (g)                -51    14     0    23    23   503   502     -5.668     -0.052   -161.724    161.823      0.000
    17          1  (d)                -52    13    13    21    21   502     0     25.748     27.830   -629.590    630.730      0.330
    18         -4  (cbar)             -41    25     0    24    10     0   504      0.000      0.000      3.067      3.067      0.000
    19          2  (u)                -42    26    26    11    11   501     0      0.000      0.000  -2136.814   2136.814      0.000
    20         24  (W+)               -44    12    12    27    27     0     0     -0.639    -17.937  -1205.912   1208.775     81.187
    21          1  (d)                -44    17    17    28    28   502     0     25.919     28.268   -639.604    640.753      0.330

After 1 event information is printed out, 100 events get processed and finally reports the cross section.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Overall cross-section summary
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Process		xsec_before [pb]		passed	nposw	nnegw	tried	nposw	nnegw 	xsec_match [pb]			accepted [%]	 event_eff [%]
0		3.078e+04 +/- 2.327e+02		100	100	0	100	100	0	3.078e+04 +/- 2.327e+02		100.0 +/- 0.0	100.0 +/- 0.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total		3.078e+04 +/- 2.327e+02		100	100	0	100	100	0	3.078e+04 +/- 2.327e+02		100.0 +/- 0.0	100.0 +/- 0.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Before matching: total cross section = 3.078e+04 +- 2.327e+02 pb
After matching: total cross section = 3.078e+04 +- 2.327e+02 pb
Matching efficiency = 1.0 +/- 0.0   [TO BE USED IN MCM]
Filter efficiency (taking into account weights)= (100) / (100) = 1.000e+00 +- 0.000e+00
Filter efficiency (event-level)= (100) / (100) = 1.000e+00 +- 0.000e+00    [TO BE USED IN MCM]

After filter: final cross section = 3.078e+04 +- 2.327e+02 pb
After filter: final fraction of events with negative weights = 0.000e+00 +- 0.000e+00
After filter: final equivalent lumi for 1M events (1/fb) = 3.249e-02 +- 2.478e-04

=============================================

How did the cross section change after parton shower?

MadGraph reported # original cross-section: 30775.0, 30775pb. After running parton shower with Pythia8, same cross section 30780pb is kept. Parton shower adds more and more vertices, but why does the cross section remain unchanged?

Solution

Parton shower is unitary. Sum of probability to branch (e.g q -> q g) and not branch is 1. Hence, the cross sections is determined by the lowest order input (hard process).

Bonus: How did the distribution change?

(2) Jet merging samples

Hard process calculation has advantage in modeling of hard jets and heavy particle decays while parton shower is great for describing collinear and soft emissions. For more realistic and reliable physics modeling of hard jets, for example in W+jet events, MadGraph can be used as below.

generate p p > w+, w+ > ell+ vl @0
add process p p > w+ j, w+ > ell+ vl @1

With such syntaxes, MadGraph produces W+jet process with 0 and 1 hard jet in the event. If this sample goes through parton shower, as some portion of events (dentoed with @1) readily involves hard jet, it would be better at describing W+jet process with hard jet. However consider the event @0 emitting QCD particles from initial state radiation that could possibly form a jet that is hard enough. Such phase space inherently possesses a problem of double counting as “W+jet with hard jet” event could come from both @0 and @1. To mitigate such issues and remove double counting of phase space contributions, jet merging technique is used. Jet merging is set up with an artificial cut threshold called jet merging scale. This scale decides whether an event will be accepted or not from both @0 and @1. Finally, only accepted events from the two processes will be merged and form one sample. Very roughly, jet merging scale can be thought as the momentum of a jet. If a jet in the event is hard enough above the threshold, events from @0 are rejected while only accepting from @1. On the other hand, if a jet in the event is not too hard below the threshold, events from @0 are only accepted while rejecting @1.

How to produce gridpack

How to set the Madgraph (run card) and Pythia (fragment)?

Hint

#*********************************************************************
# Matching - Warning! ickkw > 1 is still beta
#*********************************************************************
 0        = ickkw            ! 0 no matching, 1 MLM, 2 CKKW matching

This flag tells MadGraph that the LHE files we are going to produce will later be going through jet merging in > > order to avoid double countings.

#*********************************************************************
# Jet measure cuts                                                   *
#*********************************************************************
 0   = xqcut   ! minimum kt jet measure between partons

When jet merging is turned on, xqcut needs to be set which presample the events for efficient jet merging. Remember that some portion of events will be later discarded and never going to be used. So there is no point of producing events that involve jets with too low energy scale at this LHE level since these will likely be removed.

generator = cms.EDFilter("Pythia8HadronizerFilter",
    PythiaParameters = cms.PSet(
        pythia8CommonSettingsBlock,
        pythia8CP5SettingsBlock,
        processParameters = cms.vstring(
            'JetMatching:setMad = off',
            'JetMatching:scheme = 1',
            'JetMatching:merge = on',
            'JetMatching:jetAlgorithm = 2',
            'JetMatching:etaJetMax = 5.',
            'JetMatching:coneRadius = 1.',
            'JetMatching:slowJetPower = 1',
            'JetMatching:doShowerKt = off', 
            'JetMatching:qCut = 19.',
            'JetMatching:nQmatch = 4',
            'JetMatching:nJetMax = 1',
            'TimeShower:mMaxGamma = 4.0'
        ),
        parameterSets = cms.vstring(
            'pythia8CommonSettings',
            'pythia8CP5Settings',
            'processParameters',
        )
    ),

Bonus: What is the cross section?

Key Points

  • Pythia8 is the main tool used for parton showering in CMS

  • Events are not physical if it did not go through parton shower

  • Jet merging is a technique to avoid double countings of jet phase spaces in ME and PS calculations