CMSDAS Pre-Exercise 4: CMSSW basics
Overview
Teaching: 0 min
Exercises: 60 min
Objectives
Set up a CMSSW environment
Use git to download an example EDAnalyzer
Run a CMSSW job on real dimuon data to plot the Z peak
Questions
For this lesson, please submit your answers using CMSDAS@LPC2025 Google Form 4.
CMSSW
CMSSW is CMS’s software framework for analyzing the data produced by the CMS detector. The framework contains a large number of modules (C++), which perform tasks like:
- Loading RAW data (bits from the detector) into nice C++ objects;
- Reconstructing detector hits, physics objects (tracks, electrons, muons, hadrons, jets, …);
- Loading calibrations from databases and applying to physics objects;
- Interfacing with external generator programs like Pythia and MadGraph5_aMC@NLO;
- Lots and lots of other things.
With the advent of NanoAOD, a simple ROOT format that does not need CMSSW to be analyzed, CMS analysis is increasingly being performed completely outside of CMSSW. Your analysis group might have a framework that uses standalone ROOT, RDataFrame, or Scientific Python (e.g. numpy) instead. CMSSW is needed if your analysis requires additional variables not present in NanoAOD (for example, long-lived particle analyses often need RECO-level objects like tracker or calorimeter hits). You will also probably need to use CMSSW for detector, trigger, and/or POG work.
The framework goes hand-in-hand with the “Event Data Model” (EDM), which is how CMS represents events computationally. CMS saves events in several formats along the reconstruction chain, including RAW (data straight from detector), RECO (reconstruction performed), AOD (analysis object data=reduced RECO for analysis), MiniAOD (reduced AOD to fit in CMS’s disk space in Run 2), and NanoAOD (even further reduced MiniAOD). The upstream data formats are typically archived to tape storage, and must be loaded onto disk to be used. MiniAOD and NanoAOD are typically available on disk. We will learn more about finding data in the next exercise.
In this exercise, we will learn the basics of CMSSW, from setting up the software environment to running simple jobs.
Setting up CMSSW
Log in to cmslpc as usual, and run the following commands to create a CMSSW environment. This will create a folder CMSSW_13_0_10_cmsdas, which contains several subfolders. The most important folder is CMSSW_13_0_10_cmsdas/src, which is where you put your code.
cd ~/nobackup/cmsdas
source /cvmfs/cms.cern.ch/cmsset_default.sh
scram project -n "CMSSW_13_0_10_cmsdas" CMSSW_13_0_10
For convenience, we suggest you edit your ~/.bash_profile file to call the cmsset_default.sh script automatically upon login. Add the whole source /cvmfs/cms.cern.ch/cmsset_default.sh line to that file.
Note that software environments are generally not cross-compatible! Namely, if you set up an LCG software release and CMSSW (or conda, etc.) in the same session, things will break. For this exercise, make sure you have logged into a fresh session on cmslpc, so that the LCG environment from the previous exercise is not set up.
Note the release number, 13_0_10: CMSSW is a massive project that is under continuous development, so we define “releases” that correspond to a fixed snapshot at some point in time. CMSSW_13_0_* is the release used for “NanoAODv12” (see https://gitlab.cern.ch/cms-nanoAOD/nanoaod-doc/-/wikis/Releases/NanoAODv12) for Run 3 analysis. The first number in the series (13) indicates a major cycle, the second number (0) a major release with new features with respect to the preceding release, and the third number (10) a release with minor updates and bug fixes to the preceding release.
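The numbering scheme can be illustrated with a small Python sketch. The helper parse_release is hypothetical (it is not part of CMSSW or scram), and the optional trailing suffix, meant for names such as patch releases, is an assumption about the naming pattern.

```python
import re

# Hypothetical helper, not part of CMSSW: split a release name into the
# three numbers described above, plus an optional trailing suffix.
RELEASE_PATTERN = re.compile(r"^CMSSW_(\d+)_(\d+)_(\d+)(?:_(.+))?$")


def parse_release(name):
    """Return (major_cycle, major_release, minor_update, suffix)."""
    match = RELEASE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a CMSSW release name: {name!r}")
    cycle, release, update, suffix = match.groups()
    return int(cycle), int(release), int(update), suffix


print(parse_release("CMSSW_13_0_10"))  # (13, 0, 10, None)
```

For CMSSW_13_0_10, this gives major cycle 13, major release 0, and minor update 10.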
Next, execute the following commands to set up the environment in your current shell session:
cd CMSSW_13_0_10_cmsdas/src
cmsenv
This will provide you with a number of commands and environment variables. For example, $CMSSW_BASE is a handy variable that points to your CMSSW folder.
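Scripts can use the same variable to find your work area. The following is a minimal Python sketch, assuming only that cmsenv exports CMSSW_BASE as described above; the function name cmssw_src and its environ argument are hypothetical and exist only for illustration.

```python
import os
from pathlib import Path


def cmssw_src(environ=None):
    """Return the src/ folder of the active CMSSW area.

    Raises RuntimeError if CMSSW_BASE is not set, which usually means
    cmsenv has not been run in this shell session.
    """
    env = os.environ if environ is None else environ
    base = env.get("CMSSW_BASE")
    if not base:
        raise RuntimeError(
            "CMSSW_BASE is not set; run cmsenv inside your CMSSW area first"
        )
    return Path(base) / "src"


if __name__ == "__main__":
    try:
        print(cmssw_src())
    except RuntimeError as err:
        print(err)
```

Failing early with a clear message is a design choice: a missing environment is easier to diagnose here than through a later "file not found" error.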
Question 4.1
The following command prints the location of your CMSSW area. Copy-and-paste the answer into the Google form 4.
echo $CMSSW_BASE
Question 4.2
CMSSW is connected to several external tools, for example the Pythia generator. The following command prints the version of Pythia connected to your current CMSSW release. Fill in the version number in the Google form 4.
scram tool info pythia8
Git
CMS makes extensive use of git for code management, and you will use it throughout CMSDAS. (CMSSW itself is managed as a git repository, but it’s a rather complicated example, so we won’t talk about CMSSW+git here.) Here, we will simply use git to download some code. First, if you don’t already have a GitHub account, go back to the setup instructions and follow the directions, including setting up the SSH keys.
Choose your username wisely: it will appear on all your contributions to CMS code! In fact, even if you already have an account, if you have a username like edgelord1337, consider either changing it or making a second account.
Once you have an account, run the following commands to configure git on cmslpc. Replace each placeholder, including the [brackets], with your own information; [Account] is your GitHub account username:
git config --global user.name "[Your name]"
git config --global user.email [Email]
git config --global user.github [Account]
Next, run the following commands to “clone” a repository. Make sure not to skip the cd line, as the code has to end up in the correct folder structure.
cd $CMSSW_BASE/src
git clone git@github.com:FNALLPC/LearnCMSSW MyAnalysis/LearnCMSSW
If the git clone fails, it’s possible your SSH key was not set up correctly. Double-check the setup instructions, and head to Mattermost for help.
This will copy all the code in the repository to $CMSSW_BASE/src/MyAnalysis/LearnCMSSW. Feel free to glance through it.
Question 4.3: git repo info
When you cloned the repository, you not only downloaded the code, but also set up a local git repository connected to the remote repository. Use the following commands to print out the URL of the remote repository, from which you cloned the code:
cd $CMSSW_BASE/src/MyAnalysis/LearnCMSSW
git remote -v
Copy and paste the first line into the Google form.
Running a CMSSW job
Now that we have the source code, we have to compile it. Execute the following to compile the package using scram, CMSSW’s build tool:
cd $CMSSW_BASE/src
scram b
scram b accepts an argument -j to use more cores for the compilation (for example, scram b -j4). Don’t go above -j4, as overloading the cores will negatively impact other users on your cmslpc interactive node.
Finally, let’s actually run some code. CMSSW jobs are configured through Python files. We will use $CMSSW_BASE/src/MyAnalysis/LearnCMSSW/test/zpeak_cfg.py, which is a simple configuration file that loads the plugin at $CMSSW_BASE/src/MyAnalysis/LearnCMSSW/plugins/ZPeakAnalyzer.cc. The ZPeakAnalyzer processes some dimuon events in MiniAOD format and produces some histograms (a somewhat uncommon workflow, as it is typically more efficient to make histograms from NanoAOD or another slimmed-down format). Launch CMSSW with the following:
cd $CMSSW_BASE/src/MyAnalysis/LearnCMSSW/test
cmsRun zpeak_cfg.py
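For orientation, here is a minimal sketch of the general shape of a configuration file like zpeak_cfg.py. It is not a copy of the file in the repository, and it only runs inside a CMSSW environment. The process name, the input file path, the event limit, and the muons parameter are assumptions made for illustration. The module label zpeak_analyzer is inferred from the histogram path used later in this exercise, and the use of TFileService to write ZPeak.root is likewise an assumption.

```python
import FWCore.ParameterSet.Config as cms

# The process name is an arbitrary label (assumption for this sketch).
process = cms.Process("ZPEAK")

# Input files. The path is a placeholder, not the file used in the lesson.
process.source = cms.Source(
    "PoolSource",
    fileNames=cms.untracked.vstring("file:my_miniaod_file.root"),
)

# Number of events to process (illustrative value; -1 means all events).
process.maxEvents = cms.untracked.PSet(input=cms.untracked.int32(1000))

# Service that manages the output ROOT file holding the histograms.
process.TFileService = cms.Service(
    "TFileService",
    fileName=cms.string("ZPeak.root"),
)

# The analyzer plugin. "ZPeakAnalyzer" is the C++ class name; the muons
# parameter is hypothetical, while slimmedMuons is the standard MiniAOD
# muon collection.
process.zpeak_analyzer = cms.EDAnalyzer(
    "ZPeakAnalyzer",
    muons=cms.InputTag("slimmedMuons"),
)

# The path lists the modules to run on each event.
process.p = cms.Path(process.zpeak_analyzer)
```

When cmsRun is given such a file, it builds the process object and runs the modules in the path once per event.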
The job will take a minute to run, periodically updating you on the progress. When it’s done, you should see a file ZPeak.root. Let’s open it and plot the Z peak:
root -l ZPeak.root
[0] TH1D* dimuonMass = (TH1D*)_file0->Get("zpeak_analyzer/dimuonMass")
[1] dimuonMass->Draw()
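The quantity in this histogram is presumably the invariant mass of each muon pair. As a hedged illustration (this is not taken from ZPeakAnalyzer.cc), the standard formula in the limit where the muon mass is negligible, m² = 2 pT1 pT2 (cosh(Δη) − cos(Δφ)), can be written in Python as follows. The kinematic values in the example are made up.

```python
import math


def dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass (GeV) of two muons, neglecting the muon mass."""
    m_squared = 2.0 * pt1 * pt2 * (
        math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)
    )
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(m_squared, 0.0))


# Made-up example: two back-to-back central muons with pT = 45 GeV each.
print(dimuon_mass(45.0, 0.0, 0.0, 45.0, 0.0, math.pi))  # 90.0
```

Pairs from Z decays cluster near the Z mass, which is why the histogram shows a peak.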
Question 4.4
Using the stat box drawn along with the histogram, what is the mean dimuon mass? Write your answer in the Google form.
Key Points
CMSSW is CMS’s software framework for data processing.
The framework consists of lots of C++ modules, which are configured using python.
CMSSW jobs are launched using commands like cmsRun myCfg.py.
We provide an example EDAnalyzer and cfg.py file for plotting a Z peak directly from a MiniAOD file.
Analyzing simple ROOT ntuples like NanoAOD does not need CMSSW!