Exercise 1
Selecting a final state and trigger (Wednesday Morning)
Choose one of the three $\tau \tau$ final states with at least one hadronically decaying $\tau$. For each of the final states we are considering, there are two trigger options (see Slide 5 of the intro slides). We need to decide which is best for our analysis.
- For the $\mu+\tau_h$ channel consider:
  - The single isolated muon trigger.
  - The trigger requiring a single muon and a single $\tau_h$.
- For the $e+\tau_h$ channel consider:
  - The single electron trigger.
  - The trigger requiring a single electron and a single $\tau_h$.
- For the $\tau_h+\tau_h$ channel consider:
  - The trigger requiring two $\tau_h$.
  - The trigger requiring a single $\tau_h$ (of large $p_T$).
The available list of all triggers in NANOAOD can be found in the NANOAOD documentation.
To learn to write Python code and use the NanoAODTools framework, we will start with exampleAnalysis.py as a template. To run the example:
cd ${CMSSW_BASE}/src/PhysicsTools/NanoAODTools/analysis/
python ${CMSSW_BASE}/src/PhysicsTools/NanoAODTools/python/postprocessing/examples/exampleAnalysis.py
This example reads events from the input file and applies a preselection requiring at least one jet with $p_T > 250$ GeV. It then loops over these events, selects those with at least 2 muons, and builds the Lorentz-vector sum of the electrons, muons, and jets in the event. A histogram is then filled with the $p_T$ of this vector. An additional example of the (C++) syntax for preselection can be found here. For our purposes, though, feel free not to use any preselection at all.
You may view the histogram using the root TBrowser:
root histOut.root
TBrowser b
The input file is:
root://cmseos.fnal.gov//store/user/cmsdas/2023/short_exercises/Tau/DYJetsToLL__7B7D90CB-14EF-B749-B4D7-7C413FE3CCC1.root
We need to calculate the signal efficiency for each of the two trigger options. To do so, you need to loop over all the entries in a signal MC file using the template provided.
- As we are only interested in the events for your particular final state, you need to select those events in which the Z boson decays into a pair of tau leptons, which then decay into your particular final state. For example, in the $\mu+\tau_h$ channel, you will want your “denominator” events to be those in which there is exactly 1 reconstructed muon (gen-matched to a muon from a tau decay) and 1 reconstructed $\tau_h$ (gen-matched to a hadronic tau decay); apply no further selection to the reconstructed particles. The Tau_genPartFlav and Muon_genPartFlav variables described in the NANOAOD documentation carry the gen-matching information, which you can use in your loops.
- Once the denominator events are selected, you then determine the numerator events: the events must pass your trigger, the particles must satisfy the baseline ID requirements (e.g. anti-e, anti-mu and anti-jet tagging for the $\tau_h$), and the particles must pass the additional kinematic ($p_T$ and $\eta$) requirements dictated by the trigger name. The triggers are booleans: pass or not pass.
Examples of event loops used in the Tau Short Exercise, although written in C++, can be found in eff.c and taumdm.c.
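The denominator and numerator counting described above can be sketched in plain Python. This is only an illustration under stated assumptions: events are represented as dictionaries rather than NanoAODTools event objects, the gen-matching codes (15 for a muon from a tau decay, 5 for a hadronic tau decay) should be checked against the NANOAOD documentation, and the `HLT_pass` key is a hypothetical stand-in for your trigger decision combined with the offline numerator cuts.

```python
import math

# Assumed NanoAOD gen-matching codes (verify in the NANOAOD documentation).
MUON_FROM_TAU = 15  # Muon_genPartFlav: muon from a prompt tau decay
TAU_HADRONIC = 5    # Tau_genPartFlav: genuine hadronic tau decay


def in_denominator(event):
    """Exactly one reco muon and one reco tau_h, both gen-matched."""
    muons = event["Muon_genPartFlav"]
    taus = event["Tau_genPartFlav"]
    return (len(muons) == 1 and len(taus) == 1
            and muons[0] == MUON_FROM_TAU and taus[0] == TAU_HADRONIC)


def trigger_efficiency(events, passes_numerator):
    """Return (n_den, n_num, efficiency, binomial uncertainty)."""
    n_den = 0
    n_num = 0
    for event in events:
        if not in_denominator(event):
            continue
        n_den += 1
        if passes_numerator(event):
            n_num += 1
    if n_den == 0:
        return 0, 0, 0.0, 0.0
    eff = n_num / float(n_den)
    err = math.sqrt(eff * (1.0 - eff) / n_den)
    return n_den, n_num, eff, err


# Toy events; "HLT_pass" is a hypothetical flag, not a real branch name.
toy_events = [
    {"Muon_genPartFlav": [15], "Tau_genPartFlav": [5], "HLT_pass": True},
    {"Muon_genPartFlav": [15], "Tau_genPartFlav": [5], "HLT_pass": False},
    {"Muon_genPartFlav": [1], "Tau_genPartFlav": [5], "HLT_pass": True},
    {"Muon_genPartFlav": [15, 15], "Tau_genPartFlav": [5], "HLT_pass": True},
]
n_den, n_num, eff, err = trigger_efficiency(toy_events, lambda e: e["HLT_pass"])
print("denominator={0} numerator={1} efficiency={2:.2f}".format(n_den, n_num, eff))
```

In a real loop, `passes_numerator` would combine the trigger boolean with the baseline ID and kinematic requirements for your channel, and the same function can be called once per trigger option to fill the comparison table.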
For analysis, the reconstructed objects (muons, electrons, taus) need additional identification quality criteria:
- Muons:
  - A cut requiring $|\eta| < 2.4$. This is the maximum extent in $\eta$ of the muon detectors.
  - Muon_tightId or Muon_mediumId. This ensures you have a quality muon. Choose one.
  - The muon isolation variable is named Muon_pfIsoId. This needs to be applied if considering the isolated muon triggers.
  - https://twiki.cern.ch/CMS/SWGuideMuonIdRun2
- Electrons:
  - A cut requiring $|\eta| < 2.5$. This is the maximum extent in $\eta$ of the silicon tracker.
  - Electron_mvaFall17V2Iso_WP80 or Electron_mvaFall17V2Iso_WP90. This ensures a quality electron. Choose one.
  - https://twiki.cern.ch/CMS/EgammaIDRecipesRun2
- Taus:
  - A cut requiring $|\eta| < 2.3$. This is to ensure the tau is well within the acceptance of the silicon tracker.
  - A cut requiring $p_T > 20$ GeV.
  - A cut vetoing Tau_decayMode = 5, 6, 7. These are “experimental 2-prong” decay modes.
  - The VLoose, Tight, Tight working points for the DeepTau discriminators: Tau_idDeepTau2017v2p1VSe, Tau_idDeepTau2017v2p1VSmu, Tau_idDeepTau2017v2p1VSjet.
  - https://twiki.cern.ch/CMS/TauIDRecommendationForRun2
- Jets:
  - A cut requiring $|\eta| < 2.5$. We only consider jets within the tracker as we will be b-tagging these jets.
  - A cut requiring $p_T > 20$ GeV.
  - An ID variable: 4&Jet_jetId
  - https://twiki.cern.ch/CMS/JetID13TeVRun2018
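As an illustration of how the tau and jet criteria listed above might be applied, here is a minimal Python sketch. It assumes that the DeepTau branches are bitmasks in which VLoose against electrons has value 4, Tight against muons has value 8, and Tight against jets has value 32, and that the three working points are meant for VSe, VSmu, and VSjet in that order; please verify both assumptions against the NANOAOD documentation for your samples. The function names are hypothetical.

```python
# Assumed bit values of the DeepTau2017v2p1 bitmask branches
# (check the NANOAOD documentation before relying on them).
VSE_VLOOSE = 4    # Tau_idDeepTau2017v2p1VSe, VLoose
VSMU_TIGHT = 8    # Tau_idDeepTau2017v2p1VSmu, Tight
VSJET_TIGHT = 32  # Tau_idDeepTau2017v2p1VSjet, Tight
JET_ID_BIT = 4    # the "4 & Jet_jetId" requirement


def pass_tau_baseline(pt, eta, decay_mode, id_vs_e, id_vs_mu, id_vs_jet):
    """Kinematics, decay-mode veto, and DeepTau working points."""
    if pt <= 20.0 or abs(eta) >= 2.3:
        return False
    if decay_mode in (5, 6, 7):  # "experimental 2-prong" modes
        return False
    return (bool(id_vs_e & VSE_VLOOSE)
            and bool(id_vs_mu & VSMU_TIGHT)
            and bool(id_vs_jet & VSJET_TIGHT))


def pass_jet_baseline(pt, eta, jet_id):
    """Kinematics and jet ID bit."""
    return pt > 20.0 and abs(eta) < 2.5 and bool(jet_id & JET_ID_BIT)


print(pass_tau_baseline(30.0, 1.0, 1, 4, 8, 32))
print(pass_jet_baseline(25.0, 1.0, 6))
```

The arguments correspond to single entries of the per-object branches (for example Tau_pt, Tau_eta, Tau_decayMode), so the functions can be called inside a loop over the taus or jets of an event.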
The different triggers may require different kinematic cuts ($p_T, \eta$) depending on their design. Some triggers have different isolation requirements on the reconstructed objects. The trigger variable names (fortunately) describe the criteria on the reconstructed objects to be applied offline. The goal of this section is to see which of the two routes specific to your channel (trigger 1 or trigger 2) gives the greatest signal efficiency.
To get an estimate of the number of events expected to be observed in data, you need to scale your MC events by a “cross section weight”. This concept is highlighted here.
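A cross section weight is commonly computed as the cross section times the integrated luminosity, divided by the number of generated MC events. The sketch below assumes that definition; the function names are hypothetical and the numbers in the example are placeholders chosen for easy arithmetic, not the values for any real sample.

```python
def cross_section_weight(xsec_pb, lumi_pb_inv, n_generated):
    """Per-event weight: sigma [pb] * L [pb^-1] / N_generated."""
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    return xsec_pb * lumi_pb_inv / float(n_generated)


def expected_yield(n_selected, xsec_pb, lumi_pb_inv, n_generated):
    """Scale the number of selected MC events to the data luminosity."""
    return n_selected * cross_section_weight(xsec_pb, lumi_pb_inv, n_generated)


# Placeholder inputs for illustration only.
weight = cross_section_weight(xsec_pb=1000.0, lumi_pb_inv=50000.0, n_generated=10000000)
n_expected = expected_yield(200, 1000.0, 50000.0, 10000000)
print("weight={0} expected={1}".format(weight, n_expected))
```

For the deliverable, you would substitute the cross section of your sample, the 2018 integrated luminosity, and the number of generated events before any skimming.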
Deliverable
Deliverable for the end of the day: A table containing the signal efficiencies for each of your two trigger options, including the total number of events which passed your baseline and numerator selection. Scale these numbers by the appropriate cross section weight to produce a ballpark estimate for the number of events expected at 2018 luminosity.
Exercise 2
Event Analysis code
We now need to write the main portion of the code, which will reconstruct our Z candidate as well as other variables which we will use in the background estimation. At the very least, you will need to include:
- The visible mass of your reconstructed final state particle pair.
  - If the event does not have such a pair, you may discard it (return False) as you will never be using it for your analysis.
- The number of b-tagged jets.
  - b-tagging is achieved with the Jet_btagDeepB variable, where you require the value to be greater than 0.1241 (loose), 0.4184 (medium), or 0.7527 (tight), depending on your intended working point.
- The transverse mass of either $e+$MET or $\mu+$MET (for $e+\tau_h$ or $\mu+\tau_h$ only).
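The three quantities listed above can be sketched in plain Python as follows. This is an illustration rather than the framework's own implementation: inside NanoAODTools you would more likely use the objects' Lorentz vectors, and the function names here are hypothetical. The transverse mass uses the standard definition $m_T = \sqrt{2 p_T^{\ell} E_T^{miss} (1 - \cos\Delta\phi)}$, and the default b-tag threshold is the middle of the three values quoted above.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def visible_mass(obj1, obj2):
    """Invariant mass of two objects given as (pt, eta, phi, mass)."""
    e1, px1, py1, pz1 = four_vector(*obj1)
    e2, px2, py2, pz2 = four_vector(*obj2)
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def transverse_mass(lep_pt, lep_phi, met_pt, met_phi):
    """Transverse mass of a lepton and the missing transverse momentum."""
    dphi = lep_phi - met_phi
    return math.sqrt(max(2.0 * lep_pt * met_pt * (1.0 - math.cos(dphi)), 0.0))


def count_bjets(jets, threshold=0.4184):
    """Count jets given as (pt, eta, btagDeepB) passing kinematics and b-tag."""
    return sum(1 for pt, eta, btag in jets
               if pt > 20.0 and abs(eta) < 2.5 and btag > threshold)


m_vis = visible_mass((45.0, 0.0, 0.0, 0.0), (45.0, 0.0, math.pi, 0.0))
m_t = transverse_mass(40.0, 0.0, 40.0, math.pi)
n_b = count_bjets([(30.0, 1.0, 0.9), (30.0, 1.0, 0.2), (15.0, 0.0, 0.9), (30.0, 3.0, 0.9)])
print("m_vis={0:.1f} mT={1:.1f} nb={2}".format(m_vis, m_t, n_b))
```

The toy inputs are two massless back-to-back objects and a lepton back-to-back with the missing momentum, chosen so the results are easy to check by hand.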
Start with MuTauProducer.py, ETauProducer.py, and TauTauProducer.py (under ${CMSSW_BASE}/src/PhysicsTools/NanoAODTools/python/postprocessing/examples/) as boilerplate and create a new analyze function to calculate the relevant variables for your analysis on an event-by-event basis.
To run the example:
cd ${CMSSW_BASE}/src/PhysicsTools/NanoAODTools/analysis/
python ${CMSSW_BASE}/src/PhysicsTools/NanoAODTools/python/postprocessing/examples/example_postproc.py
It is similar to the exampleAnalysis.py you modified yesterday, but instead of creating a histogram it creates a new branch, EventMass, which is added to the Events tree in addition to those already in the NANOAOD file. The input file is the DYJetsToLL_M-50 signal file used yesterday. As we are now working to understand the backgrounds in addition to our Z signal, you may be interested in any of the following three MC samples, accessed in the following manner:
root://cmseos.fnal.gov//store/user/cmsdas/2023/short_exercises/Tau/DYJetsToLL__7B7D90CB-14EF-B749-B4D7-7C413FE3CCC1.root
root://cmseos.fnal.gov//store/user/cmsdas/2023/short_exercises/Tau/WJetsToLNu__AE18A33F-9CF5-BC4E-A1E9-46F7BF382AF1.root
root://cmseos.fnal.gov//store/user/cmsdas/2023/short_exercises/Tau/TTTo2L2Nu__1656732C-0CD4-F54B-B39D-19CA08E18A77.root
Deliverable
For each of the MC samples, make a histogram of the three variables mentioned in the bullets above. An example code to make the histograms from a given file/TTree and plot them can be found here. Before we submit jobs to condor to run over the whole datasets (which takes overnight) please understand these distributions:
- Are you able to see a peak in the visible mass distribution in the $Z$ MC? (You can compare with Slide 10 of the introductory slides.)
- How many b-tagged jets do you expect in the TTJets MC sample? In DYJetsToLL_M-50 and WJetsToLNu?
- Does your $m_T$ distribution in the WJetsToLNu MC appear as you would expect, given that the W boson mass is 80.379 GeV? (You can compare with that seen on Slide 8 of the introductory slides.)
As you make these, please paste these in the Mattermost chat for discussion and comparisons.
Once you have made (and tested/verified) your main analysis loops, it is time to run the code on the full datasets, including MC. We will use condor to process the data. This is done with a script called submitToGrid.py. You will need to modify your username on line 1. You can see that the “main” function which is called is example_postproc.py, which we modified before.
You may see the list of our datasets and MC samples of interest which will be submitted here. One job is created for each of those files. These are skims of the entire datasets which were made prior to CMSDAS. To (drastically) decrease the processing time and the output file size, make sure to enable friend mode. This creates a friend tree, which ensures that only the new variables you created are written into your output tree, and not the entire Events tree. Do not apply any additional preselection at this time either. Friends are important. The smaller input files may finish within 30 minutes; check one of these to make sure everything is as expected (ZZ has 0 entries, so do not be worried about this file).
cd ${CMSSW_BASE}/src/PhysicsTools/NanoAODTools/condor/
python submitToGrid.py
It would be good if you can submit the condor job before you go to sleep so that you can get the result the next morning.
Exercise 3
Estimate backgrounds (Thursday)
The goal of today is to predict the number of signal and background events expected in the signal region. The signal sample we are using is called DYJetsToTauTau_M50. We use the WJetsToLNu, TTToSemiLeptonic, TTTo2L2Nu, and DYJetsToEEMuMu_M-50 MC samples to predict the expected yields for the respective backgrounds. We use the ABCD method for QCD. Using the events observed in the B, C, and D control regions we can form our transfer factor (C/D) and multiply that by the number of events observed in the control region B to form the prediction. The assumption is that the B,C,D regions are predominantly populated by QCD events. In principle, the background processes we took from simulation (in the signal region) may also populate our B,C,D regions. We correct for this by using the simulation to subtract off their contribution from the observed data yields. A slide showing the algebra can be found here.
To save time, a script has been prepared to perform this arithmetic. For each dataset it makes histograms of the visible mass for each of the four ABCD regions. Event selection (such as cutting on mT or the number of b-tagged jets) is performed at this stage.
https://github.com/jingyucms/nanoAOD-tools/blob/cmsdas2023/analysis/yields_ZTauTau.c
If you are analyzing the $\mu+\tau_h$ channel, you only need to change the input file location to point to the files you produced from the last exercise and run. For the other 2 channels, you will need to make some small modifications. Then, you should use these histograms to calculate the scale factor and apply the scale factor to estimate the QCD background in region A.
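The ABCD arithmetic described above can be sketched in a few lines of Python. The sketch assumes you have already extracted, for each of the B, C, and D regions, the observed data yield and the summed yield of the simulated (non-QCD) processes; the function name and the numbers are hypothetical placeholders.

```python
def abcd_qcd_estimate(data, mc):
    """Return (transfer factor, QCD estimate in region A).

    data, mc: dicts mapping "B", "C", "D" to yields, where mc is the
    sum of the simulated non-QCD processes in each region.
    """
    # Subtract the simulated contributions from the observed data.
    qcd = dict((region, data[region] - mc[region]) for region in ("B", "C", "D"))
    if qcd["D"] <= 0:
        raise ValueError("non-positive QCD yield in region D")
    transfer_factor = qcd["C"] / float(qcd["D"])
    return transfer_factor, transfer_factor * qcd["B"]


# Placeholder yields for illustration only.
data_yields = {"B": 1000.0, "C": 300.0, "D": 600.0}
mc_yields = {"B": 100.0, "C": 50.0, "D": 100.0}
tf, qcd_in_a = abcd_qcd_estimate(data_yields, mc_yields)
print("transfer factor={0} QCD in A={1}".format(tf, qcd_in_a))
```

The same function can be applied bin by bin to the visible mass histograms if you want the shape of the QCD prediction rather than only the total yield.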
Deliverable
- The overall scale factor derived from the ratio of estimated QCD in C and D.
- A list of the expected signal and background events in the signal region.
- Histograms (for visible mass) of data on top of the estimated background components (QCD from data and the others from MC).
Exercise 4
Statistical Analysis
We will use the Higgs Combine Tool to do the statistical analysis. We first need to install Combine:
cd $CMSSW_BASE/src
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
cd HiggsAnalysis/CombinedLimit
git fetch origin
git checkout v8.1.0
cd $CMSSW_BASE/src
scramv1 b clean; scramv1 b
As input, Higgs Combine takes a text-based file (a datacard) containing the observed and expected yields. An example datacard, based on Tables 1 and 3 of the 13 TeV Z paper, can be found at the link below. Use this template to create a card for your analysis.
cd ${CMSSW_BASE}/src/PhysicsTools/NanoAODTools/analysis/
combine -M FitDiagnostics datacard.txt --forceRecreateNLL --rMin=0.1 --rMax=10.
combine -M Significance datacard.txt
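For orientation, a single-bin counting datacard like the datacard.txt used in the commands above could be generated from Python as sketched below. This is not the course template: the process names, yields, and the 2.5% luminosity uncertainty are placeholders, the helper name is hypothetical, and a realistic card would carry more nuisance parameters. Processes flagged as not simulated (such as a data-driven QCD estimate) receive "-" in the luminosity row.

```python
def make_datacard(observed, signal, backgrounds, lumi_unc=1.025):
    """Build a one-bin counting datacard as a string.

    signal: (name, yield); backgrounds: list of (name, yield, is_simulated).
    """
    processes = [(signal[0], signal[1], True)] + list(backgrounds)
    n_proc = len(processes)
    separator = "-" * 40
    lines = [
        "imax 1  number of channels",
        "jmax {0}  number of backgrounds".format(n_proc - 1),
        "kmax 1  number of nuisance parameters",
        separator,
        "bin ch1",
        "observation {0}".format(observed),
        separator,
        "bin " + " ".join(["ch1"] * n_proc),
        "process " + " ".join(p[0] for p in processes),
        "process " + " ".join(str(i) for i in range(n_proc)),
        "rate " + " ".join("{0:.2f}".format(p[1]) for p in processes),
        separator,
        "lumi lnN " + " ".join(str(lumi_unc) if p[2] else "-" for p in processes),
    ]
    return "\n".join(lines) + "\n"


# Placeholder yields for illustration only.
card = make_datacard(
    observed=1000,
    signal=("ZTT", 600.0),
    backgrounds=[("WJets", 150.0, True), ("QCD", 250.0, False)],
)
print(card)
```

Writing the returned string to datacard.txt would give a file in the general format the combine commands expect; replace the placeholders with the yields from your own background estimate.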
Deliverable
The best-fit r value. Is the best-fit r close to 1? Is the result what you expected? Can you compare your results to the best theoretical predictions (which can be found in the 13 TeV paper)?
Bonus
- Can you present the statistical and systematic uncertainties for your final cross section?