This lesson is being piloted (Beta version)

Exercise 6

Histogram template analysis using CombinedLimit tool (optional)

In this part of the exercise, we will use the histogram functionality of the ‘combine’ tool in order to set a limit using one-dimensional non-parametric shapes (i.e. histograms) for a multi-channel analysis (inspired by EXO-11-099).

Part 6.1 Setting up the input root file

The properly formatted input ROOT file is located at 6/shapes_file.root.

Take a moment to look inside the file. It is especially interesting to look at our observable for signal (“tprime600”) and compare with the two backgrounds (“top” and “ewk”). This shape analysis relies on the fact that the signal distribution peaks near zero while the backgrounds peak at high values.

The combine tool looks for the input histograms in a root file formatted according to the pattern you provide in your data card. In this case, we will format the data using the following convention:

$CHANNEL/$PROCESS_$SYSTEMATIC

Our analysis channels in this example are electron+jets (“ejets”) and muon+jets (“mujets”). We have a signal process (a t’ quark with mass 600 GeV, “tprime600”) and two background processes (SM top quark production, “top”, and the electroweak backgrounds, “ewk”).

We also have two systematic uncertainties that are parameterized by histograms representing one-sigma shifts up and down from the nominal shape: one for the uncertainty in the jet energy scale (“jes”) and one for the uncertainty in the b-tagging efficiency scale factor between MC and data (“btgsf”).
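The naming convention can be made concrete with a short Python sketch that lists the histogram paths combine would be expected to look for. It is an illustration only: the helper names `expand` and `expected_histograms` are hypothetical, the nominal pattern is taken to be `$CHANNEL/$PROCESS` (as in the shapes line of the data card used in this exercise), and the shifted histograms are assumed to carry the usual `Up` and `Down` suffixes appended to the systematic name. The observed-data histogram is not listed.

```python
# Hypothetical helper: enumerate the histogram paths implied by the naming pattern.
channels = ["ejets", "mujets"]
processes = ["tprime600", "top", "ewk"]
systematics = ["jes", "btgsf"]

NOMINAL = "$CHANNEL/$PROCESS"              # assumed pattern for nominal shapes
SHIFTED = "$CHANNEL/$PROCESS_$SYSTEMATIC"  # pattern for shifted shapes


def expand(pattern, channel, process, systematic=""):
    """Fill the placeholders using plain string replacement.

    string.Template is avoided on purpose: it would read "$PROCESS_" as one
    placeholder name, because underscores are valid identifier characters.
    """
    return (pattern.replace("$CHANNEL", channel)
                   .replace("$PROCESS", process)
                   .replace("$SYSTEMATIC", systematic))


def expected_histograms():
    """Return nominal and shifted histogram paths for every channel and process."""
    names = []
    for channel in channels:
        for process in processes:
            names.append(expand(NOMINAL, channel, process))
            for systematic in systematics:
                # Assumption: shifted templates are suffixed with Up / Down.
                for direction in ("Up", "Down"):
                    names.append(expand(SHIFTED, channel, process, systematic + direction))
    return names


if __name__ == "__main__":
    for name in expected_histograms():
        print(name)
```

Under these assumptions, each of the six channel/process combinations contributes one nominal histogram and four shifted ones (two systematics, two directions), for example `ejets/top` and `ejets/top_jesUp`.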

We will also consider some scaling uncertainties: the luminosity (we assume that the background normalizations are data-driven, so this affects only the signal), a normalization uncertainty for each background (assumed to come from our data-driven normalization procedure, and allowed to differ between backgrounds), and separate uncertainties for the electron and muon selection (each affecting only the relevant channel). Since these uncertainties only scale the individual components, we do not need to produce systematic shift histograms for them.

Part 6.2 Setting up the data card

Now let’s create the data card for combine, following the naming scheme used in the file. We will have two analysis channels (“ejets” and “mujets”), two background processes (“top” and “ewk”), and a total of seven systematic uncertainties (enumerated in the previous section). Write down the first block of the data card (you could call it ‘myshapes.txt’) with the appropriate imax, jmax, and kmax.

Hint

imax 2
jmax 2
kmax 7
---------------

Next, we need a line which tells combine that we are running in shapes mode, and tells it where to find the shapes in our input root file. We do this with the following line:

shapes * * shapes_file.root $CHANNEL/$PROCESS  $CHANNEL/$PROCESS_$SYSTEMATIC
---------------

Now we give our channels the correct names and specify the number of observed events in each channel (this must match the integral of the observed-data histogram for that channel in the input file).

Hint

bin ejets mujets
observation 4734 6448
---------------

Next, we need to define the signal and background processes. Let’s use process number 0 for the signal (tprime600), 1 for top, and 2 for ewk. Again, the rate for each process should match the integral of the corresponding histogram. Note that you will need a column for each process in both the ejets and mujets channels!

Hint

bin             ejets      ejets      ejets      mujets     mujets       mujets
process         tprime600  top        ewk        tprime600  top          ewk
process         0          1          2          0          1            2
rate            227        4048       760        302        5465.496783  1098.490939
--------------------------------

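Nearing the end, we need to define the systematic uncertainties. Define the following log-normal (lnN) scaling uncertainties: 10% on the luminosity (signal only), 11.4% on the top normalization, 50% on the ewk normalization, and 3% each on the muon and electron selection efficiencies (applied to all processes in the mujets and ejets channels, respectively).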

Hint

bin             ejets      ejets      ejets      mujets     mujets       mujets
process         tprime600  top        ewk        tprime600  top          ewk
process         0          1          2          0          1            2
rate            227        4048       760        302        5465.496783  1098.490939
--------------------------------
lumi      lnN   1.10       -          -          1.10       -            -
bgnortop  lnN   -          1.114      -          -          1.114        -
bgnorewk  lnN   -          -          1.5        -          -            1.5
eff_mu    lnN   -          -          -          1.03       1.03         1.03
eff_e     lnN   1.03       1.03       1.03       -          -            -
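To see what an lnN entry does to a yield, the following Python sketch may help. It assumes the usual log-normal convention in which a value kappa multiplies the rate by kappa raised to the power theta, where theta is the nuisance parameter in units of standard deviations. The function name `lnn_scaled_rate` is hypothetical; the numbers are the signal rate in ejets and the lumi value from the data card.

```python
# Sketch: effect of an lnN nuisance on a yield, assuming rate * kappa**theta.
def lnn_scaled_rate(rate, kappa, theta):
    """Return the yield after shifting the nuisance parameter by theta sigma."""
    return rate * kappa ** theta


signal_ejets = 227.0  # tprime600 rate in the ejets channel (from the data card)
kappa_lumi = 1.10     # lumi lnN value (from the data card)

if __name__ == "__main__":
    for theta in (-1.0, 0.0, 1.0):
        shifted = lnn_scaled_rate(signal_ejets, kappa_lumi, theta)
        print(f"theta = {theta:+.0f}: {shifted:.1f} events")
```

With these inputs a +1 sigma shift raises the signal yield from 227 to about 249.7 events, and a -1 sigma shift lowers it to about 206.4. A dash in a column means the process is left unscaled by that nuisance.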

Finally, we need to tell combine about the “jes” and “btgsf” shape uncertainty histograms defined in our input .root file. To do this we use the “shape” type of uncertainty, and give a scaling term for each process in each channel. Here, we’ll use “1” in every column (meaning that the input histogram is meant to represent a 1-sigma shift).

Hint

The complete card file:

imax 2
jmax 2
kmax 7
---------------
shapes * * shapes_file.root $CHANNEL/$PROCESS  $CHANNEL/$PROCESS_$SYSTEMATIC
---------------
bin ejets mujets
observation 4734 6448
------------------------------
bin             ejets      ejets      ejets      mujets     mujets       mujets
process         tprime600  top        ewk        tprime600  top          ewk
process         0          1          2          0          1            2
rate            227        4048       760        302        5465.496783  1098.490939
--------------------------------
lumi      lnN   1.10       -          -          1.10       -            -
bgnortop  lnN   -          1.114      -          -          1.114        -
bgnorewk  lnN   -          -          1.5        -          -            1.5
eff_mu    lnN   -          -          -          1.03       1.03         1.03
eff_e     lnN   1.03       1.03       1.03       -          -            -
jes       shape 1          1          1          1          1            1    uncertainty on shape due to JES uncertainty
btgsf     shape 1          1          1          1          1            1    uncertainty on shape due to b-tagging scale factor uncertainty
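Before running combine, it can be useful to cross-check that the header counts agree with the body of the card. The Python sketch below is one possible way to do this with the standard library only; it is not part of the combine tool. The function name `check_card` is hypothetical, and the check assumes that imax counts channels, jmax counts processes minus one, kmax counts nuisance parameters, and that every process appears in every channel.

```python
CARD = """\
imax 2
jmax 2
kmax 7
---------------
shapes * * shapes_file.root $CHANNEL/$PROCESS  $CHANNEL/$PROCESS_$SYSTEMATIC
---------------
bin ejets mujets
observation 4734 6448
------------------------------
bin             ejets      ejets   ejets   mujets     mujets    mujets
process         tprime600  top     ewk     tprime600  top       ewk
process         0          1       2       0          1         2
rate            227        4048    760     302        5465.496783      1098.490939
--------------------------------
lumi     lnN    1.10        -       -     1.10        -         -
bgnortop lnN    -          1.114    -     -          1.114      -
bgnorewk lnN    -           -      1.5    -           -        1.5
eff_mu   lnN    -           -       -     1.03       1.03      1.03
eff_e    lnN    1.03       1.03    1.03    -          -         -
jes    shape    1           1       1      1          1         1   uncertainty on shape due to JES uncertainty
btgsf  shape    1           1       1      1          1         1   uncertainty on shape due to b-tagging scale factor uncertainty
"""


def check_card(text):
    """Return a dict of named consistency checks (True means the check passed)."""
    # Split into tokens, skipping blank lines and dashed separator lines.
    lines = [ln.split() for ln in text.splitlines()
             if ln.strip() and not ln.startswith("---")]
    header = {ln[0]: int(ln[1]) for ln in lines if ln[0] in ("imax", "jmax", "kmax")}
    bin_rows = [ln[1:] for ln in lines if ln[0] == "bin"]
    process_rows = [ln[1:] for ln in lines if ln[0] == "process"]
    rates = next(ln[1:] for ln in lines if ln[0] == "rate")
    channels, columns = bin_rows[0], bin_rows[1]  # per-channel row, per-column row
    names, ids = process_rows[0], process_rows[1]
    # A nuisance line is recognised by its type keyword in the second field.
    nuisances = [ln[0] for ln in lines if len(ln) > 1 and ln[1] in ("lnN", "shape")]
    n_columns = header["imax"] * (header["jmax"] + 1)
    return {
        "channels match imax": len(channels) == header["imax"],
        "processes match jmax": len(set(names)) - 1 == header["jmax"],
        "nuisances match kmax": len(nuisances) == header["kmax"],
        "columns consistent": len(columns) == len(names) == len(ids) == len(rates) == n_columns,
    }


if __name__ == "__main__":
    for name, passed in check_card(CARD).items():
        print(f"{name}: {'ok' if passed else 'MISMATCH'}")
```

The sketch only inspects the text of the card. It does not open the ROOT file, so it cannot confirm that the rates and observations match the histogram integrals.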

Now, we can run the tool, using the verbose mode in order to track our histograms being fetched from the input root file:

%%bash
combine -v 3 -M AsymptoticLimits  myshapes.txt

Take a moment to parse the output of combine. Is the limit you set reasonable given the shapes of the input distributions compared to the data? Given the data, do you expect the limit to be close to the expected limit in the no-signal case?
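As a starting point for the second question, the Python sketch below compares the observed event count in each channel with the sum of the background rates taken from the data card. It is a rough pre-fit yield comparison only: it ignores the shape information and all systematic uncertainties, and the function name `data_over_background` is hypothetical.

```python
# Numbers copied from the data card; signal rates are deliberately left out.
observed = {"ejets": 4734, "mujets": 6448}
backgrounds = {
    "ejets": {"top": 4048.0, "ewk": 760.0},
    "mujets": {"top": 5465.496783, "ewk": 1098.490939},
}


def data_over_background(channel):
    """Ratio of the observed count to the summed pre-fit background rates."""
    return observed[channel] / sum(backgrounds[channel].values())


if __name__ == "__main__":
    for channel in observed:
        total = sum(backgrounds[channel].values())
        ratio = data_over_background(channel)
        print(f"{channel}: observed {observed[channel]}, "
              f"background {total:.1f}, ratio {ratio:.3f}")
```

In both channels the observed count is roughly 2% below the summed background rates, which is small compared with the assigned background normalization uncertainties. How this translates into the observed limit still depends on where in the distribution the data sit relative to the signal shape.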
