1. Loading Datasets

Authors: Javier Duarte, Raghav Kansal

1.1. Load datasets from ROOT files using uproot

Here we load the ROOT datasets in Python using uproot (see: scikit-hep/uproot). For more information about how to use uproot, see the "Uproot and Awkward Array for columnar analysis" HATS@LPC 2023 tutorial.

import uproot

Download datasets from Zenodo:

%%bash
mkdir -p data
wget -O data/ntuple_4mu_bkg.root "https://zenodo.org/record/3901869/files/ntuple_4mu_bkg.root?download=1"
wget -O data/ntuple_4mu_VV.root "https://zenodo.org/record/3901869/files/ntuple_4mu_VV.root?download=1"
--2023-08-12 00:12:36--  https://zenodo.org/record/3901869/files/ntuple_4mu_bkg.root?download=1
Resolving zenodo.org (zenodo.org)... 188.185.124.72
Connecting to zenodo.org (zenodo.org)|188.185.124.72|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8867265 (8.5M) [application/octet-stream]
Saving to: ‘data/ntuple_4mu_bkg.root’

2023-08-12 00:12:41 (2.40 MB/s) - ‘data/ntuple_4mu_bkg.root’ saved [8867265/8867265]

--2023-08-12 00:12:41--  https://zenodo.org/record/3901869/files/ntuple_4mu_VV.root?download=1
Resolving zenodo.org (zenodo.org)... 188.185.124.72
Connecting to zenodo.org (zenodo.org)|188.185.124.72|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4505518 (4.3M) [application/octet-stream]
Saving to: ‘data/ntuple_4mu_VV.root’

2023-08-12 00:12:44 (2.15 MB/s) - ‘data/ntuple_4mu_VV.root’ saved [4505518/4505518]
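
Outside a notebook, or on a system without wget, the same download can be done from Python. The following is a minimal sketch using only the standard library; the helper names `local_path` and `download_if_missing` are hypothetical and not part of the original tutorial, and the URLs are the ones from the wget commands above.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URLs copied from the wget commands above.
URLS = [
    "https://zenodo.org/record/3901869/files/ntuple_4mu_bkg.root?download=1",
    "https://zenodo.org/record/3901869/files/ntuple_4mu_VV.root?download=1",
]


def local_path(url, data_dir="data"):
    """Return the local file path for a URL, ignoring the query string."""
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(data_dir, filename)


def download_if_missing(url, data_dir="data"):
    """Download url into data_dir unless the file is already present."""
    dest = local_path(url, data_dir)
    if not os.path.exists(dest):
        os.makedirs(data_dir, exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    return dest
```

Calling `download_if_missing(url)` for each entry of `URLS` should leave the two files under `data/`, and re-running it skips files that already exist.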

1.2. Load ROOT files

Here we simply open two ROOT files using uproot and display the branch content of one of the trees.

import numpy as np
import h5py

treename = "HZZ4LeptonsAnalysisReduced"
filename = {}
upfile = {}

filename["bkg"] = "data/ntuple_4mu_bkg.root"
filename["VV"] = "data/ntuple_4mu_VV.root"

upfile["bkg"] = uproot.open(filename["bkg"])
upfile["VV"] = uproot.open(filename["VV"])

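# show() prints the branch table itself and returns None,
# which the outer print() then echoes as the final "None" line
print(upfile["bkg"][treename].show())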
name                 | typename                 | interpretation                
---------------------+--------------------------+-------------------------------
f_run                | int32_t                  | AsDtype('>i4')
f_lumi               | int32_t                  | AsDtype('>i4')
f_event              | int32_t                  | AsDtype('>i4')
f_weight             | float                    | AsDtype('>f4')
f_int_weight         | float                    | AsDtype('>f4')
f_pu_weight          | float                    | AsDtype('>f4')
f_eff_weight         | float                    | AsDtype('>f4')
f_lept1_pt           | float                    | AsDtype('>f4')
f_lept1_eta          | float                    | AsDtype('>f4')
f_lept1_phi          | float                    | AsDtype('>f4')
f_lept1_charge       | float                    | AsDtype('>f4')
f_lept1_pfx          | float                    | AsDtype('>f4')
f_lept1_sip          | float                    | AsDtype('>f4')
f_lept2_pt           | float                    | AsDtype('>f4')
f_lept2_eta          | float                    | AsDtype('>f4')
f_lept2_phi          | float                    | AsDtype('>f4')
f_lept2_charge       | float                    | AsDtype('>f4')
f_lept2_pfx          | float                    | AsDtype('>f4')
f_lept2_sip          | float                    | AsDtype('>f4')
f_lept3_pt           | float                    | AsDtype('>f4')
f_lept3_eta          | float                    | AsDtype('>f4')
f_lept3_phi          | float                    | AsDtype('>f4')
f_lept3_charge       | float                    | AsDtype('>f4')
f_lept3_pfx          | float                    | AsDtype('>f4')
f_lept3_sip          | float                    | AsDtype('>f4')
f_lept4_pt           | float                    | AsDtype('>f4')
f_lept4_eta          | float                    | AsDtype('>f4')
f_lept4_phi          | float                    | AsDtype('>f4')
f_lept4_charge       | float                    | AsDtype('>f4')
f_lept4_pfx          | float                    | AsDtype('>f4')
f_lept4_sip          | float                    | AsDtype('>f4')
f_iso_max            | float                    | AsDtype('>f4')
f_sip_max            | float                    | AsDtype('>f4')
f_Z1mass             | float                    | AsDtype('>f4')
f_Z2mass             | float                    | AsDtype('>f4')
f_angle_costhetastar | float                    | AsDtype('>f4')
f_angle_costheta1    | float                    | AsDtype('>f4')
f_angle_costheta2    | float                    | AsDtype('>f4')
f_angle_phi          | float                    | AsDtype('>f4')
f_angle_phistar1     | float                    | AsDtype('>f4')
f_pt4l               | float                    | AsDtype('>f4')
f_eta4l              | float                    | AsDtype('>f4')
f_mass4l             | float                    | AsDtype('>f4')
f_mass4lErr          | float                    | AsDtype('>f4')
f_njets_pass         | float                    | AsDtype('>f4')
f_deltajj            | float                    | AsDtype('>f4')
f_massjj             | float                    | AsDtype('>f4')
f_D_jet              | float                    | AsDtype('>f4')
f_jet1_pt            | float                    | AsDtype('>f4')
f_jet1_eta           | float                    | AsDtype('>f4')
f_jet1_phi           | float                    | AsDtype('>f4')
f_jet1_e             | float                    | AsDtype('>f4')
f_jet2_pt            | float                    | AsDtype('>f4')
f_jet2_eta           | float                    | AsDtype('>f4')
f_jet2_phi           | float                    | AsDtype('>f4')
f_jet2_e             | float                    | AsDtype('>f4')
f_D_bkg_kin          | float                    | AsDtype('>f4')
f_D_bkg              | float                    | AsDtype('>f4')
f_D_gg               | float                    | AsDtype('>f4')
f_D_g4               | float                    | AsDtype('>f4')
f_Djet_VAJHU         | float                    | AsDtype('>f4')
f_pfmet              | float                    | AsDtype('>f4')
None
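
The interpretation column in the table above uses NumPy dtype strings: `'>i4'` and `'>f4'` denote big-endian 4-byte integers and floats, which is the byte order ROOT uses on disk. As a small illustration with made-up array names (`raw`, `native`), the sketch below shows that converting such values to the machine's native `float32` changes only the memory layout, not the numbers.

```python
import numpy as np

# ">f4" is NumPy's code for a big-endian (">") 4-byte float.
big_endian = np.dtype(">f4")

# Toy values: the first two f_mass4l entries printed later on this page.
raw = np.array([91.09813, 201.84761], dtype=big_endian)

# Convert to the machine's native float32; the values are unchanged,
# only the byte order in memory may differ.
native = raw.astype(np.float32)

print(big_endian.itemsize)  # bytes per value
print(native.dtype)
```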

1.3. Convert tree to pandas DataFrames

In my opinion, pandas DataFrames are a more convenient and flexible data container in Python. See the DataFrame documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html.

import pandas as pd

branches = ["f_mass4l", "f_massjj"]

df = {}
df["bkg"] = upfile["bkg"][treename].arrays(branches, library="pd")
df["VV"] = upfile["VV"][treename].arrays(branches, library="pd")

# print first entry
print(df["bkg"].iloc[:1])

# print shape of DataFrame
print(df["bkg"].shape)

# print first entry for f_mass4l and f_massjj
print(df["bkg"][branches].iloc[:1])

# convert back into an unstructured NumPy array
print(df["bkg"].values)
print(df["bkg"].values.shape)

# get boolean mask array
mask = df["bkg"]["f_mass4l"] > 125
print(mask)

# cut using this boolean mask array
print(df["bkg"]["f_mass4l"][mask])
    f_mass4l  f_massjj
0  91.098129    -999.0
(58107, 2)
    f_mass4l  f_massjj
0  91.098129    -999.0
[[  91.09813  -999.      ]
 [ 201.84761  -999.      ]
 [  89.279076 -999.      ]
 ...
 [  90.129845 -999.      ]
 [ 250.97742  -999.      ]
 [ 229.47015  -999.      ]]
(58107, 2)
0        False
1         True
2        False
3         True
4         True
         ...  
58102    False
58103     True
58104    False
58105     True
58106     True
Name: f_mass4l, Length: 58107, dtype: bool
1        201.847610
3        586.597412
4        135.589798
5        734.903442
6        341.958466
            ...    
58097    225.355103
58098    214.074249
58103    252.845184
58105    250.977417
58106    229.470154
Name: f_mass4l, Length: 42219, dtype: float32
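
The mask above tests a single condition. Several conditions can be combined element-wise, as in the sketch below on a toy DataFrame. Two things here are assumptions rather than facts from the tutorial: the last row is invented, and the value -999.0 in `f_massjj` is treated as a placeholder for events without a dijet mass, which the output above suggests but does not state.

```python
import pandas as pd

# Toy DataFrame: the first three rows mirror the output above,
# the last row is made up so that one event has a positive f_massjj.
toy = pd.DataFrame(
    {
        "f_mass4l": [91.09813, 201.84761, 89.279076, 130.0],
        "f_massjj": [-999.0, -999.0, -999.0, 450.0],
    }
)

# Assumption: a negative f_massjj (-999.0) is a placeholder, not a real mass.
has_dijet = toy["f_massjj"] > 0
high_mass = toy["f_mass4l"] > 125

# Combine masks element-wise with & (and), | (or), ~ (not).
selected = toy[high_mass & has_dijet]
no_dijet = toy[~has_dijet]

print(selected)
print(len(no_dijet))
```

When the comparisons are written inline rather than stored in variables, each one needs its own parentheses, for example `toy[(toy["f_mass4l"] > 125) & (toy["f_massjj"] > 0)]`, because `&` binds more tightly than `>`.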

1.4. Plotting in matplotlib

Finally, it is always useful to visualize the dataset before using machine learning. Here, we use matplotlib to plot some key features of the data loaded with uproot.

import matplotlib.pyplot as plt

%matplotlib inline

VARS = ["f_mass4l", "f_massjj"]

plt.figure(figsize=(5, 4), dpi=100)
plt.xlabel(VARS[0])
bins = np.linspace(80, 140, 100)
df["bkg"][VARS[0]].plot.hist(bins=bins, alpha=1, label="bkg", histtype="step")
df["VV"][VARS[0]].plot.hist(bins=bins, alpha=1, label="VV", histtype="step")
plt.legend(loc="upper right")
plt.xlim(80, 140)

plt.figure(figsize=(5, 4), dpi=100)
plt.xlabel(VARS[1])
bins = np.linspace(0, 2000, 100)
df["bkg"][VARS[1]].plot.hist(bins=bins, alpha=1, label="bkg", histtype="step")
df["VV"][VARS[1]].plot.hist(bins=bins, alpha=1, label="VV", histtype="step")
plt.legend(loc="upper right")
plt.xlim(0, 2000)

plt.show()
[Figures: step histograms of f_mass4l (range 80 to 140) and f_massjj (range 0 to 2000), comparing the bkg and VV samples]
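
One detail of the binning above is easy to miss: `np.linspace(80, 140, 100)` returns 100 bin edges, which define 99 bins rather than 100. The sketch below checks this with `np.histogram` on a handful of values; four are taken from the `f_mass4l` output earlier on this page and one (125.0) is invented for illustration.

```python
import numpy as np

# Same bin edges as the f_mass4l plot above.
edges = np.linspace(80, 140, 100)
n_bins = len(edges) - 1
bin_width = edges[1] - edges[0]

# Four f_mass4l values from the earlier output plus one invented value (125.0).
values = np.array([91.09813, 89.279076, 90.129845, 125.0, 201.84761])

# np.histogram ignores values outside [edges[0], edges[-1]].
counts, _ = np.histogram(values, bins=edges)

print(n_bins, bin_width)
print(counts.sum())
```

If exactly 100 bins are wanted, `np.linspace(80, 140, 101)` gives 101 edges with a bin width of 0.6.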