HLT timing studies
Overview
Teaching: 30 min
Exercises: 0 min
Objectives
Measure the time it takes to run an HLT menu
Reference repo: link
Prerequisites
A CERN account with access to lxplus - that’s it!
Instructions
Exercise 1: CPU/GPU Timing measurements without creation of a CMSSW environment
- Log in to lxplus and clone the timing repository somewhere (e.g. in your EOS space).
  git clone https://gitlab.cern.ch/cms-tsg/steam/timing.git
  cd timing
- Submit a timing job to the timing machine using CMSSW_14_0_11, the GRun menu V173 and the default dataset on the timing machine.
  python3 submit.py /dev/CMSSW_14_0_0/GRun/V173 --cmssw CMSSW_14_0_11 --tag YOUR_TAG_HERE
  If you have 2FA enabled, lxplus9 may require tsgauth (pip3 install tsgauth==0.10.2). This can be set up in a virtual environment using the lines below.
  python3 -m venv venv
  source venv/bin/activate
  pip3 install --upgrade pip
  pip3 install tsgauth==0.10.2
  lxplus9 will also give you a link (like https://auth.cern.ch/auth/realms/cern/...) which should be copied and pasted into your browser to grant access.
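  If you set up the virtual environment above, note that it has to be re-activated in every new lxplus session before tsgauth is available again; a minimal sketch, assuming the venv was created inside the timing directory as shown:
  cd timing
  source venv/bin/activate   # makes the tsgauth installed above available again in this session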
- Check the status of your job using the job_manager.py script.
  python3 job_manager.py
  It takes around 20-30 minutes to run.
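  Since the job takes a while, you may prefer to poll the queue periodically instead of re-running the command by hand; a minimal sketch using the standard watch utility, assuming job_manager.py does not need interactive authentication on every call:
  watch -n 300 python3 job_manager.py   # re-query the job status every 5 minutes; Ctrl-C to stop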
- Re-submit your job using the --rerun option, followed by the job ID of the first submitted job. This re-submits the first job with exactly the same parameters, which can be useful if you want to re-run a job multiple times to get an idea of the variance of the timing measurements. It also makes the program re-use the same CMSSW area as before on the timing machine, saving some disk space there. Make sure to add a new --tag to your job so you can distinguish the two in the job queue. A scripted version is sketched below.
  python3 submit.py --rerun JOB_ID --tag YOUR_NEW_TAG_HERE
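  If you want several repetitions for a variance estimate, the re-submission can be scripted; a minimal sketch that only reuses the options shown above (the tag names are just examples):
  for i in 1 2 3; do
    python3 submit.py --rerun JOB_ID --tag "variance_test_${i}"
  done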
- Remove the recently added job from the queue using the job_manager.py script and the --rm option.
  python3 job_manager.py --rm JOB_ID_OF_RESUBMITTED_JOB
  NOTE: It is currently not possible to cancel an already running job. Only queued jobs can be cancelled.
- Submit another job with the same settings as before, but now only using the CPUs by adding the --cpu-only option. This will run the same job, but only on the CPUs of the timing machine, which is useful to compare the performance of the CPUs and GPUs.
  python3 submit.py /dev/CMSSW_14_0_0/GRun/V173 --cmssw CMSSW_14_0_11 --cpu-only --tag YOUR_CPU_JOB_TAG_HERE
- Once your jobs have finished, get the reports for one of the jobs using the job_manager.py script and the --report option.
  python3 job_manager.py --report JOB_ID
  This will download a tar.gz file containing output and error files for:
  - the creation of the CMSSW environment on the timing machine (i.e. the scram proj/cmsrel step)
  - the building of the CMSSW environment, including the merging of provided pull requests etc. (i.e. the scram b step)
  - the benchmarking script (found in the run_benchmark.py file), which initializes and controls the settings of the actual timing measurements on multiple CPUs/GPUs using the patatrack-scripts repo
  - the final cmsRun of the timing job
  NOTE: All of these steps are taken care of by the server for you as a user; the reports are just helpful tools for the occasional case where a measurement crashes.
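  To inspect a report locally, the archive can be unpacked with standard tools; the file name below is only a placeholder, use whatever file job_manager.py --report actually downloaded:
  tar -xzf JOB_ID.tar.gz   # placeholder name
  ls                       # browse the extracted output and error files for each step listed above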
- Investigate the results using the timing GUI.
  - Use the timing GUI to compare your CPU-only job with your GPU job. The timing GUI can be accessed from this link. Once you arrive at the GUI, click the “open” button and find your CERN username, then click the little arrow/triangle next to the box to get a drop-down list which contains all your finished measurements. Tick the boxes for the GPU and the CPU measurements and hit the “OK” button.
  - The results take some time to load. Once you see your results, click on the “TIMING” tab to see the different timing distributions. You should be able to identify the fast peaks and tracking/PF tails that were discussed in the slides.
  - In the lower panel, you see the timing of the individual paths of the menu. Click on one of the job ID names to sort the path timing in ascending/descending order for that measurement. Investigate which paths are the fastest and the most time-consuming ones and whether this matches what we discussed in the slides. You can also click on a path/row to see the timing distribution of that path (left panel) and the average timing of each module inside that path (right panel). In the right panel it is sometimes difficult to see which module you are looking at, so you can also hover over the “bins” and some information for that bin, including the module name, will pop up.
  - Finally, let’s find the path with the largest timing difference between the CPU and GPU measurements. To do this, switch on the “Display Diff” button in the top right of the lower panel. This causes the column of the second measurement to show the timing difference with respect to the column of the first measurement. Note that this also works when more than two measurements are selected for comparison: the first selected measurement column is always the “reference” and the other columns show the difference relative to it. Now you can sort ascending/descending by timing difference by clicking the job name of the second measurement. Which paths have the largest timing difference? Is this what you would expect? You can also look at the timing distributions of the paths again by clicking on the path name.
Exercise 2 (Bonus): CPU/GPU Timing measurements with creation of CMSSW area
It is also possible to submit a timing job with a previously created CMSSW area. When you do this, your area will get “cloned” to the timing machine and it will run a measurement using the cloned area. Since the arealess submission already allows for a lot of flexibility, this is not really necessary and the arealess submission is the recommended way to submit jobs. However, if you run some very specialized expert workflows that for some reason require additional tinkering with the CMSSW area that is not covered by the timing code, it can sometimes still be useful.
- Create a CMSSW area on lxplus and build it using the GRun menu V173 and the default dataset on the timing machine.
  cmsrel CMSSW_14_0_11
  cd CMSSW_14_0_11/src
  cmsenv
  git cms-init
  scram build -j 4
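  If you come back to this area in a later lxplus session, the environment has to be set up again before working in it; a minimal sketch:
  cd CMSSW_14_0_11/src
  cmsenv   # re-establish the CMSSW runtime environment in the new shell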
- Download the same GRun menu as in Exercise 1 using the hltGetConfiguration command. When you do this outside this exercise, make sure you choose the correct globaltag and L1 menu for your use case. For this exercise you can copy/paste the command below.
  hltGetConfiguration /dev/CMSSW_14_0_0/GRun/V173 --globaltag 140X_dataRun3_HLT_v3 --data --process TIMING --full --offline --output minimal --type GRun --max-events 20000 --era Run3 --timing --l1 L1Menu_Collisions2024_v1_3_0-d1_xml > hlt.py
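  Before submitting, it can be worth checking that the dumped configuration loads cleanly; a quick sanity check (not part of the official workflow), run from inside the CMSSW area after cmsenv:
  python3 hlt.py   # should finish without errors if the configuration is self-consistent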
- Clone the timing repository.
  git clone https://gitlab.cern.ch/cms-tsg/steam/timing.git
- Submit your config to the timing server. Many of the options that we needed in the arealess submission are not needed anymore, since they are already specified in the config.
  python3 ./timing/submit.py hlt.py --tag YOUR_NEW_TAG_HERE
- Again, you can investigate the job using the job_manager.py script.
  python3 ./timing/job_manager.py
- Once the job is done, you can investigate the results in the timing GUI. For example, you can check whether the result of the arealess submission agrees with the result of the submission with an area (a variance of about 1-3% is normal).