This lesson is in the early stages of development (Alpha version)

HLT timing studies

Overview

Teaching: 30 min
Exercises: 0 min
Questions
Objectives
  • Measure the time it takes to run an HLT menu

  • Reference repo: link

Prerequisites

A CERN account with access to lxplus - that’s it!

Instructions

Exercise 1: CPU/GPU Timing measurements without creation of a CMSSW environment

  1. Log in to lxplus and clone the timing repository somewhere (e.g. in your EOS space)

     git clone https://gitlab.cern.ch/cms-tsg/steam/timing.git
     cd timing
    
  2. Submit a timing job to the timing machine using CMSSW_14_0_11, the GRun menu V173 and the default dataset on the timing machine.

     python3 submit.py /dev/CMSSW_14_0_0/GRun/V173 --cmssw CMSSW_14_0_11 --tag YOUR_TAG_HERE
    

    If you have 2FA enabled, lxplus9 may require the tsgauth package (tsgauth==0.10.2), which you can install in a virtual environment using the lines below.

     python3 -m venv venv
     source venv/bin/activate
     pip3 install --upgrade pip
     pip3 install tsgauth==0.10.2
    

    lxplus9 will also print a link (like https://auth.cern.ch/auth/realms/cern/...) which should be copied and pasted into a browser to grant access.

  3. Check the status of your job using the job_manager.py script.

     python3 job_manager.py
    

    It takes around 20-30 minutes to run.

  4. Re-submit your job using the --rerun option, followed by the job ID of the first submitted job. This re-submits the first job with the exact same parameters and can be useful if you want to re-run a job multiple times to get an idea of the variance of the timing measurements. It also leads to the program re-using the same CMSSW area as before on the timing machine, which saves some disk space there. Make sure to add a new --tag to your job so you can distinguish the two in the job queue.

     python3 submit.py --rerun JOB_ID --tag YOUR_NEW_TAG_HERE
    
  5. Remove the recently added job from the queue using the job_manager.py script and the --rm option.

     python3 job_manager.py --rm JOB_ID_OF_RESUBMITTED_JOB
    

    NOTE: It is currently not possible to cancel an already running job. Only queued jobs can be cancelled.

  6. Submit another job with the same settings as before, but now only using the CPUs by adding the --cpu-only option. This will run the same job, but only on the CPUs of the timing machine. This is useful to compare the performance of the CPUs and GPUs.

     python3 submit.py /dev/CMSSW_14_0_0/GRun/V173 --cmssw CMSSW_14_0_11 --cpu-only --tag YOUR_CPU_JOB_TAG_HERE
    
  7. Once your jobs have finished, get the reports for one of the jobs using the job_manager.py script and the --report option.

     python3 job_manager.py --report JOB_ID
    

    This will download a tar.gz file containing output and error files for:

    • the creation of the CMSSW environment on the timing machine (i.e. the scram proj/cmsrel step)
    • the building of the CMSSW environment including the merging of provided pull requests etc. (i.e. the scram b step)
    • the benchmarking script (found in the run_benchmark.py file) which initializes and controls the settings of the actual timing measurements on multiple CPUs/GPUs using the patatrack-scripts repo
    • the final cmsRun of the timing job

    NOTE: All of these steps are taken care of by the server for you as a user; the reports are just helpful tools for when a measurement occasionally crashes.
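    Once downloaded, the report archive can be unpacked and inspected with Python's standard tarfile module. The sketch below is illustrative: the filename report.tar.gz and the helper names are assumptions, since the actual filename depends on your job ID.

     import tarfile

     def list_report(path):
         """Return the member names inside a downloaded report archive."""
         with tarfile.open(path, "r:gz") as tar:
             return tar.getnames()

     def extract_report(path, dest="."):
         """Unpack the report so the output/error files can be inspected."""
         with tarfile.open(path, "r:gz") as tar:
             tar.extractall(dest)

    For example, extract_report("report.tar.gz") unpacks the archive into the current directory, after which you can open the individual output and error files with any text editor.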

  8. Investigate the results using the timing GUI

    • Use the timing GUI to compare your CPU-only job with your GPU job. The timing GUI can be accessed from this link. Once you arrive at the GUI, click the “open” button and find your CERN username, then click the little arrow/triangle next to the box to get a drop-down list which contains all your finished measurements. Tick the boxes for the GPU and CPU measurements and hit the “OK” button.
    • The results take some time to load. Once you see your results, click on the “TIMING” tab to see the different timing distributions. You should be able to identify the fast peaks and tracking/PF tails that were discussed in the slides.
    • In the lower panel, you see the timing of the individual paths of the menu. Click on one of the job ID names to sort the path timing in ascending/descending order for that respective measurement. Investigate which paths are the fastest and the most time-consuming ones and whether this matches what we discussed in the slides. You can also click on a path/row to see the timing distribution of that path (left panel) and the average timing of each module inside that path (right panel). In the right panel, it is sometimes difficult to see which module you are looking at, so you can also hover over the “bins” and some information for that bin will pop up, including the module name.
    • Finally, let’s find the path with the largest timing difference between the CPU and GPU measurements. To do this, switch on the “Display Diff” button in the top right of the lower panel. This causes the column of the second measurement to show the timing difference with respect to the column of the first measurement. Note that this also works when more than two measurements are selected for comparison: the first selected measurement column is always the “reference” and the other columns show the difference relative to it. Now you can sort ascending/descending by timing difference by clicking the job name of the second measurement. Which paths have the largest timing difference? Is this what you would expect? You can also look at the timing distributions of the paths by clicking on the path names.
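    Conceptually, the “Display Diff” view just subtracts the per-path timings of the reference measurement from the other columns and lets you sort by the result. A minimal sketch of that logic in Python, using made-up path names and timings (not real menu output):

     def path_timing_diff(reference, other):
         """Per-path timing difference (other - reference), as in 'Display Diff'."""
         return {path: other[path] - t_ref
                 for path, t_ref in reference.items() if path in other}

     # Hypothetical per-path average timings in milliseconds.
     cpu = {"HLT_PathA": 12.0, "HLT_PathB": 3.5, "HLT_PathC": 40.0}
     gpu = {"HLT_PathA": 11.5, "HLT_PathB": 3.4, "HLT_PathC": 15.0}

     diff = path_timing_diff(gpu, cpu)  # GPU job taken as the reference
     # Sort descending to find the path with the largest CPU-GPU difference.
     largest_first = sorted(diff.items(), key=lambda kv: kv[1], reverse=True)

    In this toy example, the tracking-heavy HLT_PathC would show the largest CPU-GPU difference, which mirrors what you typically see for GPU-offloaded reconstruction.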

Exercise 2 (Bonus): CPU/GPU Timing measurements with creation of CMSSW area

It is also possible to submit a timing job with a previously created CMSSW area. When you do this, your area gets “cloned” to the timing machine, which then runs a measurement using the cloned area. Since the arealess submission already allows for a lot of flexibility, this is rarely necessary, and the arealess submission is the recommended way to submit jobs. However, if you run very specialized expert workflows that require additional tinkering with the CMSSW area not covered by the timing code, it can still be useful.

  1. Create a CMSSW area on lxplus and build it (the GRun menu is downloaded into it in the next step).

     cmsrel CMSSW_14_0_11
     cd CMSSW_14_0_11/src
     cmsenv
     git cms-init
     scram build -j 4
    
  2. Download the same GRun menu as in Exercise 1 using the hltGetConfiguration command. When you do this outside this exercise, make sure you choose the correct GlobalTag and L1 menu for your use case. For this exercise you can copy/paste the command below.

     hltGetConfiguration /dev/CMSSW_14_0_0/GRun/V173 --globaltag 140X_dataRun3_HLT_v3 --data --process TIMING --full --offline --output minimal --type GRun --max-events 20000 --era Run3 --timing --l1 L1Menu_Collisions2024_v1_3_0-d1_xml > hlt.py
    
  3. Clone the timing repository.

     git clone https://gitlab.cern.ch/cms-tsg/steam/timing.git
    
  4. Submit your config to the timing server. Many of the options that we needed in the arealess submission are not needed anymore, since they are already specified in the config.

     python3 ./timing/submit.py hlt.py --tag YOUR_NEW_TAG_HERE
    
  5. Again, you can investigate the job using the job_manager.py script.

     python3 ./timing/job_manager.py
    
  6. Once the job is done, you can investigate the results in the timing GUI. For example, you can check whether the result of the arealess submission agrees with the result of the submission with an area (a variance of about 1-3% is normal).
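    As a quick sanity check of whether two measurements agree, you can compute the relative difference between their throughputs yourself. The numbers below are made up for illustration; real values come from the GUI or the downloaded report.

     def relative_difference_pct(a, b):
         """Relative difference of b with respect to a, in percent."""
         return abs(b - a) / a * 100.0

     # Hypothetical throughputs (events/s) from the two submissions.
     arealess = 620.0
     with_area = 611.0

     variance = relative_difference_pct(arealess, with_area)
     # A variance of roughly 1-3% between repeated measurements is normal.
     agrees = variance <= 3.0

    Here the two hypothetical measurements differ by about 1.5%, well within the normal run-to-run spread, so they would be considered consistent.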

Key Points