Running a CMSSW job
Overview
Teaching: 10 min
Exercises: 10 minQuestions
How can I run CMSSW in GitLab CI?
How can avoid compiling my code for each job?
Objectives
Successfully run a test job of a simplified Z to leptons analysis
Use GitLab artifacts to pass compiled analysis code
Being able to set up CMSSW and to compile code in GitLab, and knowing how to access CMS data, the next step is to run test jobs to confirm that the code yields the expected results.
Fair use
Please remember that the provided runners are shared among all users, so please avoid massive pipelines and CI stages with more than 5 jobs in parallel or that run with a parallel configuration higher than 5.
If you need to run these pipelines please deploy your own private runners to avoid affecting the rest of the users.
Requirements for running CMSSW
In most cases, you will run your tests on centrally produced files. In order
to be able to access those, you will require a grid proxy valid for the CMS
virtual organisation (VO) as described in the previous section. For files
located on EOS, please check the section on
private information/access control
from the
Continuous Integration / Continuous Development (CI/CD)
on how to get a Kerberos token via kinit
(we won’t be using this here).
For the analysis example provided in this lessons, we’ll use a single file
from the /DYJetsToLL_M-50_HT-100to200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/MINIAODSIM data set: /store/mc/RunIIFall17MiniAODv2/DYJetsToLL_M-50_HT-100to200_TuneCP5_13TeV-madgraphMLM-pythia8/MINIAODSIM/PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/50000/E43E4210-7742-E811-9430-AC1F6B23C96A.root
.
This file is set in
ZPeakAnalysis/test/MyZPeak_cfg.py
.
Executing cmsRun
In principle, all we need to do is compile the code as demonstrated in
episode 2,
adding the grid proxy as just done in
episode 3,
and then execute the cmsRun
command. Mind that do not need the
git cms-addpkg PhysicsTools/PatExamples
command here anymore,
i.e. remove it in the following! Putting this together, the
additional commands to run would be:
cd ${CMSSW_BASE}/src/AnalysisCode/ZPeakAnalysis/
cmsRun test/MyZPeak_cfg.py
ls -l myZPeak.root
where the last command just checks that an output file has been created. However, imagine that you would like to run test jobs on more than one file and to speed things up do this in parallel. This would mean that you would have to compile the code N times, which is a waste of resources and time. Instead, we can pass the compiled code from the compile step to the run step as described below.
Using artifacts to compile code only once
Artifacts have been introduced to you as part of the Continuous Integration / Continuous Development (CI/CD) lesson. You can find more detailed information in the GitLab documentation for using artifacts.
Artifacts are write-protected
One important thing to note is that artifacts are write-protected. You cannot write into the artifact directory in any of the following steps.
For the compiled code to be available in the subsequent steps, the directories
that should be provided need to be listed explicitely. The yaml
code from
the compilation step in
episode 2
needs to be extended as follows:
artifacts:
# artifacts:untracked ignores configuration in the repository’s .gitignore file.
untracked: true
expire_in: 20 minutes
paths:
- ${CMSSW_RELEASE}
As path we use ${CMSSW_RELEASE}
, i.e. the full CMSSW area. Since this area
is write protected, we need to copy the whole area to a new directory and
recursively add write permissions again. In the following, this new workarea
will have to be used:
script:
# ...
- mkdir run
- cp -r ${CMSSW_RELEASE} run/
- chmod -R +w run/${CMSSW_RELEASE}/
- cd run/${CMSSW_RELEASE}/src
- cmsenv
Exercise: Run CMSSW using the artifact from the compile step
You should now have all required ingredients to be able to extend the
.gitlab-ci.yml
file such that you can reuse the compiled code in thecmsRun
step.
Solution: Run CMSSW using the artifact from the compile step
A possible implementation could look like this:
cmssw_run: image: name: gitlab-registry.cern.ch/clange/cmssw-docker/cc7-cms:latest entrypoint: [""] tags: - cvmfs variables: CMS_PATH: /cvmfs/cms.cern.ch EOS_MGM_URL: "root://eoscms.cern.ch" CMSSW_RELEASE: CMSSW_10_6_8_patch1 script: - shopt -s expand_aliases - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u - mkdir run - cp -r ${CMSSW_RELEASE} run/ - chmod -R +w run/${CMSSW_RELEASE}/ - cd run/${CMSSW_RELEASE}/src - cmsenv - mkdir -p ${HOME}/.globus - printf $GRID_USERCERT | base64 -d > ${HOME}/.globus/usercert.pem - printf $GRID_USERKEY | base64 -d > ${HOME}/.globus/userkey.pem - chmod 400 ${HOME}/.globus/userkey.pem - printf ${GRID_PASSWORD} | base64 -d | voms-proxy-init --voms cms --pwstdin - cd AnalysisCode/ZPeakAnalysis/ - cmsRun test/MyZPeak_cfg.py - ls -l myZPeak.root
Bonus: Store the output ROOT file as artifact
It could be useful to store the output ROOT file as an artifact so that you simply download it after job completion. Do you know how to do it? Hint: you need to provide the full path to it.
Key Points
A special CMSSW image is required to successfully run CMSSW jobs
Running on CMS data requires a grid proxy
The use of artifacts allows passing results of one step to the other
Since artifacts are write-protected, the directory needs to be copied before running CMSSW