GitLab CI for CMS

Setting up a CMSSW environment

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • Which GitLab runners are needed?

  • What’s different w.r.t. LXPLUS?

Objectives
  • Know how to source the CMSSW environment

  • Understand the different commands that need to be used

Before getting into details, a few links to useful documentation on GitLab CI/CD and also CERN-specific information:

These pages serve as a good entrypoint in case of problems and questions.

Create a new GitLab project to follow along

Please create a new GitLab project now to follow along. You can for instance call it awesome-gitlab-cms. In the following, we will assume that all your work is in a directory called awesome-workshop in your home directory and the repository resides therein: ~/awesome-workshop/awesome-gitlab-cms

The commands would look like this (replace ${USER} by your CERN username in case it isn’t the same as on your laptop):

mkdir -p ~/awesome-workshop
cd ~/awesome-workshop
git clone ssh://git@gitlab.cern.ch:7999/${USER}/awesome-gitlab-cms.git

Choosing the correct GitLab runner

Standard GitLab runners at CERN do not mount CVMFS, which is required for setting up CMSSW. To get a runner that mounts CVMFS, all you need to do is add a tag to your .gitlab-ci.yml file:

tags:
  - cvmfs

A minimal .gitlab-ci.yml file to get a runner with CVMFS looks like the following:

cmssw_setup:
  tags:
    - cvmfs
  script:
    - ls /cvmfs/cms.cern.ch/

The cmssw_setup line defines the name of the job, and all the job does is list /cvmfs/cms.cern.ch/, which would fail if CVMFS isn’t mounted. In the GitLab UI one can see the output, and also the cvmfs label:

A job with a GitLab CVMFS Runner showing the cvmfs label
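
A plain ls fails with a rather terse message if CVMFS is missing. As a minimal sketch, the check could be wrapped in a small function that prints a more helpful hint. The function name check_cvmfs is hypothetical, and testing for cmsset_default.sh is an assumption about what "usable" means here:

```shell
#!/usr/bin/env bash
# Sketch: fail early with a readable message if the CMS CVMFS repository
# is not visible. check_cvmfs is a hypothetical helper name.
check_cvmfs() {
  local repo="${1:-/cvmfs/cms.cern.ch}"
  if [ -r "${repo}/cmsset_default.sh" ]; then
    echo "CVMFS looks usable: ${repo}"
  else
    echo "ERROR: ${repo}/cmsset_default.sh not readable; is the cvmfs tag set?" >&2
    return 1
  fi
}

# In a CI job one would call it without arguments:
#   check_cvmfs || exit 1
```

The optional argument only exists so that the function can be tried out on a machine without CVMFS.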

Setting up CMSSW

CMS-specific setup

Since the default user in the runner is not your username and the container doesn’t know anything about you in the first place, it does not have the CMS-related environment that registered CMS members get on LXPLUS (via the zh group). This means that everything needs to be set up manually.

To set up a CMSSW release (here CMSSW_10_6_8_patch1), you would usually run the following commands:

source /cvmfs/cms.cern.ch/cmsset_default.sh
cmsrel CMSSW_10_6_8_patch1
cd CMSSW_10_6_8_patch1/src
cmsenv

The second command may print a warning such as

WARNING: Developer's area is created for non-production architecture slc7_amd64_gcc820. Production architecture for this release is slc7_amd64_gcc700.

which can be ignored in this case (or could be removed by first executing export SCRAM_ARCH=slc7_amd64_gcc700).

The command source /cvmfs/cms.cern.ch/cmsset_default.sh sets several environment variables, in particular adding /cvmfs/cms.cern.ch/common to the ${PATH}. You can check this by running echo ${PATH}. Another effect of this command is that several aliases are defined, which means that executing an alias effectively executes the underlying command it stands for.

Printing all set aliases

To print all aliases that are set, just run alias.

What are the actual commands behind cmsenv and cmsrel?

The most important aliases are in the table below:

Alias     Command
cmsenv    eval `scramv1 runtime -sh`
cmsrel    scramv1 project CMSSW

The meaning of eval: The args are read and concatenated together into a single command. This command is then read and executed by the shell, and its exit status is returned as the value of eval. If there are no args, or only null arguments, eval returns 0.
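
To illustrate what eval does in the cmsenv alias, here is a minimal sketch that runs without CMSSW. The function print_runtime is a hypothetical stand-in for scramv1 runtime -sh, and the DEMO_ variables are made up for illustration. The sketch uses $(...) instead of backticks, which is equivalent here:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `scramv1 runtime -sh`, which prints shell
# statements instead of changing the environment itself.
print_runtime() {
  echo 'export DEMO_RELEASE="CMSSW_X_Y_Z";'
  echo 'export DEMO_ARCH="demo_arch";'
}

# Running the function alone only prints text; eval executes that text
# in the current shell, so the variables become part of the environment.
eval "$(print_runtime)"

echo "release=${DEMO_RELEASE} arch=${DEMO_ARCH}"
```

Without eval, the export statements would only be printed and the variables would stay unset.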

Knowing that a command is an alias is important, since bash does not automatically expand aliases when running non-interactively, which is the case when running in GitLab.

In order to make aliases work in the GitLab runners, one needs to explicitly enable alias expansion:

shopt -s expand_aliases
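
The effect can be tried out in any bash script, independent of CMSSW. In this sketch, the alias demo_setup and the variable DEMO_SETUP_DONE are hypothetical stand-ins for cmsrel or cmsenv and their effects:

```shell
#!/usr/bin/env bash
# Without this line a non-interactive bash reports "command not found"
# for the alias used below.
shopt -s expand_aliases

# Hypothetical alias standing in for cmsrel or cmsenv
alias demo_setup='export DEMO_SETUP_DONE=yes'

# The alias must be defined on an earlier line than the one that uses it
demo_setup

echo "DEMO_SETUP_DONE=${DEMO_SETUP_DONE}"
```

Commenting out the shopt line and running the script again should reproduce the kind of failure one would otherwise see in the CI job.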

Another common pitfall when setting up CMSSW in GitLab is that the execution fails because the setup script doesn’t follow best practices for shell scripts: for example, it may return a non-zero value even if the setup is OK, or it may reference unset variables. Conversely, even if the script exits without a visible error message, something could still be wrong. It is therefore often a good idea to circumvent such issues by disabling strict checks before running the setup command and enabling them again afterwards.
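
The pattern of relaxing the checks only around the setup step can be sketched as follows. The temporary file is a hypothetical stand-in for a setup script that references an unset variable. It is not the real cmsset_default.sh:

```shell
#!/usr/bin/env bash
set -u   # strict mode: referencing an unset variable aborts the script

# Hypothetical stand-in for a setup script that touches an unset variable
setup_script="$(mktemp)"
printf 'SETUP_RESULT="configured${SOME_UNSET_VARIABLE}"\n' > "${setup_script}"

set +u                     # relax the check only for the setup script
source "${setup_script}"
set -u                     # restore strict mode for the rest of the job

rm -f "${setup_script}"
echo "SETUP_RESULT=${SETUP_RESULT}"
```

Keeping the relaxed region as small as possible means that mistakes in your own commands are still caught by the strict mode.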

Exercise: Set up CMSSW in GitLab

Knowing all this, can you write the yaml to set up CMSSW in GitLab starting from the fragment above and check if this is all working by executing cmsRun --help at the end?

Solution: Set up CMSSW in GitLab

Here is a possible solution:

cmssw_setup:
  tags:
    - cvmfs
  variables:
    # This is also set on LXPLUS
    CMS_PATH: /cvmfs/cms.cern.ch
  script:
    # IMPORTANT: Expand aliases in noninteractive bash mode
    # Otherwise cmsrel and cmsenv won't work
    - shopt -s expand_aliases
    # access CVMFS
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - cmsrel CMSSW_10_6_8_patch1
    - cd CMSSW_10_6_8_patch1/src
    - cmsenv
    - cmsRun --help

The set +u command turns off errors for referencing unset variables. It isn’t really needed here, since -u (i.e. treating references to unset variables as errors) isn’t set by default, but the script would fail if one used set -u somewhere else, so it’s safer to catch this here.

The reason why in the example above the variable ${CMS_PATH} is used and not simply /cvmfs/cms.cern.ch directly is just to mimic the default environment you would get on LXPLUS. You can check if this is the case for you as well by running env | grep CMS_PATH after logging on to LXPLUS.

You can see some examples in the payload GitLab repository for this lesson.

Key Points

  • GitLab CVMFS runners are required to use CMSSW.

  • The setup script sets aliases, which are not expanded by default.

  • If the setup script tries to access unset variables, then that can cause the CI to fail when using strict shell scripting checks.


Compiling a CMSSW package

Overview

Teaching: 10 min
Exercises: 5 min
Questions
  • How can I compile my CMSSW package using GitLab CI?

  • How do I add other CMSSW packages?

Objectives
  • Successfully compile CMSSW example analysis code in GitLab CI

Now that you know how to get a CMSSW environment, it is time to do something useful with it.

Compiling code within the repository

For your analysis to be compiled with CMSSW, it needs to reside in the work area’s src directory, and in there follow the directory structure of two subdirectories (e.g. AnalysisCode/MyAnalysis), within which there can be src, interface, plugins and further directories. Your analysis code (under version control in GitLab/GitHub) will usually not contain the CMSSW work area. The git repository either contains the analysis code at its top level, or the code is collected in a subdirectory to keep it separate from configuration files such as the .gitlab-ci.yml file.

We will use an example analysis, which selects pairs of electrons and muons. Download the zip file containing the analysis and extract it now. The analysis code is in a directory called ZPeakAnalysis within which plugins (the C++ code) and test (the python config) directories reside. Add this directory to your repository:

# unzip ZPeakAnalysis.zip
# mv ZPeakAnalysis ~/awesome-workshop/awesome-gitlab-cms/
git add ZPeakAnalysis
git commit -m "Add ZPeakAnalysis"

When trying to compile the code in GitLab, the ZPeakAnalysis needs to be copied into the CMSSW workarea, and it’s advisable to use environment variables for this purpose. This would be achieved like this:

mkdir -p "${CMSSW_BASE}/src/AnalysisCode"
cp -r "${CI_PROJECT_DIR}/ZPeakAnalysis" "${CMSSW_BASE}/src/AnalysisCode/"

With these two commands we will now be able to extend the .gitlab-ci.yml file such that we can compile our analysis code in GitLab. To improve the readability of the file, the CMSSW_RELEASE is defined as a variable:

cmssw_compile:
  tags:
    - cvmfs
  variables:
    CMS_PATH: /cvmfs/cms.cern.ch
    CMSSW_RELEASE: CMSSW_10_6_8_patch1
  script:
    - shopt -s expand_aliases
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - cmsrel ${CMSSW_RELEASE}
    - cd ${CMSSW_RELEASE}/src
    - cmsenv
    - mkdir -p AnalysisCode
    - cp -r "${CI_PROJECT_DIR}/ZPeakAnalysis" "${CMSSW_BASE}/src/AnalysisCode/"
    - scram b
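
The copy step of the job above can be tried out locally before pushing. The following is only a sketch: stage_package and its arguments are hypothetical names, and temporary directories take the place of ${CI_PROJECT_DIR} and ${CMSSW_BASE}:

```shell
#!/usr/bin/env bash
# Sketch: copy a package from the repository into a work area.
# stage_package and its arguments are hypothetical names.
stage_package() {
  local package_dir="$1"            # e.g. "${CI_PROJECT_DIR}/ZPeakAnalysis"
  local workarea="$2"               # e.g. "${CMSSW_BASE}"
  local subsystem="${3:-AnalysisCode}"

  if [ ! -d "${package_dir}" ]; then
    echo "ERROR: ${package_dir} not found" >&2
    return 1
  fi
  mkdir -p "${workarea}/src/${subsystem}"
  cp -r "${package_dir}" "${workarea}/src/${subsystem}/"
}

# Local dry run with temporary directories instead of a real CMSSW area
demo_repo="$(mktemp -d)"
demo_area="$(mktemp -d)"
mkdir -p "${demo_repo}/ZPeakAnalysis/plugins"
stage_package "${demo_repo}/ZPeakAnalysis" "${demo_area}"
ls "${demo_area}/src/AnalysisCode/ZPeakAnalysis"
```

Checking that the source directory exists first gives a clearer error than a failing cp, for instance when the package has not been committed yet.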

Exercise: Test that compilation works

Copy the files from https://gitlab.cern.ch/awesome-workshop/payload-gitlab-cms/tree/master/ZPeakAnalysis to your repository and confirm that the code compiles by checking that the GitLab Job succeeds.

Adding CMSSW packages

Always add CMSSW packages before compiling analysis code!

Adding CMSSW packages has to happen before compiling analysis code in the repository, since git cms-addpkg will call git cms-init for the $CMSSW_BASE/src directory, and git init doesn’t work if the directory already contains files.

Assuming that you would like to check out CMSSW packages using the commands described in the CMSSW FAQ, a couple of additional settings need to be applied. For instance, try running the following command in GitLab CI after having set up CMSSW:

git cms-addpkg PhysicsTools/PatExamples

This will fail:

Cannot find your details in the git configuration.
Please set up your full name via:
    git config --global user.name '<your name> <your last name>'
Please set up your email via:
    git config --global user.email '<your e-mail>'
Please set up your GitHub user name via:
    git config --global user.github <your github username>

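There are a couple of options to make things work:

  • Set the git configuration as requested in the error message above.

  • Run git cms-init --upstream-only before git cms-addpkg, so that the personal git configuration is not needed.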

For simplicity, and since we do not need to commit anything back to CMSSW from GitLab, we will use the latter approach. A complete yaml fragment that checks out a CMSSW package after having set up CMSSW and then compiles the code looks as follows:

cmssw_addpkg:
  # Assumes that a "compile" stage is declared in the top-level stages list
  stage: compile
  tags:
    - cvmfs
  variables:
    CMS_PATH: /cvmfs/cms.cern.ch
    CMSSW_RELEASE: CMSSW_10_6_8_patch1
  script:
    - shopt -s expand_aliases
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - cmsrel ${CMSSW_RELEASE}
    - cd ${CMSSW_RELEASE}/src
    - cmsenv
    # If within CERN, we can speed up interaction with CMSSW:
    - export CMSSW_MIRROR=https://:@git.cern.ch/kerberos/CMSSW.git
    # This is another trick to speed things up independent of your location:
    - export CMSSW_GIT_REFERENCE=/cvmfs/cms.cern.ch/cmssw.git.daily
    # Important: run git cms-init with --upstream-only flag to not run into
    # problems with git config
    - git cms-init --upstream-only
    - git cms-addpkg PhysicsTools/PatExamples
    - scram b

The two additional variables that are exported here, CMSSW_MIRROR and CMSSW_GIT_REFERENCE, can speed up interaction with git, in particular package checkouts. Mind that CMSSW_MIRROR only applies when working within the CERN network. Setting these variables is not mandatory.
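
Since the reference repository is optional, one could guard the export so that the job also works where the directory is not available. This is a sketch under that assumption, and use_git_reference is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: only export CMSSW_GIT_REFERENCE if the reference repository exists.
# use_git_reference is a hypothetical helper name.
use_git_reference() {
  local reference="${1:-/cvmfs/cms.cern.ch/cmssw.git.daily}"
  if [ -d "${reference}" ]; then
    export CMSSW_GIT_REFERENCE="${reference}"
    echo "Using git reference ${reference}"
  else
    echo "No git reference at ${reference}, continuing without it"
  fi
}

use_git_reference
```

The function never fails, in line with the variables being a speed-up rather than a requirement.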

Key Points

  • For code to be compiled in CMSSW, it needs to reside within the work area’s src directory.

  • The code needs to be copied manually using the CI script.

  • When using commands such as git cms-addpkg, the git configuration needs to be adjusted/set first.


Obtaining a grid proxy

Overview

Teaching: 10 min
Exercises: 15 min
Questions
  • How can I obtain a grid proxy in GitLab?

Objectives
  • Securely add grid proxy certificates and passwords to GitLab

  • Successfully obtain a grid proxy for the CMS VO

Securely adding passwords and files to GitLab

When trying to access CMS data, a grid proxy, often also referred to as a Virtual Organization Membership Service (VOMS) proxy, is needed in most cases. In order to be able to obtain this proxy, your userkey.pem and usercert.pem files, which by default reside in the ~/.globus directory, need to be stored in GitLab.

Keep your secrets secret!

Please be extra careful when it comes to your account and grid passwords as well as your certificates! They should never be put in any public place. Putting them under version control is risky, since even if you delete them from the HEAD of your master branch, they will still be in the commit history. Furthermore, putting them in a public, or even a private but shared repository, is a violation of grid policy, and could lead to access being revoked for the offending user. Should you accidentally have pushed sensitive data to a repository, please see the guide by GitHub to remove them (though the data should still be considered compromised).

Please make sure to revisit the section on private information/access control from the Continuous Integration / Continuous Development (CI/CD) lesson on how to add variables in GitLab CI/CD in general. From that lesson you will know how to add e.g. your grid proxy password. The grid certificate itself, however, consists of two files that look like this:

cat ~/.globus/usercert.pem
Bag Attributes
    localKeyID: 95 A0 95 B0 1e AB BD 13 59 D1 D2 BB 35 5A EA 2E CD 47 BA F7
subject=/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=username/CN=123456/CN=Anonymous Nonamious
issuer=/DC=ch/DC=cern/CN=CERN Grid Certification Authority
-----BEGIN CERTIFICATE-----
TH1s1SNT4R34lGr1DC3rt1f1C4t3But1Th4s4l3NgtH0F64CH4r4ct3rSP3rL1N3
1amT00La2YT0wR1T345m0r3l1N3S0fn0ns3NS3S01/lLSt0pH3r3AndADdsPAc3S
...45 more lines of l33t dialect...
+4nd+heL4S+38cH4r4c+ersBef0rE+HE1+enDs==
-----END CERTIFICATE-----

We need more base: base64

Simply pasting them into GitLab does not work since the line breaks will not be reflected correctly. There is a trick we can play though: we can encode the files including line breaks so that they are simply a string, which we can decode to yield the same result as the input. The tool of our choice is base64. Let’s give this a go.

Exercise: Encode using base64

Copy the output of the cat ~/.globus/usercert.pem command above into a text file called testcert.txt, and pipe the content of this file to the base64 command or use it as input file directly (hint: base64 --help).

Solution: Encode using base64

The command should be (when piping):

cat testcert.txt | base64

or (when using the input file directly - this is better):

base64 -i testcert.txt

and the output will then be the following:

QmFnIEF0dHJpYnV0ZXMKICAgIGxvY2FsS2V5SUQ6IDk1IEEwIDk1IEIwIDFlIEFCIEJEIDEzIDU5IEQxIEQyIEJCIDM1IDVBIEVBIDJFIENEIDQ3IEJBIEY3CnN1YmplY3Q9L0RDPWNoL0RDPWNlcm4vT1U9T3JnYW5pYyBVbml0cy9PVT1Vc2Vycy9DTj11c2VybmFtZS9DTj0xMjM0NTYvQ049QW5vbnltb3VzIE5vbmFtaW91cwppc3N1ZXI9L0RDPWNoL0RDPWNlcm4vQ049Q0VSTiBHcmlkIENlcnRpZmljYXRpb24gQXV0aG9yaXR5Ci0tLS0tQkVHSU4gQ0VSVElGSUNBVEUtLS0tLQpUSDFzMVNOVDRSMzRsR3IxREMzcnQxZjFDNHQzQnV0MVRoNHM0bDNOZ3RIMEY2NENINHI0Y3QzclNQM3JMMU4zCjFhbVQwMExhMllUMHdSMVQzNDVtMHIzbDFOM1MwZm4wbnMzTlMzUzAxL2xMU3QwcEgzcjNBbmRBRGRzUEFjM1MKLi4uNDUgbW9yZSBsaW5lcyBvZiBsMzN0IGRpYWxlY3QuLi4KKzRuZCtoZUw0UyszOGNINHI0YytlcnNCZWYwckUrSEUxK2VuRHM9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==

Decoding works by adding the -d (Linux) or -D (MacOS) flag to the base64 command. You can verify that this works by directly decoding again as follows:

base64 -i testcert.txt | base64 -d

which should give you the pseudo-certificate from above. Have a go at the exercise below and try to decode the secret phrase:

Exercise: Decode using base64

Decode the following string using the base64 command: SSB3aWxsIG5ldmVyIHB1dCBteSBzZWNyZXRzIHVuZGVyIHZlcnNpb24gY29udHJvbAo=

Solution: Decode using base64

The command should be (shown here with the macOS flag -D; use -d on Linux):

echo "SSB3aWxsIG5ldmVyIHB1dCBteSBzZWNyZXRzIHVuZGVyIHZlcnNpb24gY29udHJvbAo=" | base64 -D

and the output should be the following:

I will never put my secrets under version control
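
If the same script has to run on both Linux and macOS, the flag difference can be hidden behind a small wrapper. This is only a sketch, and b64_decode is a hypothetical helper name. It probes which decode flag the local base64 accepts and then decodes standard input:

```shell
#!/usr/bin/env bash
# Sketch: pick the decode flag that the local base64 understands.
# b64_decode is a hypothetical helper name; it reads from standard input.
b64_decode() {
  if printf 'QQ==' | base64 -d >/dev/null 2>&1; then
    base64 -d        # GNU coreutils (Linux)
  else
    base64 -D        # macOS variant mentioned above
  fi
}

printf 'SSB3aWxsIG5ldmVyIHB1dCBteSBzZWNyZXRzIHVuZGVyIHZlcnNpb24gY29udHJvbAo=' | b64_decode
```

In the GitLab runners, which are Linux-based in this lesson, calling base64 -d directly is sufficient.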

Adding grid certificate and password to GitLab

There are a couple of important things to keep in mind when adding passwords and certificates as variables to GitLab:

For more details, see the Variables: Advanced use section of the GitLab documentation. Setting variables to Protected means that they are only available in protected branches, e.g. your master branch. This is important when collaborating with others, since anyone with access could just echo the variables when making a merge request if you run automated tests on merge requests.

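We will add the following three variables:

  • GRID_PASSWORD: the password of your grid certificate, base64-encoded

  • GRID_USERCERT: the content of your usercert.pem file, base64-encoded

  • GRID_USERKEY: the content of your userkey.pem file, base64-encoded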

For safety and to avoid issues with special characters, you should not simply add your grid proxy password in GitLab, but always encode it using base64. For your password do the following (make sure nobody’s peeking at your screen):

printf '%s' 'mySecr3tP4$$w0rd' | base64

Mind the single quotes (') and not double quotes ("). If you are on Linux, you should add -w 0 to the base64 command so that the output is not wrapped over several lines. For the two certificates, use them as input to base64 directly:

base64 -i ~/.globus/usercert.pem
base64 -i ~/.globus/userkey.pem

and copy the output into GitLab.

Every equal sign counts!

Make sure to copy the full string including the trailing equal signs.
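
To avoid both the wrapping issue and the flag differences between platforms, the encoded string can be forced onto a single line. The following sketch uses a dummy file, and encode_one_line is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: encode a file as a single line, independent of the base64 flavour.
# encode_one_line is a hypothetical helper name.
encode_one_line() {
  # Reading from stdin and stripping newlines avoids the -i / -w 0 differences
  base64 < "$1" | tr -d '\n'
}

# Dry run on a dummy multi-line file (never test with real credentials
# on a shared machine)
dummy="$(mktemp)"
printf 'line one\nline two\n' > "${dummy}"
encoded="$(encode_one_line "${dummy}")"
echo "${encoded}"
```

Decoding the printed string again and comparing it with the input file is a quick way to confirm that nothing was lost while copying.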

The Settings -> CI / CD -> Variables section should look like this:

CI/CD Variables section with grid secrets added

Better safe than sorry

To reduce the risk of leaking your passwords and certificates to others, you should protect your master branch, effectively preventing you and others from pushing to it directly and e.g. printing your password to the job logs. To do so, go to Settings -> Repository -> Protected Branches. Mind that the option chosen below still puts a lot of trust in your collaborators. With the Protected option chosen above for the variables, the variables are only available in those branches (to which Maintainers are still allowed to push):

Protecting branches to prevent password leaks

Using the grid proxy

With the grid secrets stored, we can now make use of them. We first need to restore the grid certificate files in the ~/.globus directory, then run the voms-proxy-init command and pass the grid proxy password to it. This is done as follows:

mkdir -p ${HOME}/.globus
printf "${GRID_USERCERT}" | base64 -d > ${HOME}/.globus/usercert.pem
printf "${GRID_USERKEY}" | base64 -d > ${HOME}/.globus/userkey.pem
chmod 400 ${HOME}/.globus/userkey.pem
printf "${GRID_PASSWORD}" | base64 -d | voms-proxy-init --voms cms --pwstdin

Trying this with the standard GitLab CC7 runner will fail, since the CMS-specific certificates are not included in the image. An image that has these certificates installed already is gitlab-registry.cern.ch/clange/cmssw-docker/cc7-cms:latest. An example to obtain a grid proxy, check it, and then destroy it again would result in the following yaml:

voms_proxy:
  image:
    name: gitlab-registry.cern.ch/clange/cmssw-docker/cc7-cms:latest
    entrypoint: [""]
  script:
    - mkdir -p ${HOME}/.globus
    - printf "${GRID_USERCERT}" | base64 -d > ${HOME}/.globus/usercert.pem
    - printf "${GRID_USERKEY}" | base64 -d > ${HOME}/.globus/userkey.pem
    - chmod 400 ${HOME}/.globus/userkey.pem
    - printf "${GRID_PASSWORD}" | base64 -d | voms-proxy-init --voms cms --pwstdin
    - voms-proxy-info --all
    - voms-proxy-destroy

You could take this further by e.g. performing a DAS query to keep your input files up-to-date.

Confirm that this works for you before moving on to the next section! In case of problems, check that the secrets were encoded without line breaks (on Linux, add -w 0 to the base64 command when encoding, as described above).
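
Because Protected variables are not available on unprotected branches, a job there would silently write empty files. A guard against this could look like the sketch below. restore_secret is a hypothetical helper name, and the file modes are an assumption that follows the chmod 400 used above:

```shell
#!/usr/bin/env bash
# Sketch: decode one CI/CD variable into a file and fail loudly if it is empty.
# restore_secret is a hypothetical helper name.
restore_secret() {
  local encoded="$1"     # base64 string, e.g. "${GRID_USERKEY}"
  local target="$2"      # e.g. "${HOME}/.globus/userkey.pem"
  local mode="${3:-400}"

  if [ -z "${encoded}" ]; then
    echo "ERROR: empty secret for ${target}; is the branch protected?" >&2
    return 1
  fi
  mkdir -p "$(dirname "${target}")"
  # '%s' keeps printf from interpreting anything inside the secret
  printf '%s' "${encoded}" | base64 -d > "${target}" || return 1
  chmod "${mode}" "${target}"
}

# In a job, assuming the variable names used above:
#   restore_secret "${GRID_USERCERT:-}" "${HOME}/.globus/usercert.pem" 644
#   restore_secret "${GRID_USERKEY:-}"  "${HOME}/.globus/userkey.pem"  400
```

The error message deliberately names only the target file, so that no part of the secret ends up in the job log.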

Key Points

  • Special care is needed when adding secrets in GitLab

  • Passwords and certificates should always be set to Protected state

  • Certificates need to be base64-encoded for use as secrets


Running a CMSSW job

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • How can I run CMSSW in GitLab CI?

  • How can I avoid compiling my code for each job?

Objectives
  • Successfully run a test job of a simplified Z to leptons analysis

  • Use GitLab artifacts to pass compiled analysis code

Now that you can set up CMSSW and compile code in GitLab, and know how to access CMS data, the next step is to run test jobs to confirm that the code yields the expected results.

Fair use

Please remember that the provided runners are shared among all users, so please avoid massive pipelines and CI stages with more than 5 jobs in parallel or that run with a parallel configuration higher than 5.

If you need to run such pipelines, please deploy your own private runners to avoid affecting the rest of the users.

Requirements for running CMSSW

In most cases, you will run your tests on centrally produced files. In order to be able to access those, you will require a grid proxy valid for the CMS virtual organisation (VO) as described in the previous section. For files located on EOS, please check the section on private information/access control from the Continuous Integration / Continuous Development (CI/CD) lesson on how to get a Kerberos token via kinit (we won’t be using this here).

For the analysis example provided in this lesson, we’ll use a single file from the /DYJetsToLL_M-50_HT-100to200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/MINIAODSIM data set: /store/mc/RunIIFall17MiniAODv2/DYJetsToLL_M-50_HT-100to200_TuneCP5_13TeV-madgraphMLM-pythia8/MINIAODSIM/PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/50000/E43E4210-7742-E811-9430-AC1F6B23C96A.root. This file is set in ZPeakAnalysis/test/MyZPeak_cfg.py.

Executing cmsRun

In principle, all we need to do is compile the code as demonstrated in episode 2, add the grid proxy as just done in episode 3, and then execute the cmsRun command. Mind that you do not need the git cms-addpkg PhysicsTools/PatExamples command here anymore, i.e. remove it in the following! Putting this together, the additional commands to run would be:

cd ${CMSSW_BASE}/src/AnalysisCode/ZPeakAnalysis/
cmsRun test/MyZPeak_cfg.py
ls -l myZPeak.root

where the last command just checks that an output file has been created. However, imagine that you would like to run test jobs on more than one file and to speed things up do this in parallel. This would mean that you would have to compile the code N times, which is a waste of resources and time. Instead, we can pass the compiled code from the compile step to the run step as described below.
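
Note that ls -l also succeeds for an empty file. A slightly stricter check could look like the following sketch, where require_output is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: fail the job if an expected output file is missing or empty.
# require_output is a hypothetical helper name.
require_output() {
  local file="$1"
  if [ -s "${file}" ]; then
    ls -l "${file}"
  else
    echo "ERROR: ${file} is missing or empty" >&2
    return 1
  fi
}

# In the job one would call, after cmsRun:
#   require_output myZPeak.root
```

This only tests that the file has a non-zero size. It does not verify that the ROOT file is readable or contains the expected histograms.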

Using artifacts to compile code only once

Artifacts have been introduced to you as part of the Continuous Integration / Continuous Development (CI/CD) lesson. You can find more detailed information in the GitLab documentation for using artifacts.

Artifacts are write-protected

One important thing to note is that artifacts are write-protected. You cannot write into the artifact directory in any of the following steps.

For the compiled code to be available in the subsequent steps, the directories that should be provided need to be listed explicitly. The yaml code from the compilation step in episode 2 needs to be extended as follows:

artifacts:
  # artifacts:untracked ignores configuration in the repository’s .gitignore file.
  untracked: true
  expire_in: 20 minutes
  paths:
    - ${CMSSW_RELEASE}

As path we use ${CMSSW_RELEASE}, i.e. the full CMSSW area. Since this area is write-protected, we need to copy the whole area to a new directory and recursively add write permissions again. In the following, this new work area will have to be used:

script:
  # ...
  - mkdir run
  - cp -r ${CMSSW_RELEASE} run/
  - chmod -R +w run/${CMSSW_RELEASE}/
  - cd run/${CMSSW_RELEASE}/src
  - cmsenv
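
The copy-and-unlock step can be tried out locally with a read-only dummy directory in place of the real artifact. In this sketch, make_writable_copy and the CMSSW_DEMO directory are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: copy a read-only directory and make the copy writable.
# make_writable_copy is a hypothetical helper name.
make_writable_copy() {
  local source_dir="$1" target_parent="$2"
  mkdir -p "${target_parent}"
  cp -r "${source_dir}" "${target_parent}/"
  chmod -R u+w "${target_parent}/$(basename "${source_dir}")"
}

# Dummy "artifact" that is read-only, as a stand-in for ${CMSSW_RELEASE}
artifact="$(mktemp -d)/CMSSW_DEMO"
mkdir -p "${artifact}/src"
touch "${artifact}/src/file.txt"
chmod -R a-w "${artifact}"

run_dir="$(mktemp -d)"
make_writable_copy "${artifact}" "${run_dir}"
# Writing into the copy now works, while the original stays untouched
touch "${run_dir}/CMSSW_DEMO/src/new_file.txt"
```

The sketch uses u+w instead of +w, which is an assumption that only the job user needs write access.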

Exercise: Run CMSSW using the artifact from the compile step

You should now have all required ingredients to be able to extend the .gitlab-ci.yml file such that you can reuse the compiled code in the cmsRun step.

Solution: Run CMSSW using the artifact from the compile step

A possible implementation could look like this:

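cmssw_run:
  # Assumes the compile job ran in an earlier stage, so that its artifact
  # (the ${CMSSW_RELEASE} directory) has been downloaded into this job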
  image:
    name: gitlab-registry.cern.ch/clange/cmssw-docker/cc7-cms:latest
    entrypoint: [""]
  tags:
    - cvmfs
  variables:
    CMS_PATH: /cvmfs/cms.cern.ch
    EOS_MGM_URL: "root://eoscms.cern.ch"
    CMSSW_RELEASE: CMSSW_10_6_8_patch1
  script:
    - shopt -s expand_aliases
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - mkdir run
    - cp -r ${CMSSW_RELEASE} run/
    - chmod -R +w run/${CMSSW_RELEASE}/
    - cd run/${CMSSW_RELEASE}/src
    - cmsenv
    - mkdir -p ${HOME}/.globus
    - printf "${GRID_USERCERT}" | base64 -d > ${HOME}/.globus/usercert.pem
    - printf "${GRID_USERKEY}" | base64 -d > ${HOME}/.globus/userkey.pem
    - chmod 400 ${HOME}/.globus/userkey.pem
    - printf "${GRID_PASSWORD}" | base64 -d | voms-proxy-init --voms cms --pwstdin
    - cd AnalysisCode/ZPeakAnalysis/
    - cmsRun test/MyZPeak_cfg.py
    - ls -l myZPeak.root

Bonus: Store the output ROOT file as artifact

It could be useful to store the output ROOT file as an artifact so that you can simply download it after job completion. Do you know how to do it? Hint: you need to provide the full path to it.

Key Points

  • A special CMSSW image is required to successfully run CMSSW jobs

  • Running on CMS data requires a grid proxy

  • The use of artifacts allows passing results of one step to the other

  • Since artifacts are write-protected, the directory needs to be copied before running CMSSW