Setting up a CMSSW environment
Overview
Teaching: 10 min
Exercises: 10 min
Questions
Which GitLab runners are needed?
What’s different w.r.t. LXPLUS?
Objectives
Know how to source the CMSSW environment
Understand the different commands that need to be used
Before getting into details, a few links to useful documentation on GitLab CI/CD and also CERN-specific information:
These pages serve as a good entrypoint in case of problems and questions.
Create a new GitLab project to follow along
Please create a new GitLab project now to follow along. You can for instance call it
awesome-gitlab-cms
. In the following, we will assume that all your work is in a directory called awesome-workshop
in your home directory and the repository resides therein: ~/awesome-workshop/awesome-gitlab-cms
The commands would look like this (replace ${USER}
by your CERN
username in case it isn’t the same as on your laptop):
mkdir -p ~/awesome-workshop
cd ~/awesome-workshop
git clone ssh://git@gitlab.cern.ch:7999/${USER}/awesome-gitlab-cms.git
Choosing the correct GitLab runner
Standard GitLab runners at CERN do not mount CVMFS, which is required for
setting up CMSSW. In order to get a runner that mounts CVMFS, all you need
to do is add a tag
to your .gitlab-ci.yml
file:
tags:
  - cvmfs
A minimal .gitlab-ci.yml
file to get a runner with CVMFS looks like the following:
cmssw_setup:
  tags:
    - cvmfs
  script:
    - ls /cvmfs/cms.cern.ch/
The cmssw_setup
line defines the name of the job, and all the job does is
list /cvmfs/cms.cern.ch/
, which would fail if CVMFS isn’t mounted. In the
GitLab UI one can see the job output, and also the cvmfs
label.
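A job that depends on CVMFS can also fail early with a clear message instead of failing later during the CMSSW setup. The following is a minimal sketch of such a check. The function name require_mount is made up for this illustration, and the demonstration calls use /usr and a deliberately non-existent path so that the snippet runs on any machine.

```shell
#!/usr/bin/env bash
# require_mount is a hypothetical helper, not part of GitLab or CMSSW.
# It succeeds if the given directory exists and contains at least one entry.
require_mount() {
  local dir="$1"
  if [ -d "${dir}" ] && [ -n "$(ls -A "${dir}" 2>/dev/null)" ]; then
    echo "OK: ${dir} is available"
  else
    echo "ERROR: ${dir} is not available (is the cvmfs tag set?)" >&2
    return 1
  fi
}

# In a CI job one would call: require_mount /cvmfs/cms.cern.ch
# Here the check is exercised on paths that behave the same on any machine.
require_mount /usr && echo "first check passed"
require_mount /no/such/mount/point || echo "second check failed as expected"
```

In a CI script, a failing call stops the job at this line, which makes the cause easier to spot in the job log.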
Setting up CMSSW
CMS-specific setup
Since the default user in the runner is not your own account and the container knows nothing about you, the job does not get the CMS-related environment that registered CMS members have on LXPLUS (via the zh group). This means that everything needs to be set up manually.
To set up a CMSSW release (here CMSSW_10_6_8_patch1
), you would usually
run the following commands:
source /cvmfs/cms.cern.ch/cmsset_default.sh
cmsrel CMSSW_10_6_8_patch1
cd CMSSW_10_6_8_patch1/src
cmsenv
The second command may print a warning such as
WARNING: Developer's area is created for non-production architecture slc7_amd64_gcc820. Production architecture for this release is slc7_amd64_gcc700.
which can be ignored in this case (or could be removed by first executing
export SCRAM_ARCH=slc7_amd64_gcc700
).
The command source /cvmfs/cms.cern.ch/cmsset_default.sh
sets several
environment variables, in particular adding /cvmfs/cms.cern.ch/common
to
the ${PATH}
. You can check this by running echo ${PATH}
. Another effect
of this command is that several aliases are defined, which means that
executing the alias command effectively executes the original command.
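The change to ${PATH} mentioned above can also be checked for a single entry rather than by reading the whole variable. This is a small sketch. The helper name path_contains is hypothetical, and the example value only imitates what the variable might look like after sourcing the setup script.

```shell
#!/usr/bin/env bash
# path_contains is a hypothetical helper: it reports whether a directory
# is one of the colon-separated entries of a PATH-like string.
path_contains() {
  local entry="$1" path_value="${2:-${PATH}}"
  case ":${path_value}:" in
    *":${entry}:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with an explicit value (an assumption, not real output).
example_path="/cvmfs/cms.cern.ch/common:/usr/bin:/bin"
if path_contains /cvmfs/cms.cern.ch/common "${example_path}"; then
  echo "CMS common tools are on the PATH"
fi
path_contains /cvmfs/cms.cern.ch "${example_path}" || echo "partial matches do not count"
```

Called with one argument only, the helper inspects the current ${PATH}.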
Printing all set aliases
To print all aliases that are set, just run
alias
.
Exercise: Determining CMSSW-related aliases
What are the actual commands behind
cmsenv
and cmsrel
?
Solution: Determining CMSSW-related aliases
The most important aliases are in the table below:
Alias     Command
cmsenv    eval `scramv1 runtime -sh`
cmsrel    scramv1 project CMSSW
The meaning of
eval
: The args are read and concatenated together into a single command. This command is then read and executed by the shell, and its exit status is returned as the value of eval
. If there are no args, or only null arguments, eval
returns 0.
Knowing that a command is an alias is important, since bash
does not
automatically expand aliases when running non-interactively, which is the
case when running in GitLab.
In order to make aliases work in the GitLab runners, one needs to explicitly enable alias expansion:
shopt -s expand_aliases
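The effect of this option can be reproduced without CMSSW. The sketch below writes a throwaway script that defines an alias and then uses it, and runs it in a non-interactive bash once without and once with alias expansion. The alias name say_hello_demo is made up for this illustration.

```shell
#!/usr/bin/env bash
# Write a tiny script that defines an alias and then tries to use it.
# say_hello_demo is a made-up alias name for this illustration.
demo=$(mktemp)
cat > "${demo}" <<'EOF'
# Enable alias expansion only if the first argument asks for it.
if [ "$1" = "expand" ]; then shopt -s expand_aliases; fi
alias say_hello_demo='echo hello-from-alias'
say_hello_demo
EOF

# Non-interactive bash without the option: the alias is stored but not
# expanded, so the shell looks for a command called say_hello_demo and fails.
without=$(bash "${demo}" plain 2>/dev/null || true)

# Same script with expand_aliases enabled before the alias is used.
with=$(bash "${demo}" expand)

echo "without expand_aliases: '${without}'"
echo "with expand_aliases:    '${with}'"
rm -f "${demo}"
```

Note that the option has to be set before the line that uses the alias is read, which is why it comes first in the CI script.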
Another common pitfall when setting up CMSSW in GitLab is that the execution fails because the setup script doesn’t follow best practices for shell scripts: it may return non-zero values even if the setup is OK, or use unset variables. Even if the script exits without a visible error message, something could still have gone wrong. It is therefore often a good idea to disable strict checks before running the setup command and to enable them again afterwards.
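The pattern of relaxing the checks only around the setup call can be tried out with a stand-in script. In this sketch the temporary file plays the role of a setup script that reads an unset variable. It is not cmsset_default.sh, and the names HYPOTHETICAL_SITE_CONF and DEMO_SETUP_DONE are invented for the example.

```shell
#!/usr/bin/env bash
# A stand-in for a setup script that is not "strict-mode clean":
# it reads a variable that was never set. (Hypothetical file and names.)
setup=$(mktemp)
cat > "${setup}" <<'EOF'
echo "site config: ${HYPOTHETICAL_SITE_CONF}"
export DEMO_SETUP_DONE=1
EOF

# Strict: the unset variable aborts the subshell before the export is reached.
strict_status=0
( set -u; source "${setup}" ) >/dev/null 2>&1 || strict_status=$?

# Relaxed around the source call only, strict again afterwards.
relaxed=$( set -u
           set +u && source "${setup}" >/dev/null; set -u
           echo "${DEMO_SETUP_DONE}" )

echo "strict exit status: ${strict_status}"
echo "relaxed result:     ${relaxed}"
rm -f "${setup}"
```

The same idea applies to set -e, which can be switched off with set +e before the setup call and back on afterwards.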
Exercise: Set up CMSSW in GitLab
Knowing all this, can you write the yaml to set up CMSSW in GitLab starting from the fragment above and check if this is all working by executing
cmsRun --help
at the end?
Solution: Set up CMSSW in GitLab
Here is a possible solution:
cmssw_setup:
  tags:
    - cvmfs
  variables:
    # This is also set on LXPLUS
    CMS_PATH: /cvmfs/cms.cern.ch
  script:
    # IMPORTANT: Expand aliases in noninteractive bash mode
    # Otherwise cmsrel and cmsenv won't work
    - shopt -s expand_aliases
    # access CVMFS
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - cmsrel CMSSW_10_6_8_patch1
    - cd CMSSW_10_6_8_patch1/src
    - cmsenv
    - cmsRun --help
The
set +u
command turns off errors for referencing unset variables. It isn’t really needed here, since -u
(i.e. not allowing to use unset variables) isn’t set by default, but the script would fail if one used set -u
somewhere else, so it’s safer to catch this here.
The reason why in the example above the variable ${CMS_PATH}
is used and not simply
/cvmfs/cms.cern.ch
directly is just to mimic the default environment you would get on
LXPLUS. You can check if this is the case for you as well by running env | grep CMS_PATH
after logging on to LXPLUS.
You can see some examples in the payload GitLab repository for this lesson.
Key Points
GitLab CVMFS runners are required to use CMSSW.
The setup script sets aliases, which are not expanded by default.
If the setup script tries to access unset variables, then that can cause the CI to fail when using strict shell scripting checks.
Compiling a CMSSW package
Overview
Teaching: 10 min
Exercises: 5 min
Questions
How can I compile my CMSSW package using GitLab CI?
How do I add other CMSSW packages?
Objectives
Successfully compile CMSSW example analysis code in GitLab CI
Now that you know how to get a CMSSW environment, it is time to do something useful with it.
Compiling code within the repository
For your analysis to be compiled with CMSSW, it needs to reside in the
workarea’s src
directory, and in there follow the directory structure of
two subdirectories (e.g. AnalysisCode/MyAnalysis
) within which there can be
src
, interface
, plugin
and further directories. Your analysis code
(under version control in GitLab/GitHub) will usually not contain the
CMSSW workarea. The git repository will either
contain the analysis code at its top level, or the code could be collected in a
subdirectory to disentangle it from your configuration files such as the
.gitlab-ci.yml
file.
We will use an example analysis, which selects pairs of electrons and muons.
Download the zip file containing the analysis
and extract it now. The analysis code is
in a directory called ZPeakAnalysis
within which plugins
(the C++ code)
and test
(the python config) directories reside.
Add this directory to your repository:
# unzip ZPeakAnalysis.zip
# mv ZPeakAnalysis ~/awesome-workshop/awesome-gitlab-cms/
git add ZPeakAnalysis
git commit -m "Add ZPeakAnalysis"
When trying to compile the code in GitLab, the ZPeakAnalysis
needs
to be copied into the CMSSW workarea, and it’s advisable to use environment
variables for this purpose. This would be achieved like this:
mkdir ${CMSSW_BASE}/src/AnalysisCode
cp -r "${CI_PROJECT_DIR}/ZPeakAnalysis" "${CMSSW_BASE}/src/AnalysisCode/"
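The two commands can be wrapped in a small function that also checks that the package directory exists before copying. This is only a sketch: the name stage_package is hypothetical, and the demonstration uses a scratch directory in place of the real ${CI_PROJECT_DIR} and ${CMSSW_BASE}.

```shell
#!/usr/bin/env bash
# stage_package is a hypothetical helper that copies a package directory
# into <workarea>/src/<subsystem>/, creating the subsystem directory if needed.
stage_package() {
  local package_dir="$1" workarea="$2" subsystem="${3:-AnalysisCode}"
  if [ ! -d "${package_dir}" ]; then
    echo "ERROR: ${package_dir} not found" >&2
    return 1
  fi
  mkdir -p "${workarea}/src/${subsystem}"
  cp -r "${package_dir}" "${workarea}/src/${subsystem}/"
}

# Dry run in a scratch directory standing in for the CI checkout and the workarea.
scratch=$(mktemp -d)
mkdir -p "${scratch}/project/ZPeakAnalysis/plugins" "${scratch}/workarea/src"
touch "${scratch}/project/ZPeakAnalysis/plugins/BuildFile.xml"

stage_package "${scratch}/project/ZPeakAnalysis" "${scratch}/workarea"
find "${scratch}/workarea/src" -type f
rm -rf "${scratch}"
```

In a CI job the call would be stage_package "${CI_PROJECT_DIR}/ZPeakAnalysis" "${CMSSW_BASE}" after cmsenv has been run.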
With these two commands we will now be able to extend the .gitlab-ci.yml
file such that we can compile our analysis code in GitLab. To improve the
readability of the file, the CMSSW_RELEASE
is defined as a variable:
cmssw_compile:
  tags:
    - cvmfs
  variables:
    CMS_PATH: /cvmfs/cms.cern.ch
    CMSSW_RELEASE: CMSSW_10_6_8_patch1
  script:
    - shopt -s expand_aliases
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - cmsrel ${CMSSW_RELEASE}
    - cd ${CMSSW_RELEASE}/src
    - cmsenv
    - mkdir -p AnalysisCode
    - cp -r "${CI_PROJECT_DIR}/ZPeakAnalysis" "${CMSSW_BASE}/src/AnalysisCode/"
    - scram b
Exercise: Test that compilation works
Copy the files from https://gitlab.cern.ch/awesome-workshop/payload-gitlab-cms/tree/master/ZPeakAnalysis to your repository and confirm that the code compiles by checking that the GitLab Job succeeds.
Adding CMSSW packages
Always add CMSSW packages before compiling analysis code!
Adding CMSSW packages has to happen before compiling analysis code in the repository, since
git cms-addpkg
will call git cms-init
for the $CMSSW_BASE/src
directory, and git init
doesn’t work if the directory already contains files.
Assuming that you would like to check out CMSSW packages using the commands described in the CMSSW FAQ, a couple of additional settings need to be applied. For instance, try running the following command in GitLab CI after having set up CMSSW:
git cms-addpkg PhysicsTools/PatExamples
This will fail:
Cannot find your details in the git configuration.
Please set up your full name via:
git config --global user.name '<your name> <your last name>'
Please set up your email via:
git config --global user.email '<your e-mail>'
Please set up your GitHub user name via:
git config --global user.github <your github username>
There are a couple of options to make things work:
- set the config as described above,
- alternatively, create a .gitconfig in your repository and use it as described here,
- run git cms-init --upstream-only before git cms-addpkg to disable setting up a user remote.
For simplicity, and since we do not need to commit anything back to CMSSW from
GitLab, we will use the last of these approaches.
A complete yaml
fragment that checks out a CMSSW package after having set up
CMSSW and then compiles the code looks as follows:
cmssw_addpkg:
  stage: compile
  tags:
    - cvmfs
  variables:
    CMS_PATH: /cvmfs/cms.cern.ch
    CMSSW_RELEASE: CMSSW_10_6_8_patch1
  script:
    - shopt -s expand_aliases
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - cmsrel ${CMSSW_RELEASE}
    - cd ${CMSSW_RELEASE}/src
    - cmsenv
    # If within CERN, we can speed up interaction with CMSSW:
    - export CMSSW_MIRROR=https://:@git.cern.ch/kerberos/CMSSW.git
    # This is another trick to speed things up independent of your location:
    - export CMSSW_GIT_REFERENCE=/cvmfs/cms.cern.ch/cmssw.git.daily
    # Important: run git cms-init with --upstream-only flag to not run into
    # problems with git config
    - git cms-init --upstream-only
    - git cms-addpkg PhysicsTools/PatExamples
    - scram b
The additional two variables that are exported here, CMSSW_MIRROR
and
CMSSW_GIT_REFERENCE
, can speed up interaction with git, in particular
package checkouts. Note that CMSSW_MIRROR
only applies when
working within the CERN network. Setting these variables is not
mandatory.
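Since the variables are optional, one possible approach is to export the git reference only if the directory is actually present in the job. The following sketch assumes nothing beyond the path used above, and the helper name use_git_reference_if_present is made up for this example.

```shell
#!/usr/bin/env bash
# use_git_reference_if_present is a hypothetical helper: it exports
# CMSSW_GIT_REFERENCE only when the reference repository is actually there.
use_git_reference_if_present() {
  local reference="$1"
  if [ -d "${reference}" ]; then
    export CMSSW_GIT_REFERENCE="${reference}"
    echo "using git reference ${reference}"
  else
    echo "no git reference at ${reference}, continuing without it"
  fi
}

use_git_reference_if_present /cvmfs/cms.cern.ch/cmssw.git.daily
```

The function returns successfully in both cases, so a missing reference only costs speed and does not fail the job.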
Key Points
For code to be compiled in CMSSW, it needs to reside within the work area’s
src
directory.
The code needs to be copied manually using the CI script.
When using commands such as
git cms-addpkg
, the git configuration needs to be adjusted/set first.
Obtaining a grid proxy
Overview
Teaching: 10 min
Exercises: 15 min
Questions
How can I obtain a grid proxy in GitLab?
Objectives
Securely add grid proxy certificates and passwords to GitLab
Successfully obtain a grid proxy for the CMS VO
Securely adding passwords and files to GitLab
When trying to access CMS data, a grid proxy, often also referred to as a Virtual
Organization Membership Service (VOMS) proxy, is needed in most cases. In
order to be able to obtain this proxy, your userkey.pem
and usercert.pem
files, which by default will reside in the ~/.globus
directory, will need
to be stored in GitLab.
Keep your secrets secret!
Please be extra careful when it comes to your account and grid passwords as well as your certificates! They should never be put in any public place. Putting them under version control is risky, since even if you delete them from the
HEAD
of your master
branch, they will still be in the commit history. Furthermore, putting them in a public, or even a private but shared repository, is a violation of grid policy, and could lead to access being revoked for the offending user. Should you accidentally have put sensitive data into a repository, please see the guide by GitHub to remove them (though the data should still be considered compromised).
Please make sure to revisit the section on private information/access control from the Continuous Integration / Continuous Development (CI/CD) lesson on how to add variables in GitLab CI/CD in general. From that lesson you will know how to add e.g. your grid proxy password. The grid certificate itself, however, consists of two files that look like this:
cat ~/.globus/usercert.pem
Bag Attributes
localKeyID: 95 A0 95 B0 1e AB BD 13 59 D1 D2 BB 35 5A EA 2E CD 47 BA F7
subject=/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=username/CN=123456/CN=Anonymous Nonamious
issuer=/DC=ch/DC=cern/CN=CERN Grid Certification Authority
-----BEGIN CERTIFICATE-----
TH1s1SNT4R34lGr1DC3rt1f1C4t3But1Th4s4l3NgtH0F64CH4r4ct3rSP3rL1N3
1amT00La2YT0wR1T345m0r3l1N3S0fn0ns3NS3S01/lLSt0pH3r3AndADdsPAc3S
...45 more lines of l33t dialect...
+4nd+heL4S+38cH4r4c+ersBef0rE+HE1+enDs==
-----END CERTIFICATE-----
We need more base: base64
Simply pasting them into GitLab does not work since the line breaks will not
be reflected correctly. There is a trick we can play though: we can encode the
files including line breaks so that they are simply a string, which we can
decode to yield the same result as the input. The tool of our choice is
base64
. Let’s give this a go.
Exercise: Encode using
base64
Copy the output of the
cat ~/.globus/usercert.pem
command above into a text file called testcert.txt
, and pipe the content of this file to the base64
command or use it as input file directly (hint: base64 --help
).
Solution: Encode using
base64
The command should be (when piping):
cat testcert.txt | base64
or (when using the input file directly - this is better):
base64 -i testcert.txt
and the output will then be the following:
QmFnIEF0dHJpYnV0ZXMKICAgIGxvY2FsS2V5SUQ6IDk1IEEwIDk1IEIwIDFlIEFCIEJEIDEzIDU5IEQxIEQyIEJCIDM1IDVBIEVBIDJFIENEIDQ3IEJBIEY3CnN1YmplY3Q9L0RDPWNoL0RDPWNlcm4vT1U9T3JnYW5pYyBVbml0cy9PVT1Vc2Vycy9DTj11c2VybmFtZS9DTj0xMjM0NTYvQ049QW5vbnltb3VzIE5vbmFtaW91cwppc3N1ZXI9L0RDPWNoL0RDPWNlcm4vQ049Q0VSTiBHcmlkIENlcnRpZmljYXRpb24gQXV0aG9yaXR5Ci0tLS0tQkVHSU4gQ0VSVElGSUNBVEUtLS0tLQpUSDFzMVNOVDRSMzRsR3IxREMzcnQxZjFDNHQzQnV0MVRoNHM0bDNOZ3RIMEY2NENINHI0Y3QzclNQM3JMMU4zCjFhbVQwMExhMllUMHdSMVQzNDVtMHIzbDFOM1MwZm4wbnMzTlMzUzAxL2xMU3QwcEgzcjNBbmRBRGRzUEFjM1MKLi4uNDUgbW9yZSBsaW5lcyBvZiBsMzN0IGRpYWxlY3QuLi4KKzRuZCtoZUw0UyszOGNINHI0YytlcnNCZWYwckUrSEUxK2VuRHM9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
Decoding works by adding the -d
(Linux) or -D
(MacOS) flag to the
base64
command. You can verify that this works by directly decoding again
as follows:
base64 -i testcert.txt | base64 -d
which should give you the pseudo-certificate from above. Have a go at the exercise below and try to decode the secret phrase:
Exercise: Decode using
base64
Decode the following string using the
base64
command:
SSB3aWxsIG5ldmVyIHB1dCBteSBzZWNyZXRzIHVuZGVyIHZlcnNpb24gY29udHJvbAo=
Solution: Decode using
base64
The command should be (mind the capitalisation of the
-D
/-d
flag):
echo "SSB3aWxsIG5ldmVyIHB1dCBteSBzZWNyZXRzIHVuZGVyIHZlcnNpb24gY29udHJvbAo=" | base64 -D
and the output should be the following:
I will never put my secrets under version control
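Encoding and decoding can be combined into a round-trip check, which is a useful habit before trusting an encoded value. This is a minimal sketch with dummy content. It assumes GNU base64 as found on Linux, where -d decodes, and the file contents are placeholders.

```shell
#!/usr/bin/env bash
# Round-trip check: encode a file, decode it again, and compare with the original.
# Assumes GNU base64 (Linux), where -d decodes; older macOS versions need -D instead.
original=$(mktemp)
restored=$(mktemp)
printf 'first line\nsecond line\n' > "${original}"

encoded=$(base64 < "${original}" | tr -d '\n')   # strip any line wrapping
printf '%s' "${encoded}" | base64 -d > "${restored}"

if cmp -s "${original}" "${restored}"; then
  roundtrip="identical"
else
  roundtrip="different"
fi
echo "encoded:   ${encoded}"
echo "roundtrip: ${roundtrip}"
rm -f "${original}" "${restored}"
```

The comparison is done byte by byte with cmp, so lost line breaks or a truncated string would show up as a difference.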
Adding grid certificate and password to GitLab
There are a couple of important things to keep in mind when adding passwords and certificates as variables to GitLab:
- Variables should always be set to Protected state.
- As an additional safety measure, set them as Masked as well if possible (this will not work for the certificates but should for your grid password).
For more details, see the
Variables: Advanced use section
of the GitLab documentation. Setting variables to Protected
means that
they are only available in protected branches, e.g. your master
branch.
This is important when collaborating with others, since anyone with access
could just echo
the variables when making a merge request if you run
automated tests on merge requests.
We will add the following three variables:
- GRID_PASSWORD: password for the grid certificate
- GRID_USERCERT: grid user certificate (usercert.pem)
- GRID_USERKEY: grid user key (userkey.pem)
For safety and to avoid issues with special characters, you should not
simply add your grid proxy password in GitLab,
but always encode it using base64
.
For your password do the following
(make sure nobody’s peeking at your screen):
printf 'mySecr3tP4$$w0rd' | base64
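Mind the single quotes (') and not double quotes ("): within double quotes
the shell would expand $$ and similar sequences. If you are on Linux,
you should add -w 0
to the base64
command to disable line wrapping of the output.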
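The reason for the line-wrapping hint is that GNU base64 breaks its output into lines of 76 characters by default, which matters for anything longer than a short password. The sketch below uses 100 bytes of dummy input to show the difference, and removes the line breaks with tr as a portable alternative to -w 0.

```shell
#!/usr/bin/env bash
# 100 input bytes encode to 136 base64 characters, which is longer than
# the 76-column default line width of GNU base64.
long_input=$(printf '%100s' '' | tr ' ' 'x')

# Number of output lines with the tool's default settings (platform dependent).
default_lines=$(printf '%s' "${long_input}" | base64 | wc -l | tr -d ' ')

# Portable way to get a single line: delete the newlines after encoding.
single_line=$(printf '%s' "${long_input}" | base64 | tr -d '\n')

echo "lines with default settings: ${default_lines}"
echo "characters on a single line: ${#single_line}"
```

A single-line value is easier to copy into the GitLab variable field without losing characters.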
For the two certificates, use them as input to base64
directly:
base64 -i ~/.globus/usercert.pem
base64 -i ~/.globus/userkey.pem
and copy the output into GitLab.
Every equal sign counts!
Make sure to copy the full string including the trailing equal signs.
The Settings
–> CI / CD
–> Variables
section should look like this:
Better safe than sorry
To reduce the risk of leaking your passwords and certificates to others, you should protect your master branch, effectively preventing you and others from pushing to it directly and e.g. printing your password to the job logs. To do so, go to Settings -> Repository -> Protected Branches. Mind that the option chosen below still puts a lot of trust in your collaborators. With the Protected option chosen above for the variables, the variables are then only available to those branches (which still allow Maintainers to push to them):
Using the grid proxy
With the grid secrets stored, we can now make use of them. We need to first
restore the grid certificate files in the ~/.globus
directory, then run the
voms-proxy-init
command and pass the grid proxy password to it. This is done as
follows:
mkdir -p ${HOME}/.globus
printf "${GRID_USERCERT}" | base64 -d > ${HOME}/.globus/usercert.pem
printf "${GRID_USERKEY}" | base64 -d > ${HOME}/.globus/userkey.pem
chmod 400 ${HOME}/.globus/userkey.pem
printf "${GRID_PASSWORD}" | base64 -d | voms-proxy-init --voms cms --pwstdin
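Because Protected variables are not available on unprotected branches, a variable can turn out to be empty in a job, and the decode step would then silently write an empty file. The sketch below guards against that. The helper name restore_secret is hypothetical, and the demonstration uses a dummy string instead of a real certificate.

```shell
#!/usr/bin/env bash
# restore_secret is a hypothetical helper: it decodes a base64 string into a
# file readable only by the owner, and refuses to continue on an empty value.
restore_secret() {
  local value="$1" target="$2"
  if [ -z "${value}" ]; then
    echo "ERROR: no value for ${target} (is the variable available on this branch?)" >&2
    return 1
  fi
  mkdir -p "$(dirname "${target}")"
  rm -f "${target}"
  # The umask keeps the file private from the moment it is created.
  ( umask 077; printf '%s' "${value}" | base64 -d > "${target}" ) || return 1
  chmod 400 "${target}"
}

# Dry run with a dummy value instead of a real certificate.
scratch=$(mktemp -d)
dummy=$(printf 'not a real key\n' | base64)
restore_secret "${dummy}" "${scratch}/.globus/userkey.pem"
cat "${scratch}/.globus/userkey.pem"
restore_secret "" "${scratch}/.globus/usercert.pem" || echo "empty variable detected"
rm -rf "${scratch}"
```

In a CI job the calls would be restore_secret "${GRID_USERCERT}" "${HOME}/.globus/usercert.pem" and the equivalent for the key.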
Trying this with the standard GitLab CC7 runner will fail, since the
CMS-specific certificates are not included in the image. An image that
has these certificates installed already is
gitlab-registry.cern.ch/clange/cmssw-docker/cc7-cms:latest
.
An example to obtain a grid proxy, check it, and then destroy it again
would result in the following yaml
:
voms_proxy:
  image:
    name: gitlab-registry.cern.ch/clange/cmssw-docker/cc7-cms:latest
    entrypoint: [""]
  script:
    - mkdir -p ${HOME}/.globus
    - printf "${GRID_USERCERT}" | base64 -d > ${HOME}/.globus/usercert.pem
    - printf "${GRID_USERKEY}" | base64 -d > ${HOME}/.globus/userkey.pem
    - chmod 400 ${HOME}/.globus/userkey.pem
    - printf "${GRID_PASSWORD}" | base64 -d | voms-proxy-init --voms cms --pwstdin
    - voms-proxy-info --all
    - voms-proxy-destroy
You could take this further by e.g. performing a DAS query to keep your input files up-to-date.
Confirm that this works for you before moving on to the next section!
In case of problems, you might need to re-encode your secrets with -w 0
added to the base64
command, so that the stored values contain no line breaks.
Key Points
Special care is needed when adding secrets in GitLab
Passwords and certificates should always be set to
Protected
state
Certificates need to be
base64
-encoded for use as secrets
Running a CMSSW job
Overview
Teaching: 10 min
Exercises: 10 min
Questions
How can I run CMSSW in GitLab CI?
How can I avoid compiling my code for each job?
Objectives
Successfully run a test job of a simplified Z to leptons analysis
Use GitLab artifacts to pass compiled analysis code
Being able to set up CMSSW and to compile code in GitLab, and knowing how to access CMS data, the next step is to run test jobs to confirm that the code yields the expected results.
Fair use
Please remember that the provided runners are shared among all users, so please avoid massive pipelines and CI stages with more than 5 jobs in parallel or that run with a parallel configuration higher than 5.
If you need to run these pipelines please deploy your own private runners to avoid affecting the rest of the users.
Requirements for running CMSSW
In most cases, you will run your tests on centrally produced files. In order
to be able to access those, you will require a grid proxy valid for the CMS
virtual organisation (VO) as described in the previous section. For files
located on EOS, please check the section on
private information/access control
from the
Continuous Integration / Continuous Development (CI/CD) lesson
on how to get a Kerberos token via kinit
(we won’t be using this here).
For the analysis example provided in this lesson, we’ll use a single file
from the /DYJetsToLL_M-50_HT-100to200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/MINIAODSIM data set: /store/mc/RunIIFall17MiniAODv2/DYJetsToLL_M-50_HT-100to200_TuneCP5_13TeV-madgraphMLM-pythia8/MINIAODSIM/PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/50000/E43E4210-7742-E811-9430-AC1F6B23C96A.root
.
This file is set in
ZPeakAnalysis/test/MyZPeak_cfg.py
.
Executing cmsRun
In principle, all we need to do is compile the code as demonstrated in
episode 2,
adding the grid proxy as just done in
episode 3,
and then execute the cmsRun
command. Mind that you do not need the
git cms-addpkg PhysicsTools/PatExamples
command here anymore,
i.e. remove it in the following! Putting this together, the
additional commands to run would be:
cd ${CMSSW_BASE}/src/AnalysisCode/ZPeakAnalysis/
cmsRun test/MyZPeak_cfg.py
ls -l myZPeak.root
where the last command just checks that an output file has been created. However, imagine that you would like to run test jobs on more than one file and to speed things up do this in parallel. This would mean that you would have to compile the code N times, which is a waste of resources and time. Instead, we can pass the compiled code from the compile step to the run step as described below.
Using artifacts to compile code only once
Artifacts have been introduced to you as part of the Continuous Integration / Continuous Development (CI/CD) lesson. You can find more detailed information in the GitLab documentation for using artifacts.
Artifacts are write-protected
One important thing to note is that artifacts are write-protected. You cannot write into the artifact directory in any of the following steps.
For the compiled code to be available in the subsequent steps, the directories
that should be provided need to be listed explicitly. The yaml
code from
the compilation step in
episode 2
needs to be extended as follows:
artifacts:
  # artifacts:untracked ignores configuration in the repository’s .gitignore file.
  untracked: true
  expire_in: 20 minutes
  paths:
    - ${CMSSW_RELEASE}
As path we use ${CMSSW_RELEASE}
, i.e. the full CMSSW area. Since this area
is write protected, we need to copy the whole area to a new directory and
recursively add write permissions again. In the following, this new workarea
will have to be used:
script:
  # ...
  - mkdir run
  - cp -r ${CMSSW_RELEASE} run/
  - chmod -R +w run/${CMSSW_RELEASE}/
  - cd run/${CMSSW_RELEASE}/src
  - cmsenv
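The copy-and-unlock steps can be tried locally on a simulated read-only directory. In this sketch the helper name writable_copy and the directory name CMSSW_X_Y_Z are placeholders, and u+w is used instead of +w so that the result does not depend on the umask.

```shell
#!/usr/bin/env bash
# writable_copy is a hypothetical helper mirroring the cp/chmod steps above.
writable_copy() {
  local source_dir="$1" target_parent="$2"
  mkdir -p "${target_parent}"
  cp -r "${source_dir}" "${target_parent}/"
  chmod -R u+w "${target_parent}/$(basename "${source_dir}")"
}

# Simulate a read-only artifact in a scratch directory.
scratch=$(mktemp -d)
mkdir -p "${scratch}/CMSSW_X_Y_Z/src"
touch "${scratch}/CMSSW_X_Y_Z/src/config.py"
chmod -R a-w "${scratch}/CMSSW_X_Y_Z"

writable_copy "${scratch}/CMSSW_X_Y_Z" "${scratch}/run"
touch "${scratch}/run/CMSSW_X_Y_Z/src/output.root" && echo "copy is writable"

# Restore write permission so the scratch directory can be removed.
chmod -R u+w "${scratch}"
rm -rf "${scratch}"
```

The original directory stays untouched, so only the copy under run/ is used for the job output.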
Exercise: Run CMSSW using the artifact from the compile step
You should now have all required ingredients to be able to extend the
.gitlab-ci.yml
file such that you can reuse the compiled code in the cmsRun
step.
Solution: Run CMSSW using the artifact from the compile step
A possible implementation could look like this:
cmssw_run:
  image:
    name: gitlab-registry.cern.ch/clange/cmssw-docker/cc7-cms:latest
    entrypoint: [""]
  tags:
    - cvmfs
  variables:
    CMS_PATH: /cvmfs/cms.cern.ch
    EOS_MGM_URL: "root://eoscms.cern.ch"
    CMSSW_RELEASE: CMSSW_10_6_8_patch1
  script:
    - shopt -s expand_aliases
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - mkdir run
    - cp -r ${CMSSW_RELEASE} run/
    - chmod -R +w run/${CMSSW_RELEASE}/
    - cd run/${CMSSW_RELEASE}/src
    - cmsenv
    - mkdir -p ${HOME}/.globus
    - printf "${GRID_USERCERT}" | base64 -d > ${HOME}/.globus/usercert.pem
    - printf "${GRID_USERKEY}" | base64 -d > ${HOME}/.globus/userkey.pem
    - chmod 400 ${HOME}/.globus/userkey.pem
    - printf "${GRID_PASSWORD}" | base64 -d | voms-proxy-init --voms cms --pwstdin
    - cd AnalysisCode/ZPeakAnalysis/
    - cmsRun test/MyZPeak_cfg.py
    - ls -l myZPeak.root
Bonus: Store the output ROOT file as artifact
It could be useful to store the output ROOT file as an artifact so that you simply download it after job completion. Do you know how to do it? Hint: you need to provide the full path to it.
Key Points
A special CMSSW image is required to successfully run CMSSW jobs
Running on CMS data requires a grid proxy
The use of artifacts allows passing results of one step to the other
Since artifacts are write-protected, the directory needs to be copied before running CMSSW