Introduction
Overview
Teaching: 10 min
Exercises: 0 min
Questions
Which options are available to run CMSSW in a container?
Objectives
Be aware of what container images exist and their individual caveats.
CMS does not have a clear concept of separating analysis software from the rest of the experimental software stack such as event generation, data taking, and reconstruction. This means that there is just one CMSSW, and the releases have a size of several Gigabytes (>20 GB for the last releases).
From the user’s and computing point of view, this makes it very impractical to build and use images that contain a full CMSSW release. Imagine running several hundred batch jobs where each batch node first needs to download several Gigabytes of data before the job can start, amounting to a total of tens of Terabytes. These images, however, can be useful for offline code development, in case CVMFS is not available, as well as for overall preservation of the software.
An alternative to full CMSSW release containers are Linux containers that only contain the underlying base operating system (e.g. Scientific Linux 5/6 or CentOS 7/8) including additionally required system packages. The CMSSW release is then mounted from the host system on which the container is running (which could be your laptop, a GitLab runner, or a Kubernetes node). These images have a size of a few hundred Megabytes, but rely on a good network connection to access the CVMFS share.
One thing that has not been covered in detail in the introduction to Docker is that containers do not necessarily have to be executed using Docker. There are several so-called container run-times that allow the execution of containers. CMS uses Apptainer (previously Singularity) for sample production, and use of Singularity is also centrally supported and documented. The main reason for that is that Singularity is popular in high-performance and high-throughput computing and does not require any root privileges.
While executing images on LXPLUS and HTCondor is more practical with Apptainer/Singularity, running in GitLab CI is by default done using Docker. Since Apptainer uses its own image format but supports reading and executing Docker images, building images is better done using Docker.
Key Points
Full CMSSW release containers are very big.
It is more practical to use light-weight containers and obtain CMSSW via CVMFS.
The centrally supported way to run CMSSW in a container is using Apptainer.
Using full CMSSW containers
Overview
Teaching: 10 min
Exercises: 0 min
Questions
How can I obtain a standalone CMSSW container?
Objectives
Understanding how to find and use standalone CMSSW containers.
As discussed in the introduction, the images that contain full CMSSW releases can be very big. CMS computing therefore does not routinely build these images. However, as part of the CMS Open Data effort, images are provided for some releases. You can find those on Docker Hub. In addition, a build service is currently under development.
If you would like to use these images, you can use them in the same way as the other CMS images, with the only difference that the CMSSW software in the container is in /opt/cms and not within /cvmfs/cms.cern.ch.
You can run the containers as follows (pick either bash or zsh) when using the version published on Docker Hub:
docker run --rm -it cmsopendata/cmssw:10_6_8_patch1 /bin/zsh
The images are in several cases also mirrored on the CERN GitLab registry:
docker run --rm -it gitlab-registry.cern.ch/clange/cmssw-docker/cmssw_10_6_8_patch1 /bin/zsh
Do not use these images for large-scale job submission or on GitLab!
Due to the large size of these images, they should only be used for local development.
Key Points
Standalone CMSSW containers are currently not routinely built due to their size.
They need to be built/requested when needed.
Light-weight CMSSW containers
Overview
Teaching: 15 min
Exercises: 10 min
Questions
How can I get a more light-weight CMSSW container?
What are the caveats of using a light-weight CMSSW container?
Objectives
Understand how the light-weight CMS containers can be used.
In a similar way to running CMSSW in GitLab, the images containing only the base operating system (e.g. Scientific Linux 5/6 or CentOS 7/8) plus additionally required system packages can be used to run CMSSW (and other related software). CMSSW needs to be mounted via CVMFS. This is the recommended way!
For this lesson, we will continue with the repository we used for the GitLab CI for CMS lesson and just add to it.
Adding analysis code to a light-weight container
Instead of using these containers only for compiling and running CMSSW, we can add our (compiled) code to those images, building on top of them. The advantage in doing so is that you will effectively be able to run your code in a version-controlled sandbox, in a similar way as grid jobs are submitted and run. Adding your code on top of the base image will only increase their size by a few Megabytes. CVMFS will be mounted in the build step and also whenever the container is executed. The important conceptual difference is that we do not use a Dockerfile to build the image since that would not have CVMFS available, but instead we use Docker manually as if it was installed on a local machine.
The way this is done is by requesting a docker-privileged GitLab runner. With such a runner we can run Docker-in-Docker, which makes it possible to manually attach CVMFS to a container and run commands such as compiling analysis code in this container. Compiling code adds an additional layer to the container, which consists only of the changes made by the commands run. After exiting this container, we can tag this layer and push the container to the container registry.
The YAML required looks as follows:
build_docker:
  only:
    - pushes
    - merge_requests
  tags:
    - docker-privileged
  image: docker:19.03.1
  services:
    # To obtain a Docker daemon, request a Docker-in-Docker service
    - docker:19.03.1-dind
  before_script:
    - docker login -u $CI_REGISTRY_USER -p $CI_BUILD_TOKEN $CI_REGISTRY
    # Need to start the automounter for CVMFS:
    - docker run -d --name cvmfs --pid=host --user 0 --privileged --restart always -v /shared-mounts:/cvmfsmounts:rshared gitlab-registry.cern.ch/vcs/cvmfs-automounter:master
  script:
    # ls /cvmfs/cms.cern.ch/ won't work, but from the container it will
    # If you want to automount CVMFS on a new docker container add the volume config /shared-mounts/cvmfs:/cvmfs:rslave
    - docker run -v /shared-mounts/cvmfs:/cvmfs:rslave -v $(pwd):$(pwd) -w $(pwd) --name ${CI_PROJECT_NAME} ${FROM} /bin/bash ./.gitlab/build.sh
    - SHA256=$(docker commit ${CI_PROJECT_NAME})
    - docker tag ${SHA256} ${TO}
    - docker push ${TO}
  variables:
    FROM: gitlab-registry.cern.ch/clange/cmssw-docker/cc7-cms:latest
    TO: ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}
This is pretty complicated, so let’s break this into smaller pieces.
The only section determines when the step is actually run. The default should probably be pushes only, so that a new image is built whenever there are changes to a branch. If you would like to build a container already when a merge request is created, so that you can test the code before merging, also add merge_requests as in the example provided here.
The next couple of lines are related to the special Docker-in-Docker runner. For this to work, the runner needs to be privileged, which is achieved by adding docker-privileged to the tags. The image to run is then docker:19.03.1, and in addition a special service with the name docker:19.03.1-dind is required.
Once the runner is up, the before_script section is used to prepare the setup for the following steps. First, the runner logs in to the GitLab image registry with an automatically provided token (this is a property of the job and does not need to be set by you manually). The second command starts a special container, which mounts CVMFS and makes it available to our analysis container.
In the script section the analysis container is then started, doing the following:
- mounting the volume (-v /shared-mounts/cvmfs:/cvmfs:rslave),
- mounting the current working directory (-v $(pwd):$(pwd)),
- setting the mounted current working directory as the working directory inside the container (-w $(pwd)),
- setting its name to the project name (--name ${CI_PROJECT_NAME}),
- and executing the command /bin/bash ./.gitlab/build.sh.
The name of the image that is started is set via the ${FROM} variable, which is set to gitlab-registry.cern.ch/clange/cmssw-docker/cc7-cms:latest here.
After the command that has been run in the container exits, a new commit will have been added to the container. We can find out the hash of this commit by running docker commit ${CI_PROJECT_NAME} (this is why we set the container name to ${CI_PROJECT_NAME}). With the following command, we then tag this commit with the repository's registry name and a unique hash that corresponds to the git commit at which we have built the image. This allows for an easy correspondence between container name and source code version. The last command simply pushes this image to the registry.
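As an aside, the unique tag comes from GitLab's predefined CI_COMMIT_SHORT_SHA variable, which is the first eight characters of the full commit hash. A minimal sketch of how such a tag is formed, using a made-up hash and a placeholder image name:

```shell
# CI_COMMIT_SHORT_SHA is the first 8 characters of the full commit hash.
# Simulated here with a fixed example hash and a placeholder registry path:
full_sha="3daaa96e51c711eabbe0fa163e528257aabbccdd"
short_sha="${full_sha:0:8}"
echo "gitlab-registry.example.cern.ch/myproject:${short_sha}"
```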
Exercise: Compile the ZPeakAnalysis inside the container
The one thing that has not yet been explained is what the build.sh script does. This file needs to be part of the repository. Take the ZPeakAnalysis directory from yesterday's lesson, add it to the repository, and compile the code by adding the required commands to the build.sh script.
Solution: Compile the ZPeakAnalysis inside the container
A possible solution could look like this:
#!/bin/bash
# exit when any command fails; be verbose
set -ex
# make cmsrel etc. work
shopt -s expand_aliases
export MY_BUILD_DIR=${PWD}
source /cvmfs/cms.cern.ch/cmsset_default.sh
cd /home/cmsusr
cmsrel CMSSW_10_6_8_patch1
mkdir -p CMSSW_10_6_8_patch1/src/AnalysisCode
mv ${MY_BUILD_DIR}/ZPeakAnalysis CMSSW_10_6_8_patch1/src/AnalysisCode
cd CMSSW_10_6_8_patch1/src
cmsenv
scram b
Since developing this using GitLab is tricky, the next episodes will cover how you can develop this interactively on LXPLUS using Singularity or on your own computer running Docker. The caveat of using these light-weight images is that they cannot be run autonomously, but always need to have CVMFS access to do anything useful.
Why the .gitlab directory?
Putting the build.sh script into a directory called .gitlab is a recommended convention. If you develop code locally (e.g. on LXPLUS), you will have a different directory structure. Your analysis code, i.e. ZPeakAnalysis, will reside within CMSSW_10_6_8_patch1/src/AnalysisCode, and executing the script from within the ZPeakAnalysis directory does not make much sense, because you would create a CMSSW work area within an existing one. Therefore, using a hidden directory with a name that makes it clear that this is for running within GitLab, and is ignored otherwise, can be useful.
Key Points
The light-weight CMS containers need to mount CVMFS.
They will only work with CVMFS available.
Accessing CVMFS from Docker locally
Overview
Teaching: 10 min
Exercises: 15 min
Questions
How can I access CVMFS from my computer?
How can I access CVMFS from Docker?
Objectives
Be aware of CVMFS with Docker options.
Successfully mount CVMFS via a privileged container.
In order to use CVMFS via Docker, a couple of extra steps need to be taken. There are different approaches:
- Installing CVMFS on your computer locally and mounting it from the container.
- Mounting CVMFS via another container and providing it to the analysis container.
- Mounting CVMFS from the analysis container.
We will go through these options in the following.
This is where things get ugly
Unfortunately, all the options below have some caveats, and they might not even work on your computer. At the moment, no clear recommendations can be given. Try for yourself which option works best for you. However, it is not essential for this lesson that any of these options work. We will learn about other options that will work later.
Installing CVMFS on a local computer
CVMFS can be installed locally on your computer. Packages and installation
packages are provided on the CVMFS Downloads page. In the
interest of time, we will not install CVMFS now, but instead use the second
option above in the following. If you would like to install CVMFS on your
computer, make sure to read the
CVMFS Client Quick Start Guide.
Please also have a look at the
CVMFS with Docker documentation
to avoid common pitfalls when running Linux on your computer and trying to
bind mount CVMFS from the host. This is not necessary when running on a Mac.
However, on a Mac you need to go to Docker Settings -> Resources -> File Sharing and add /cvmfs to enable bind mounting.
To run your analysis container and give it access to the /cvmfs mount, run the following command (mind that --rm deletes the container after exiting):
docker run --rm -it -v /cvmfs:/cvmfs gitlab-registry.cern.ch/clange/cc7-cms /bin/bash
Using the cvmfs-automounter
The first option required CVMFS to be installed on your computer. Using the cvmfs-automounter is effectively mimicking what is done on GitLab. First, a container, the cvmfs-automounter, is started that mounts CVMFS, and then this container provides the CVMFS mount to other containers. If you are running Linux, the following command should work.
On a Mac, however, this will not work (at least at the moment). This could
work if you are using Windows Subsystem for Linux 2 (WSL2) in combination
with Docker for WSL2.
sudo mkdir /shared-mounts
docker run -d --name cvmfs --pid=host --user 0 --privileged --restart always -v /shared-mounts:/cvmfsmounts:rshared gitlab-registry.cern.ch/vcs/cvmfs-automounter:master
This container is running as a daemon (-d), but you can still see it via docker ps and also kill it using docker kill cvmfs.
docker run -v /shared-mounts/cvmfs:/cvmfs:rslave -v $(pwd):$(pwd) -w $(pwd) --name ${CI_PROJECT_NAME} ${FROM} /bin/bash
Mounting CVMFS from the analysis container
This is what I personally would recommend at the moment. It seems to work on Mac, Windows 10 Pro, and most Linux systems. The caveat is that the container runs with elevated privileges, but if you trust me, you can use it.
docker run --rm --cap-add SYS_ADMIN --device /dev/fuse -it gitlab-registry.cern.ch/clange/cmssw-docker/cc7-cmssw-cvmfs:latest bash
If you get an error similar to:
/bin/sh: error while loading shared libraries: libtinfo.so.5: failed to map segment from shared object: Permission denied
you need to turn off SELinux security policy enforcement:
sudo setenforce 0
This can be changed permanently by editing /etc/selinux/config and setting SELINUX to permissive or disabled. Mind, however, that there are certain security issues with disabling SELinux security policies as well as with running privileged containers.
Exercise: Give it a try!
Try if you can run the following command from your cloned repository base directory:
docker run --rm --cap-add SYS_ADMIN --device /dev/fuse -it -v $(pwd):$(pwd) -w $(pwd) gitlab-registry.cern.ch/clange/cmssw-docker/cc7-cmssw-cvmfs:latest bash
This should set up CMSSW, compile your code, and then exit the container again. You can of course also do this manually, i.e. start bash in the container and execute the build.sh afterwards so that you stay inside the container.
The downside to starting CVMFS in the container
The CVMFS daemon is started when the container is started for the first time. It is not started again when you e.g. lose your network connection or simply connect back to the container at a later stage. At that point, you won’t have CVMFS access anymore.
Developing CMS code on your laptop
By using containers, you can effectively develop any HEP-related code (and beyond) on your local development machine, which doesn't need to know anything about CVMFS or CMSSW in the first place.
Key Points
You can install CVMFS on your local computer.
The cvmfs-automounter allows you to provide CVMFS to other containers on Linux.
Privileged containers can be dangerous.
You can mount CVMFS from within a container on container startup.
Using Singularity
Overview
Teaching: 10 min
Exercises: 5 min
Questions
How can I use CMSSW inside a container on LXPLUS?
Objectives
Understand some of the differences between Singularity and Docker.
Successfully run a custom analysis container on LXPLUS.
The previous episode has given you an idea how complicated it can be to run containers with CVMFS access on your computer. However, at the same time it gives you the possibility to develop code on a computer that doesn’t need to know anything about CMS software in the first place. The only requirement is that Docker is installed.
You will also have noticed that in several cases privileged containers are needed. These are not available to you on LXPLUS (nor is the docker command). On LXPLUS, the tool to run containers is Singularity. The following commands will therefore all be run on LXPLUS (lxplus7.cern.ch or later specifically).
CMS documentation on Singularity
Before we go into any detail, you should be aware of the central CMS documentation. These commands are only available via /cvmfs/cms.cern.ch/common. The cmssw-env command is actually a shell script that sets some variables automatically and then runs Singularity. The nice thing about Singularity is that you can mount /cvmfs, /eos, and /afs without any workarounds. This is automatically done when running the cmssw-env command.
Exercise: Run the CC7 Singularity container
Confirm that you can access your EOS home directory (/eos/user/${USER:0:1}/${USER}) from the Singularity CC7 shell.
Solution: Run the CC7 Singularity container
cmssw-cc7
ls /eos/user/${USER:0:1}/${USER}
exit
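The ${USER:0:1} in this path is plain bash substring expansion; EOS groups user home directories by the first letter of the username. A quick sketch with a made-up username:

```shell
# Bash substring expansion: ${var:offset:length}.
# EOS groups home directories by the first letter of the username.
EXAMPLE_USER="alice"
echo "/eos/user/${EXAMPLE_USER:0:1}/${EXAMPLE_USER}"
# -> /eos/user/a/alice
```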
Running custom images with Singularity
The CMS script discussed above is “nice-to-have” and works well if you simply want to run some CMSSW code on a different Linux distribution, but it also hides a lot of the complexity when running Singularity. For the purpose of running your analysis image, we cannot use the script above, but instead need to run Singularity manually.
As an example, we are going to run the image that we used for
getting the VOMS proxy in the GitLab CI session. Before running
Singularity, mind that you should set the cache directory, i.e.
the directory to which the images are being pulled, to a
place outside your AFS space (here we use the tmp directory):
export SINGULARITY_CACHEDIR="/tmp/$(whoami)/singularity"
singularity shell -B /afs -B /eos -B /cvmfs docker://cmssw/cc7:latest
source /cvmfs/cms.cern.ch/cmsset_default.sh
If you are asked for a docker username and password, just hit enter twice. If you get an error message such as:
FATAL: While making image from oci registry: failed to get checksum for docker://cmssw/cc7:latest: unable to retrieve auth token: invalid username/password
this is just a Singularity bug. To fix it, just delete the ~/.docker/config.json file.
If you are past the authentication issue, you will get to see a lot of garbage output and the singularity shell command will still fail. The reason for this is a bug in Singularity.
One particular difference to note w.r.t. Docker is that the image name needs to be prepended by docker:// to tell Singularity that this is a Docker image.
As you can see from the output, Singularity first downloads the layers from the registry and then unpacks the layers into a format that can be read by Singularity. This is somewhat of a technical detail, but this step is what fails at the moment (and is different w.r.t. Docker).
ERROR: build: failed to make environment dirs: mkdir /tmp/clange/rootfs-ef013f60-51c7-11ea-bbe0-fa163e528257/.singularity.d: permission denied
FATAL: While making image from oci registry: while building SIF from layers: packer failed to pack: while inserting base environment: build: failed to make environment dirs: mkdir /tmp/clange/rootfs-ef013f60-51c7-11ea-bbe0-fa163e528257/.singularity.d: permission denied
Once there is a new Singularity version (check via singularity --version) more recent than 3.5.2-1.1.el7, this will hopefully be fixed. For now, we cannot use Singularity in this way. Otherwise, we'd be able to use the shell to develop code interactively, and then use exec to execute a script such as yesterday's build.sh script:
export SINGULARITY_CACHEDIR="/tmp/$(whoami)/singularity"
singularity exec -B /afs -B /eos -B /cvmfs docker://cmssw/cc7:latest bash .gitlab/build.sh
exec vs. shell
Singularity differentiates between providing you with an interactive shell (singularity shell) and executing scripts non-interactively (singularity exec).
Authentication with Singularity
In case your image is not public, you can authenticate to the registry in two different ways: either you append the option --docker-login to the singularity command, which makes sense when running interactively, or via environment variables (e.g. on GitLab):
export SINGULARITY_DOCKER_USERNAME=${CERNUSER}
export SINGULARITY_DOCKER_PASSWORD='mysecretpass'
In the following episode we will try to work around the issues observed above by using a very nice way to access unpacked images directly via CVMFS.
Key Points
Singularity needs to be used on LXPLUS.
CMS Computing provides a wrapper script to run CMSSW in different Linux environments (SLC5, SLC6, CC7, CC8).
To run your own container, you need to run Singularity manually.
Using unpacked.cern.ch
Overview
Teaching: 10 min
Exercises: 5 min
Questions
What is unpacked.cern.ch?
How can I use unpacked.cern.ch?
Objectives
Understand how your images can be put on unpacked.cern.ch
As was pointed out in the previous episode, Singularity uses unpacked Docker images. These are by default unpacked into the current working directory, and the path can be changed by setting the SINGULARITY_CACHEDIR variable.
The EP-SFT group provides a service that unpacks Docker images and makes them available via a dedicated CVMFS area. In the following, you will learn how to add your images to this area. Once you have your image(s) added to this area, these images will be automatically synchronised from the image registry to the CVMFS area within a few minutes whenever you create a new version of the image.
We will continue with the ZPeakAnalysis example, but for demonstration purposes we will use an example payload.
Exploring the CVMFS unpacked.cern.ch area
The unpacked area is a directory structure within CVMFS:
ls /cvmfs/unpacked.cern.ch/
gitlab-registry.cern.ch registry.hub.docker.com
You can see the full directory structure of an image:
ls /cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/awesome-workshop/payload-docker-cms:3daaa96e
afs builds dev eos home lib64 media opt proc run singularity sys usr
bin cvmfs environment etc lib lost+found mnt pool root sbin srv tmp var
This can be useful for investigating some internal details of the image.
As mentioned above, the images are synchronised with the respective registry. However, you are not notified when the synchronisation happened; an easy way to check is to look at the timestamp of the image directory:
ls -l /cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/awesome-workshop/payload-docker-cms:3daaa96e
lrwxrwxrwx. 1 cvmfs cvmfs 79 Feb 18 00:31 /cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/awesome-workshop/payload-docker-cms:3daaa96e -> ../../.flat/28/28ba0646b6e62ab84759ad65c98cab835066c06e5616e48acf18f880f2c50f90
In the example given here, the image has last been updated on February 18th at 00:31.
Adding to the CVMFS unpacked.cern.ch area
You can add your image to the unpacked.cern.ch area by making a merge request to the unpacked sync repository. In this repository there is a file called recipe.yaml, to which you simply have to add a line with your full image name (including registry), prepended by https://:
- https://gitlab-registry.cern.ch/awesome-workshop/payload-docker-cms:3daaa96e
As of 14th February 2020, it is also possible to use wildcards for the tags, i.e. you can simply add
- https://gitlab-registry.cern.ch/awesome-workshop/payload-docker-cms:*
and whenever you build an image with a new tag it will be synchronised to /cvmfs/unpacked.cern.ch.
Running Singularity using the unpacked.cern.ch area
Running Singularity using the unpacked.cern.ch area is done using the same commands as listed in the previous episode, with the only difference that instead of providing a docker:// image name to Singularity, you provide the path in /cvmfs/unpacked.cern.ch:
singularity shell -B /afs -B /eos -B /cvmfs /cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/awesome-workshop/payload-docker-cms:3daaa96e
Now you should be in an interactive shell almost immediately, without any image pulling or unpacking. One important thing to note is that for most CMS images the default username is cmsusr, and if you compiled your analysis code in the container, it will by default reside in /home/cmsusr:
Singularity> cd /home/cmsusr/CMSSW_10_6_8_patch1/src/
Singularity> source /cvmfs/cms.cern.ch/cmsset_default.sh
Singularity> cmsenv
Singularity> cd AnalysisCode/ZPeakAnalysis/
Singularity> cmsRun test/MyZPeak_cfg.py
And there we are, we run the analysis in a container interactively!
However, there is one issue we will run into. After running over the input file, the cmsRun command will exit with a warning:
Warning in <TStorageFactoryFile::Write>: file myZPeak.root not opened in write mode
The output file will actually not be written. The reason for that is that we cannot write into the container file system with Singularity. We will have to change the MyZPeak_cfg.py file such that it writes out to a different path.
Challenge: Patch MyZPeak_cfg.py to write out to your EOS home
Or even better, use an environment variable to define the output path, defaulting to ./ if not set. Mind that you cannot change files in the container, so the way to go is to change the python config in the repository and have a new image built that can then be used.
Solution: Patch MyZPeak_cfg.py to write out to your EOS home
A possible solution could look like this:
import os
outPath = os.getenv("ANALYSIS_OUTDIR")
if not outPath:
    outPath = "."
process.TFileService = cms.Service("TFileService",
    fileName = cms.string(outPath + "/myZPeak.root")
)
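The same defaulting pattern is handy in shell scripts that run alongside the config; a sketch using bash's ${var:-default} expansion (ANALYSIS_OUTDIR is the variable used in this episode):

```shell
# Fall back to the current directory when ANALYSIS_OUTDIR is unset or empty.
outPath="${ANALYSIS_OUTDIR:-.}/"
echo "$outPath"
```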
Commit these changes, push them, and your new image will show up on CVMFS within a few minutes. The new image has the tag 0950e980.
singularity shell -B /afs -B /eos -B /cvmfs /cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/awesome-workshop/payload-docker-cms:0950e980
Singularity> cd /home/cmsusr/CMSSW_10_6_8_patch1/src/
Singularity> source /cvmfs/cms.cern.ch/cmsset_default.sh
Singularity> cmsenv
Singularity> cd AnalysisCode/ZPeakAnalysis/
Singularity> export ANALYSIS_OUTDIR="/eos/user/${USER:0:1}/${USER}"
Singularity> cmsRun test/MyZPeak_cfg.py
Singularity> exit
ls -l /eos/user/${USER:0:1}/${USER}/myZPeak.root
Where to go from here?
Knowing that you can build images on GitLab and have them synchronised to the unpacked.cern.ch area, you now have the power to run reusable and versioned stages of your analysis. While we have only run test jobs using these containers, you can run them on the batch system, i.e. run your full analysis in containers, with effectively only advantages. The next step after this is to connect these stages using workflows, which will be taught tomorrow.
Key Points
The unpacked.cern.ch CVMFS area provides a very fast way of distributing unpacked docker images for access via Singularity.
Using this approach you can run versioned and reusable stages of your analysis.