Finding container images
Overview
Teaching: 10 min
Exercises: 5 min
Questions
How do I find existing container images?
How can I make use of them?
Objectives
Be aware of good locations for container images
When you want to create a container image for your analysis, starting from scratch can be quite challenging: it can involve installing the correct system packages, creating new users, defining entrypoints and a lot more. However, this is usually not necessary, since it is very likely that images covering your needs already exist. The following sections show where you can find suitable images.
Common pitfalls
Before pointing you to where and how you can find images and their corresponding Dockerfiles, a few words of warning.
You will find several tutorials on building container images online that target web developers and data scientists who use generic JavaScript and Python libraries. For Python development, for example, they recommend using a Python image on Docker Hub (more on Docker Hub below). While this makes it easy to obtain a specific Python version in a software container, it is not suitable for the majority of applications in HEP.
One of the main reasons is that such images are based on Linux distributions that are (very) different from those used in HEP environments. For example, to read ROOT files on the Grid or EOS, you might rely on XRootD and a Kerberos token or VOMS proxy. Installing and configuring the required packages on Debian or Alpine Linux (the typical distributions used for the Python images) can get very complicated or might even be impossible. Furthermore, since these images are defined by their Python version rather than by the underlying Linux distribution, the distribution can change from one day to the next. To avoid such issues, follow the advice below.
CAT-supported frameworks
The CMS Common Analysis Tools (CAT) group supports several analysis frameworks with different target groups. Most of those frameworks provide a container image that you can use directly for your analysis; how to do that is covered in the GitLab CI for CMS lesson. These images should also be available on CVMFS in unpacked.cern.ch.
Should those images not be listed in the documentation, you can find them
through the GitLab project:
- On the GitLab project/repository page, click “Deploy” on the left-hand side menu
- Select “Container registry”
On the page that opens, you will see all the available container images and their respective tags.
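If you prefer a script over clicking through the web interface, the GitLab REST API exposes the same information. The sketch below assumes the GitLab instance is gitlab.cern.ch and uses the "list registry repositories" endpoint (GET /projects/:id/registry/repositories with tags=true). It only reads the first page of results and does not handle access tokens, so private projects would need additional authentication; the project path in the test is hypothetical.

```python
"""Sketch: list the container images of a GitLab project via the REST API.

Assumptions: the instance is gitlab.cern.ch and the project is public
(private projects would need an access token, not handled here).
"""
import json
import urllib.parse
import urllib.request


def registry_api_url(project_path: str, host: str = "gitlab.cern.ch") -> str:
    """Build the URL of the 'list registry repositories' endpoint.

    GitLab accepts the URL-encoded "namespace/project" path as project ID.
    Only the first page (up to 100 repositories) is requested.
    """
    project_id = urllib.parse.quote(project_path, safe="")
    return (f"https://{host}/api/v4/projects/{project_id}"
            "/registry/repositories?tags=true&per_page=100")


def image_references(payload: list) -> list:
    """Extract pullable image references (registry/path:tag) from the response."""
    refs = []
    for repository in payload:
        # With tags=true, each repository lists its tags; the "location"
        # field of a tag is the full reference you can pull.
        for tag in repository.get("tags", []):
            refs.append(tag["location"])
    return refs


def fetch_image_references(project_path: str) -> list:
    """Query the API (requires network access) and return image references."""
    with urllib.request.urlopen(registry_api_url(project_path)) as response:
        return image_references(json.load(response))
```

The split between building the URL, fetching, and parsing keeps the parsing testable offline on a saved JSON response.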
Other frameworks often also have container images available.
You can find them in a similar way.
A good indication of the availability of a container image for an analysis
framework is the existence of a Dockerfile
in the repository.
There is unfortunately no container image search available for the GitLab
container image registry.
However, you can find a large number of images used at CERN in the
CERN unpacked sync repository.
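To check whether a given image is already available on CVMFS, it helps to know where it would be unpacked. The sketch below assumes the layout /cvmfs/unpacked.cern.ch/<registry>/<repository>:<tag>, with Docker Hub images filed under registry.hub.docker.com; this layout is an assumption based on how the sync repository is typically organised, so always verify that the directory exists. The GitLab image name in the test is hypothetical.

```python
"""Sketch: guess where an image is unpacked on /cvmfs/unpacked.cern.ch.

Assumption: the sync repository uses the layout
/cvmfs/unpacked.cern.ch/<registry>/<repository>:<tag>, with Docker Hub
images filed under "registry.hub.docker.com". Check that the directory
exists before relying on it; not every image is synchronised.
"""
from pathlib import PurePosixPath

CVMFS_ROOT = PurePosixPath("/cvmfs/unpacked.cern.ch")
DOCKER_HUB = "registry.hub.docker.com"


def split_reference(image: str) -> tuple:
    """Split an image reference into (registry, repository, tag)."""
    if "@" in image:
        raise ValueError("digest references are not handled in this sketch")
    first, _, rest = image.partition("/")
    # Docker's convention: the first component is a registry host if it
    # contains a dot or a colon, or is "localhost".
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = DOCKER_HUB, image
    if registry in ("docker.io", "index.docker.io"):
        registry = DOCKER_HUB
    # The tag, if any, follows the last colon; default to "latest".
    repository, sep, tag = remainder.rpartition(":")
    if not sep:
        repository, tag = remainder, "latest"
    if registry == DOCKER_HUB and "/" not in repository:
        repository = "library/" + repository  # official Docker Hub images
    return registry, repository, tag


def cvmfs_path(image: str) -> PurePosixPath:
    """Return the expected unpacked-image directory for an image reference."""
    registry, repository, tag = split_reference(image)
    return CVMFS_ROOT / registry / f"{repository}:{tag}"
```

Such an unpacked directory is intended to be used directly as the image argument of Apptainer/Singularity, which avoids pulling the image at all.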
CMS base images
CMS Offline and Computing provides base images to be used with CMSSW on the grid. The source for these images is in the CMSSW GitHub area, which also links to their Docker Hub location. They are used for the CMSSW Singularity setup.
Another alternative is the set of CMS Open Data container images, which typically come with more utility libraries and tools already installed. They often provide a good starting point for building your own images.
CERN base images
If you need to build your own image from scratch, the next-best recommendation is to use the (Alma) Linux images provided by CERN IT. In contrast to bare Alma Linux images, they already have some of the CERN software repositories configured, so installing CERN software should be easier. Mind, however, that these images contain very few packages, and you will have to install the ones you need yourself.
Docker Hub
Docker Hub is the original container image registry. You will find a huge number of images for all kinds of use cases on it. Make sure to check the build dates of the available images to avoid using outdated ones. Also check the build logs, which are linked in most cases, to understand which software is installed in those images. Often, links are provided to the Git repositories from which these images have been built.
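Checking dates by hand gets tedious for images with many tags. The sketch below filters a tag listing by age. It assumes the JSON comes from the Docker Hub endpoint https://hub.docker.com/v2/repositories/<namespace>/<repository>/tags (official images use the namespace library), whose "results" entries carry "name" and "last_updated" fields. Note that "last_updated" is the time of the last push, used here only as a proxy for the build date, and that only one page of results is handled.

```python
"""Sketch: keep only recently updated tags from a Docker Hub tag listing.

Assumption: the payload is the JSON returned by
https://hub.docker.com/v2/repositories/<namespace>/<repository>/tags,
whose "results" entries have "name" and "last_updated" (UTC) fields.
"last_updated" is the last push time, a proxy for the build date.
"""
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    # Keep only "YYYY-MM-DDTHH:MM:SS" so fractional seconds and the
    # trailing "Z" do not matter, then mark the result as UTC.
    return datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def recent_tags(payload: dict, max_age_days: int, now: datetime) -> list:
    """Return names of tags updated within max_age_days of now, newest first."""
    cutoff = now - timedelta(days=max_age_days)
    tags = []
    for entry in payload.get("results", []):
        updated = entry.get("last_updated")
        if updated is None:
            continue  # no date available: skip rather than guess
        timestamp = parse_timestamp(updated)
        if timestamp >= cutoff:
            tags.append((timestamp, entry["name"]))
    tags.sort(reverse=True)  # newest first
    return [name for _, name in tags]
```

Passing `now` explicitly instead of calling `datetime.now()` inside the function keeps the result reproducible and easy to test.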
Using software images from the internet is dangerous
It is highly recommended to only use official images from Docker Hub. While running a container image on your system is in principle safe, since the container is isolated from the rest of your computer, this can change as soon as you pass your personal credentials into a running container or mount them into it.
Note also that Docker Hub enforces pull rate limits, so a pull may fail with:
Too Many Requests - Server message: toomanyrequests: You have reached your unauthenticated pull rate limit.
To avoid this, you can pull images from Docker Hub through the pull-through cache at CERN. To do so, prepend registry.cern.ch/docker.io/ to the image name, so that:
- rootproject/root becomes registry.cern.ch/docker.io/rootproject/root for images with a project name;
- alpine becomes registry.cern.ch/docker.io/library/alpine for images with only one part in the name (note the added library/).
Outside the CERN network, you need to log in to the CERN registry.
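The renaming rule described above is mechanical enough to express as a small function, for example when rewriting many image names in scripts. This is a sketch: it only handles Docker Hub images (anything with another registry host raises an error) and relies on the Docker convention that a first name component containing a dot or colon is a registry host.

```python
"""Sketch: rewrite a Docker Hub image name to use CERN's pull-through cache.

Puts registry.cern.ch/docker.io/ in front of the name and inserts
"library/" for single-part (official) image names.
"""
CACHE_PREFIX = "registry.cern.ch/docker.io/"
DOCKER_HUB_HOSTS = ("docker.io", "index.docker.io")


def via_cern_cache(image: str) -> str:
    first, _, rest = image.partition("/")
    # Docker treats the first component as a registry host if it contains
    # a dot or a colon, or is "localhost".
    if rest and ("." in first or ":" in first or first == "localhost"):
        if first not in DOCKER_HUB_HOSTS:
            raise ValueError(f"{image!r} is not a Docker Hub image")
        image = rest  # drop an explicit docker.io prefix
    # Tags and digests never contain "/", so a single-part name has none.
    if "/" not in image:
        image = "library/" + image  # e.g. alpine -> library/alpine
    return CACHE_PREFIX + image
```

The rewritten name can then be used wherever the original name appeared, for example with `docker pull` or in the `FROM` line of a Dockerfile.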
Other registries
Besides Docker Hub, there are also other container image registries available.
An example is the GitHub Container Registry (images starting with ghcr.io), where the image names point to the GitHub repositories from which the container images have been built. In those repositories, you will again most likely find the corresponding Dockerfile.
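Rather than relying on the image name alone, you can often read the source repository from the image metadata. The sketch below assumes the image has been pulled and its metadata obtained with `docker image inspect <image>` (a JSON list), and looks for the standard OCI label org.opencontainers.image.source, which many images built on GitHub carry. If the label is missing, it falls back to mapping ghcr.io/<owner>/<name> to https://github.com/<owner>/<name>, which is only a heuristic; the owner and project names in the test are hypothetical.

```python
"""Sketch: find the source repository of a container image.

Assumptions: `inspect_json` is the output of `docker image inspect <image>`;
the OCI label "org.opencontainers.image.source" is used when present,
otherwise ghcr.io/<owner>/<name> is mapped to github.com/<owner>/<name>
as a heuristic.
"""
import json
from typing import Optional

SOURCE_LABEL = "org.opencontainers.image.source"


def source_repository(image: str, inspect_json: str) -> Optional[str]:
    for entry in json.loads(inspect_json):
        # "Labels" may be null in the inspect output, hence the "or {}".
        labels = (entry.get("Config") or {}).get("Labels") or {}
        if SOURCE_LABEL in labels:
            return labels[SOURCE_LABEL]
    # Heuristic fallback: ghcr.io/<owner>/<name>[/...][:tag]
    parts = image.split("/")
    if len(parts) >= 3 and parts[0] == "ghcr.io":
        name = parts[2].split(":", 1)[0].split("@", 1)[0]
        return f"https://github.com/{parts[1]}/{name}"
    return None  # unknown: inspect the registry page by hand
```

Preferring the label over the name matters because an image name does not have to match the repository it was built from.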
Key Points
Container images are available from different registries.
The images can be found in different ways.
There are most likely already images available that work for your analysis.
You can build on top of those images.