Description

CernVM-FS (CVMFS) is a globally-distributed filesystem used to efficiently distribute software to data centers and end-user workstations alike.

Podman is an utility to run and use containers. It provides the same command line interface than Docker but it runs without the need of a privileged daemon. These two characteristics make it extremely interesting for workload used in scientific data-centers.

It has been shown that only a small portion of all the files in a container images is necessary to run the image itself. This is even more accentuated in scientific container images since they usually include complex software stacks comprising hundreds of thousands of files, and often not all the files are needed for each task. Our goal is to merge the lazy load capabilities of CVMFS with the container workflow allowed by podman, to quickly load big scientific container images while maintaining the isolation and convenience of containers.

There is already an integration for Docker, and another for containerd (kubernetes) is about ready. All those implementations are based on the filesystem structure generated by DUCC.

A CernVM-FS file system hosting container images and layers shows the following structure. From a distribution point of view, in CernVM-FS a layer (or image) is a directory containing the unpacked files rather than a single tarball.


/cvmfs/unpacked.cern.ch/
│ 
├─ .layers
│  ├── 00
│  │   ├── 001dba6e0b44ff57a26d944d9a307ef39927e4882b45eb9d3c9257d754ef7d56
│  │   │   └── layerfs
│  │   │       ├── etc
│  │   │       ├── home
│  │   │       └── opt
│  │   └── 008deed8f79c35003fb8808e37c39245e244cd6af7498e5b7874ac7e186c7307
│  │       └── layerfs
│  │           └── code
│  └─ ... many more ...
└─ .flat
   ├── 02
   │   ├── 0212054c85a9b966aa4f9c08048686603c7d0583067b759d14633070fcea30a1
   │   │   ├── bin
   │   │   ├── dev
   │   │   ├── etc
   │   │   ├── home
   │   │   ├── lib
   │   │   └── var
   │   └── 027998886ae41faa55490baeb6b5e37f4295375ac5dcae5bcf3fe91f141687c2
   │       ├── bin
   │       ├── boot
   │       ├── dev
   │       ├── etc
   │       ├── home
   │       ├── lib
   │       ├── lib64
   │       ├── lost+found
   │       ├── media
   │       ├── pool
   │       ├── root
   │       ├── sbin
   │       ├── tmp
   │       ├── usr
   │       └── var
   └─ ... many more ...

The .layer directory store the content of the layers unpacked in an ordinary directory and the .flat directory stores the content of a whole container images, with each layer unpacked one of top of each other.

The project will be mentored from both CERN and Red Hat.

Task Ideas

  1. Enhance DUCC to automatically create filesystem structure in CVMFS that can be used by podman
  2. Allow podman to understand the filesystem structure already present in CVMFS to load container images

Expected Result

Allow podman to run container images directly from CVMFS or any other file system that host directories with the unpacked layer contents.

Evaluation Task

Interested students can contact me (Simone Mosciatti) directly for an evaluation task, it requires basic understanding of containers and FUSE filesystem.

Requirements

The code-base will mostly be in Go(lang), hence it is necessary to know the language. It is also important to have a basic understanding of Linux.

Mentors