CernVM-FS (CVMFS) is a service for fast and reliable software distribution on a global scale. It is capable of delivering the scientific software used by the High Energy Physics (HEP) community to tens of thousands of client nodes worldwide. Data is organized in repositories that clients mount as a read-only POSIX file system. Files and metadata are downloaded on demand via HTTP requests, which benefit from several layers of caches.
Currently, distribution and caching operate at per-file granularity. In some cases, however, it is known upfront that all files of a certain set are required if any one of them is accessed. This is the case, for instance, for well-known Python libraries (e.g., TensorFlow) that always load a certain set of interdependent Python files at startup. The ability to load all required files together would improve performance when caches are cold.
The goal of this project is to introduce the concept of bundles, which would improve the startup performance of applications in the following way:
- Files named `.cvmfsbundles` list the files that belong to a bundle. The list must contain only regular files, and every file can belong to at most one bundle. The list of files belonging to a bundle is maintained by the repository administrators.
- Each file is associated with a bundle identifier (`bundle_id`), so that when a file is fetched, the corresponding bundle can also be identified and downloaded.

- `.cvmfsbundles` files to describe the list of regular files that are part of the same bundle

Interested students, please contact the mentors for an evaluation task.
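The bundle rules described above (a file belongs to at most one bundle, and fetching any bundled file pulls in the whole bundle) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the format of `.cvmfsbundles` files is not defined here, so the sketch takes bundles as an already-parsed mapping from a bundle identifier to a list of paths. The function names `build_bundle_index` and `files_to_fetch` are hypothetical and are not part of CernVM-FS. The requirement that only regular files are listed is not checked.

```python
from typing import Dict, List


def build_bundle_index(bundles: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each file path to its bundle id.

    Raises ValueError if a path is listed in more than one bundle,
    since every file may belong to at most one bundle.
    """
    index: Dict[str, str] = {}
    for bundle_id, paths in bundles.items():
        for path in paths:
            owner = index.get(path)
            if owner is not None and owner != bundle_id:
                raise ValueError(
                    f"{path} is listed in both {owner!r} and {bundle_id!r}"
                )
            index[path] = bundle_id
    return index


def files_to_fetch(
    path: str, bundles: Dict[str, List[str]], index: Dict[str, str]
) -> List[str]:
    """Return every file to download when `path` is accessed."""
    bundle_id = index.get(path)
    if bundle_id is None:
        # Not part of any bundle: fall back to per-file granularity.
        return [path]
    # Part of a bundle: fetch all files of that bundle together.
    return list(bundles[bundle_id])


# Hypothetical bundle contents, for illustration only.
example_bundles = {
    "ml-startup": ["lib/ml/__init__.py", "lib/ml/core.py"],
    "plotting": ["lib/plot/__init__.py"],
}
example_index = build_bundle_index(example_bundles)
print(files_to_fetch("lib/ml/core.py", example_bundles, example_index))
```

With a lookup of this kind, a client that misses on one file in a cold cache could request the rest of the bundle in the same step instead of issuing one request per file as each is opened.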