The Worldwide LHC Computing Grid (WLCG) unites resources from over 160 computing centres and research institutes spread across the world, and the number is expected to grow in the coming years. However, provisioning resources (compute, network, storage) at new sites to support WLCG workloads is still not straightforward and often requires significant assistance from WLCG experts. Recently, the WLCG community has taken steps towards reducing such overheads, for example through the use of prebuilt Docker containers or OpenStack VM images, along with the adoption of popular configuration tools like Puppet. In 2017, the SIMPLE Grid project was initiated to construct shared community repositories providing such building blocks. These repositories are governed by a single SIMPLE Framework Specification Document, which describes a modular way to define site components such as Batch Systems, Compute Elements, Worker Nodes, Networks, etc.
The SIMPLE Grid project is an extension of the SIMPLE Framework that combines popular configuration management technologies such as Puppet/Ansible with container orchestration technologies such as Docker Swarm/Kubernetes to allow deployment of complex computing clusters from a single site-level configuration file. The various components of the framework and their functions are described in the SIMPLE specification document. Two of the core components written in Python are the SIMPLE Grid YAML Compiler and the SIMPLE Grid Validation Engine.
The Site Level Configuration File is currently constructed without a well-defined schema. Therefore, a Yamale-based schema needs to be defined for it. This requires defining the custom data types used by the framework. These data types will be discussed before the start of the project, and most of them can be inferred from the framework's specification document. The schemas for component repositories are present in the respective config-schema.yaml files, which need to be translated into a form that the Yamale module can use.
Infrastructure Validation: Once the framework has configured the distributed computing infrastructure, the validation engine needs to ensure that all the services, containers and configuration files are in their expected state. Popular Python frameworks such as Testinfra can be used for this.
A more consolidated set of tasks is as follows:
Requirements
Mentors
Links: