Kavli Affiliate: Elise Jennings
| First 5 Authors: Boris Lublinsky, Elise Jennings, Viktória Spišaková
| Summary:
Many scientific workflows require dedicated compute resources, including HPC
clusters with optimized software, quantum resources, and dedicated hardware
cluster systems such as Ray. At the same time, many scientific
workflows today are built on Kubernetes, leveraging the growing support for workflow
and supporting tools. To meet the increasing demand to run workflows on both
cloud and dedicated compute resources, we present the Bridge Operator, a
software extension for container orchestration in Kubernetes that facilitates
the submission and monitoring of long-running processes on external systems
that have their own cluster resource manager (SLURM, LSF, quantum services,
and Ray). The Bridge Operator consists of a custom Kubernetes controller that
employs a Kubernetes Custom Resource Definition to manage applications. We
present controller logic to manage the cloud container orchestration and
external resource workload manager interface, a resource definition to submit
HTTP/HTTPS requests to the external resource, and a controller pod
communicating with the external resource manager to submit and manage job
execution. The implementation mirrors the external resource in
Kubernetes pods, which the operator then uses as proxies to
control the external system. The implementation is agnostic to the choice of
resource manager but assumes the system exposes an HTTP/HTTPS API for its
control and management. The Bridge Operator automates the role of a human operator
running jobs on a black-box external resource as part of a complex hybrid
workflow in the cloud.
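
The submit-and-monitor pattern described in this summary can be sketched in a few lines of Python. This is a minimal illustration only, not the Bridge Operator's implementation: the `/jobs` endpoints, the JSON fields `id` and `state`, the state names, and the `JobSpec` fields are all hypothetical stand-ins for whatever the external resource manager's HTTP/HTTPS API actually exposes.

```python
import json
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical terminal states; a real resource manager defines its own.
TERMINAL_STATES = {"DONE", "FAILED", "KILLED"}

# Transport signature: (method, url, payload) -> decoded JSON response.
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(method: str, url: str, payload: Optional[dict]) -> dict:
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


@dataclass
class JobSpec:
    """Illustrative stand-in for fields a custom resource might carry."""
    base_url: str
    script: str
    poll_seconds: float = 5.0


def run_proxy(
    spec: JobSpec,
    send: Transport = http_transport,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit one job, then poll until it reaches a terminal state."""
    job = send("POST", f"{spec.base_url}/jobs", {"script": spec.script})
    job_id = job["id"]
    while True:
        state = send("GET", f"{spec.base_url}/jobs/{job_id}", None)["state"]
        if state in TERMINAL_STATES:
            return state
        sleep(spec.poll_seconds)


def fake_manager() -> Transport:
    """In-memory stand-in for the external system, for trying the loop offline."""
    states = iter(["PENDING", "RUNNING", "DONE"])

    def send(method: str, url: str, payload: Optional[dict]) -> dict:
        if method == "POST":
            return {"id": "job-1"}
        return {"state": next(states)}

    return send
```

Passing `send=fake_manager()` and a no-op `sleep` exercises the loop without a network. In a proxy pod, the default `http_transport` would talk to the real endpoint, and the returned state could be mirrored into the pod's own status.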
| Search Query: ArXiv Query: search_query=au:"Elise Jennings"&id_list=&start=0&max_results=10