An implementation of the Workflow Execution Service API of the Global Alliance for Genomics and Health (GA4GH) that aims to be robust, scalable, and reliable enough for production use.
The GA4GH specifies a number of APIs that support the distributed analysis of human data. One of these is the Workflow Execution Service (WES) API, which abstracts over different workflow engines (such as Snakemake or Nextflow) and thus provides a uniform, JSON-based REST API for submitting workflow runs, managing workflow executions, and retrieving basic execution metadata.
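To make the shape of such a request concrete, here is a minimal sketch of how a client might assemble a run submission. It assumes the standard WES 1.x base path `ga4gh/wes/v1` and the form fields named in the GA4GH specification (`workflow_url`, `workflow_type`, `workflow_type_version`, `workflow_params`). The server URL, the workflow file name, and the values `"SMK"` and `"7.30.2"` are placeholders; the values a particular server accepts should be read from its service-info endpoint. The helper function names are hypothetical and not part of WESkit.

```python
import json
from urllib.parse import urljoin

# Base path used by WES 1.x servers (assumption: the server follows the spec default).
WES_BASE_PATH = "ga4gh/wes/v1/"


def build_run_request(base_url, workflow_url, workflow_type,
                      workflow_type_version, workflow_params):
    """Return (url, form_fields) for a WES `POST /runs` request.

    The fields are sent as multipart/form-data; `workflow_params`
    is a JSON document serialized into a single form field.
    """
    url = urljoin(base_url.rstrip("/") + "/", WES_BASE_PATH + "runs")
    fields = {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        "workflow_params": json.dumps(workflow_params),
    }
    return url, fields


def run_status_url(base_url, run_id):
    """Return the URL of the `GET /runs/{run_id}/status` endpoint."""
    return urljoin(base_url.rstrip("/") + "/",
                   f"{WES_BASE_PATH}runs/{run_id}/status")


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    url, fields = build_run_request(
        base_url="https://wes.example.org",
        workflow_url="workflow/Snakefile",
        workflow_type="SMK",
        workflow_type_version="7.30.2",
        workflow_params={"input": "data.txt"},
    )
    print("POST", url)
    print(json.dumps(fields, indent=2))
```

The returned URL and fields can be passed to any HTTP client that supports multipart form uploads. The server answers with a run identifier, which can then be polled through the status URL.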
WESkit is an implementation of this API specification. It will serve as a backend to our data management and workflow orchestration system One-Touch Pipeline (OTP). Although WESkit is still in development, it is already in use at the Sanger Institute in the Tree-of-Life project, where it manages data processing and data management workflows. In this use case, workflow engine processes are executed on a high-throughput cluster (HTC). WESkit currently supports IBM LSF and SLURM.
The second use case of WESkit is as a cloud service. Here, WESkit is deployed in a Kubernetes cluster, and workflow engine processes are executed in containers in the same or another Kubernetes cluster. The Kubernetes backend is currently work in progress.
WESkit supports Nextflow and Snakemake and generally adopts a strategy of keeping the interface to the workflow engines small. In the long run, WESkit should also support CWL workflow engines, Cromwell, or other engines. To achieve this flexibility, workflow engines are executed as separate native executables (i.e. just a snakemake or nextflow executable).
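The idea of a small engine interface can be illustrated with a sketch in which each engine is reduced to a command line for its native executable. This is not WESkit's actual code: the function `build_engine_command` and its parameters are hypothetical, and only the command-line flags shown (`--snakefile`, `--configfile`, and `--cores` for Snakemake; `run` and `-params-file` for Nextflow) come from the engines' own CLIs.

```python
import shlex


def build_engine_command(engine, workflow_file, params_file, cores=1):
    """Translate an engine name into a command line for its executable.

    Hypothetical helper: supporting another engine would mean adding
    one more branch that knows how to call that engine's executable.
    """
    if engine == "snakemake":
        return [
            "snakemake",
            "--snakefile", workflow_file,
            "--configfile", params_file,
            "--cores", str(cores),
        ]
    if engine == "nextflow":
        return [
            "nextflow", "run", workflow_file,
            "-params-file", params_file,
        ]
    raise ValueError(f"Unsupported workflow engine: {engine}")


if __name__ == "__main__":
    # File names are placeholders for illustration only.
    for engine, workflow in [("snakemake", "Snakefile"), ("nextflow", "main.nf")]:
        command = build_engine_command(engine, workflow, "params.json", cores=4)
        # shlex.join quotes the arguments the way a POSIX shell would expect.
        print(shlex.join(command))
```

A command list of this form can be started locally with `subprocess.run` or handed to a cluster scheduler as the job's command line, so the rest of the service does not need to know which engine is running.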
At the moment (November 2023), our aim is to finish version 1 of WESkit by March 2023, at a stage where it is usable as the backend of our own system at the DKFZ, where it will execute and manage 500 or more workflows in parallel.