Setting up the Rabix Executor with a TES server

Overview

The Global Alliance for Genomics and Health (GA4GH) Task Execution Schema (TES) is an API for data analysis workflows that can be plugged into most existing compute environments, including clusters and clouds. Seven Bridges has contributed to the TES specification and has implemented TES support in the Rabix Executor, allowing authors and users of CWL workflows to send analysis jobs from the Executor to a backend through the TES API. Researchers can use the open-source Rabix Executor and the TES API to run CWL workflows on the TES-supported compute infrastructure of their choice, from the command line or a web interface. The TES API gives researchers the flexibility to create a federated computing environment that builds on the reproducibility benefits of other projects Seven Bridges supports, such as Rabix and the Common Workflow Language.

Funnel is an implementation of the TES API that we used to test the Rabix Executor's TES integration. A Funnel TES server can be configured to work with different backends (e.g. HPC schedulers or cloud providers). In this simple example, you will install a Funnel TES server locally, but you could also install it on, for example, an AWS instance. You will also install the Rabix Executor on your local machine and configure it to send analysis jobs to the TES API. In a typical use case the Funnel TES server is remote from the Rabix Executor, so we also show you how to set up shared Google Cloud Storage between them. Once you have set up Funnel, the Executor and a shared storage mechanism, you will be able to execute a CWL workflow from your local terminal using Rabix, which parses the workflow steps into computational jobs. These jobs are sent to the TES API on the Funnel TES server, which then executes them in the compute environment where the server is installed.

[1] Set up Google storage

Note that if you install the Funnel TES server and Rabix on the same machine, you can skip step 1 and use the local storage config from step 2 instead.

  1. If you’ve never used Google Cloud Storage before, you’ll first need to set up an account.
  2. Once you’ve set up an account, you’ll need to create a bucket.
  3. Then, create a Google Cloud service account key (or use the Google client library for authentication instead).
  4. Download the generated JSON key file and export its path as the GOOGLE_APPLICATION_CREDENTIALS environment variable.
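
Before moving on, it can help to confirm that the exported variable points at a usable key file. The Python sketch below is a hypothetical helper (check_gcs_credentials is not part of Rabix, Funnel or the Google libraries). It only inspects the file's shape: that it exists, parses as JSON and carries the fields Google puts in service account keys. It does not contact Google, so a revoked key would still pass.

```python
import json
import os

# Fields present in every Google Cloud service account key file.
REQUIRED_KEY_FIELDS = {"type", "project_id", "private_key", "client_email"}


def check_gcs_credentials(env=None):
    """Return the key file path named by GOOGLE_APPLICATION_CREDENTIALS.

    Raises ValueError if the variable is unset, the file is missing, or the
    file does not look like a service account key. This checks shape only;
    it does not verify the key with Google.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise ValueError(f"key file not found: {path}")
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    if not isinstance(key, dict):
        raise ValueError("key file is not a JSON object")
    missing = REQUIRED_KEY_FIELDS - set(key)
    if missing:
        raise ValueError(f"key file is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {key['type']!r}")
    return path
```

Calling check_gcs_credentials() with no arguments reads the current process environment, so run it from the same shell session where you exported the variable.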

[2] Running a Funnel TES server

Starting a Funnel TES server works the same way in local and remote environments; the only difference is the availability of local storage. If you install Funnel on a remote server, you can ignore the parts about local storage.

  1. Download Funnel
  2. Create a config file with storage configured. A sample config can be found in this GitHub repo, but a file containing just one of the following “Storage” snippets will also work.
    1. If using a GS auth key:
      
      {"Storage": {
       "GS":[{"AccountFile":"/path/to/key"}]
      }}
      
      
    2. If using GS client lib:
      
      {"Storage": {
       "GS":[{"FromEnv":true}]
      }}
      
      
    3. If using local storage (both the Rabix Executor and Funnel on the same machine):
      
      {"Storage": {
        "Local": {
          "AllowedDirs": [
            "/directories",
            "/accessible/to",
            "/both/bunny/and/funnel"
          ]
        }
      }}
      
      
  3. Run Funnel in server mode with a -c argument and the path to the created config file:
    
    funnel server -c ./config
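
If you switch between these setups often, a small script can generate the config file instead of hand-editing it. The Python sketch below is hypothetical: funnel_storage_config, write_config and the mode names "gs-key", "gs-env" and "local" are illustrative labels, not part of Funnel. It emits exactly the three Storage snippets shown above, serialized as JSON.

```python
import json
from pathlib import Path


def funnel_storage_config(mode, account_file=None, allowed_dirs=()):
    """Build the Funnel "Storage" config for one of the three setups above.

    mode is a hypothetical label: "gs-key" (service account key file),
    "gs-env" (Google client library credentials) or "local".
    """
    if mode == "gs-key":
        if not account_file:
            raise ValueError("gs-key mode needs the path to the service account key")
        storage = {"GS": [{"AccountFile": account_file}]}
    elif mode == "gs-env":
        storage = {"GS": [{"FromEnv": True}]}
    elif mode == "local":
        if not allowed_dirs:
            raise ValueError("local mode needs at least one allowed directory")
        # Directories that both Rabix (bunny) and Funnel can access.
        storage = {"Local": {"AllowedDirs": [str(d) for d in allowed_dirs]}}
    else:
        raise ValueError(f"unknown storage mode: {mode!r}")
    return {"Storage": storage}


def write_config(config, path="./config"):
    """Write the config as JSON to the file later passed to `funnel server -c`."""
    out = Path(path)
    out.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return out
```

For example, write_config(funnel_storage_config("gs-env")) writes ./config, after which funnel server -c ./config starts the server as described in the step above.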
    
    

[3] Run a CWL task on Rabix Executor with Funnel

Run a single task:

  1. Get the rabix-cli executable (version 1 or later). You can also build it yourself from the github.com/rabix/bunny repo using the mvn install -P all,tes command; this produces a tar.gz archive containing the binary and config in the rabix-cli/target directory.
  2. Add or change the following line in the properties files in the config directory:
    
    backend.embedded.types=TES
    
    
  3. Execute with TES-specific arguments:
    
    ./rabix-cli dna2protein.cwl.json inputs.json -tes-url=http://localhost:8000 -tes-storage=gs://bucket/tes
    
    

    Here, -tes-url is the Funnel location (http://localhost:8000 if Funnel was started locally with the default configuration), and -tes-storage is either a Google Storage URL (starting with the standard gs:// prefix) or a path to a local folder. If you use a local folder, it must be specified in the AllowedDirs setting of the Funnel config from step 2.
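
When scripting runs, it can be useful to catch a bad -tes-storage value before any job is sent. The Python sketch below is a hypothetical wrapper (validate_tes_storage and build_rabix_command are illustrative names, not part of Rabix). It mirrors the rule above that a local folder must fall under Funnel's AllowedDirs, and additionally assumes that local paths are absolute POSIX paths without "..". It only assembles the arguments; it does not check that the bucket or folder actually exists.

```python
from pathlib import PurePosixPath


def validate_tes_storage(tes_storage, allowed_dirs=()):
    """Check a -tes-storage value before launching rabix-cli.

    Accepts a gs://bucket/... URL, or an absolute local path that lies inside
    one of the directories listed in Funnel's Local.AllowedDirs.
    """
    if tes_storage.startswith("gs://"):
        if not tes_storage[len("gs://"):].strip("/"):
            raise ValueError("gs:// URL has no bucket name")
        return tes_storage
    path = PurePosixPath(tes_storage)
    # Assumption of this sketch: require absolute paths and reject "..",
    # so the prefix check below cannot be bypassed.
    if not path.is_absolute() or ".." in path.parts:
        raise ValueError(f"expected an absolute path without '..': {tes_storage}")
    for allowed in map(PurePosixPath, allowed_dirs):
        if path == allowed or allowed in path.parents:
            return tes_storage
    raise ValueError(f"{tes_storage} is not inside Funnel's AllowedDirs")


def build_rabix_command(app, inputs, tes_url, tes_storage, allowed_dirs=()):
    """Return the argv list for the rabix-cli invocation shown above."""
    validate_tes_storage(tes_storage, allowed_dirs)
    return [
        "./rabix-cli",
        app,
        inputs,
        f"-tes-url={tes_url}",
        f"-tes-storage={tes_storage}",
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains rabix-cli. Returning a list rather than a single string avoids shell quoting problems with paths that contain spaces.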

[4] Results

After the TES task completes successfully, a directory containing the standard CWL protocol files is created in the location specified with -tes-storage:


{task id}/root/{app name}/

Where task id is a randomly generated UUID and app name is the textual ID of the specific step in the workflow.
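
Given this layout, the location of a step's files can be computed from the -tes-storage value, the task id and the step's ID. The Python sketch below assumes that {task id}/root/{app name}/ sits directly under the -tes-storage location and that the task id is a UUID, as stated above; result_dir is a hypothetical helper, not part of Rabix.

```python
import uuid
from pathlib import PurePosixPath


def result_dir(tes_storage, task_id, app_name):
    """Return the expected {tes-storage}/{task id}/root/{app name}/ directory."""
    uuid.UUID(task_id)  # raises ValueError if the task id is not a UUID
    if not app_name or "/" in app_name:
        raise ValueError(f"unexpected app name: {app_name!r}")
    if tes_storage.startswith("gs://"):
        # Keep gs:// URLs as plain strings; Path would collapse the "//".
        return f"{tes_storage.rstrip('/')}/{task_id}/root/{app_name}/"
    return str(PurePosixPath(tes_storage, task_id, "root", app_name)) + "/"
```

For a Google Storage location the result can be listed with gsutil ls; for a local folder, an ordinary directory listing is enough.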

Also, the current status of a TES task can be checked through Funnel’s web dashboard and REST API.
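
For scripted runs, you can poll the REST API instead of watching the dashboard. The Python sketch below uses only the standard library, and its helper names are hypothetical. It assumes Funnel serves TES under /v1/tasks (the path used by Funnel releases from the time of this guide) and accepts the view=MINIMAL query parameter from the TES spec; servers implementing TES 1.0 use the /ga4gh/tes/v1 base path, so pass api_prefix="/ga4gh/tes/v1/tasks" for those. The task_id here is the ID Funnel assigned to the task, as shown in its dashboard.

```python
import json
import time
import urllib.parse
import urllib.request

# Terminal task states from the TES v1.0 State enum.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def task_url(tes_url, task_id, api_prefix="/v1/tasks"):
    """Build the URL for one task, asking for the MINIMAL view (id + state)."""
    quoted_id = urllib.parse.quote(task_id, safe="")
    return f"{tes_url.rstrip('/')}{api_prefix}/{quoted_id}?view=MINIMAL"


def get_task_state(tes_url, task_id, api_prefix="/v1/tasks", timeout=10):
    """Fetch one task from the TES REST API and return its state string."""
    with urllib.request.urlopen(task_url(tes_url, task_id, api_prefix), timeout=timeout) as resp:
        task = json.load(resp)
    # A missing state field is treated as UNKNOWN.
    return task.get("state", "UNKNOWN")


def wait_for_task(tes_url, task_id, poll_seconds=5, max_wait=3600, **kwargs):
    """Poll until the task reaches a terminal state, then return that state."""
    deadline = time.monotonic() + max_wait
    while True:
        state = get_task_state(tes_url, task_id, **kwargs)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {state} after {max_wait}s")
        time.sleep(poll_seconds)
```

A call such as wait_for_task("http://localhost:8000", task_id) returns "COMPLETE" on success; any other terminal state means the task failed or was canceled, and the dashboard shows the details.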
