About the Common Workflow Language


Overview

The Common Workflow Language (CWL) is a specification for describing analysis tools and workflows in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing environments. CWL is designed to meet the needs of data-intensive science, such as bioinformatics, medical imaging, astronomy, physics, and chemistry.

CWL is developed by an informal, multi-vendor working group consisting of organizations and individuals aiming to enable scientists to share data analysis workflows. The CWL project is on GitHub and builds on technologies such as JSON-LD for data modeling and Docker for portable runtime environments. 

The Rabix Suite supports both the older CWL version, sbg:draft-2, and the more recent CWL version, v1.0. CWL sbg:draft-2 includes extensions to CWL draft-2 that are specific to Seven Bridges. CWL v1.0 includes most of the Seven Bridges extensions from CWL sbg:draft-2 as part of the standard specification and is fully portable to other CWL v1.0-conformant executors.

Rabix Composer allows you to mix CWL v1.0 and CWL sbg:draft-2 components in the same workflow. Rabix Executor can execute these mixed workflows, but note that many other CWL executors do not currently support them.

Learn more about CWL, or read the CWL user guide to start writing your first tool.

About CWL formats and versions

The Rabix Composer workflow editor and tool editor allow you to create either CWL v1.0 or CWL sbg:draft-2, in either JSON or YAML format. When choosing the CWL version and format, you should be aware that:

  • YAML is generally easier to view and edit if you want to work at the code level.
  • The Platform always generates JSON format CWL. You can push a locally-created YAML app to the Platform, but it will be converted to JSON when it is published on the Platform.
  • The Platform only partially supports CWL v1.0 apps at present. You can run CWL v1.0 tools and workflows, but you cannot edit them yet; full Platform support for CWL v1.0 is coming soon. If you want to edit your apps both in Rabix Composer and on the Platform, choose CWL sbg:draft-2 for now.
  • A few features that are supported in the earlier CWL sbg:draft-2 are not included in CWL v1.0. For example, CWL v1.0 does not allow dynamic expressions in base commands when wrapping tools, whereas CWL sbg:draft-2 does. To use a dynamic expression as part of the command in CWL v1.0, put it in an argument instead, because arguments can contain dynamic expressions.
  • Subject to the restrictions above, both CWL v1.0 and CWL sbg:draft-2 workflows can include a mix of CWL v1.0 and CWL sbg:draft-2 apps, and mixed workflows can be edited in Rabix Composer, executed in Rabix Executor, and executed on the Platform. Note that other CWL executors may not support mixed workflows. If you are creating new tools or workflows, we recommend that you use CWL v1.0 if possible, as this is the current standard and your tools and workflows will be portable to other CWL v1.0-conformant executors.
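
The base command restriction described in the list above can be illustrated with a minimal CWL v1.0 sketch. The wrapped tool (echo) and the input name (mode) are hypothetical and chosen only for illustration: the base command stays a fixed string, and the dynamic part is supplied through arguments.

```yaml
cwlVersion: v1.0
class: CommandLineTool

# The base command must be a static string (or list of strings) in CWL v1.0.
baseCommand: echo

inputs:
  mode:              # hypothetical input, used only for this sketch
    type: string

# Dynamic values go in arguments, which accept expressions.
arguments:
  - position: 1
    valueFrom: $(inputs.mode)

outputs: []
```

With this layout, the value of the expression is appended to the command line after the fixed base command.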

Read more about the differences between CWL sbg:draft-2 and CWL v1.0 below.

CWL implementation in the Rabix Suite

Rabix supports the following two versions of CWL:

sbg:draft-2

sbg:draft-2 is the first implementation of CWL in Rabix. It is essentially the Draft 2 version of the Common Workflow Language, with the addition of several extensions specific to the Seven Bridges execution environment. The extensions were implemented to add required features that were not natively supported in the Draft 2 specification of CWL but that represent common use cases in bioinformatics analyses.

The following optional features (extensions) were implemented in sbg:draft-2:

  • Resource hints - Define the minimum number of CPU cores and megabytes of RAM required for execution of an app.
  • Stage input - Make inputs available in the tool’s working directory.
  • File metadata - Set metadata values for files produced as outputs of an app.
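
As a rough sketch, resource hints in an sbg:draft-2 tool description might look like the fragment below. The class names and values shown here are assumptions made for illustration and are not taken from this page; check them against an app exported from Rabix Composer before relying on them.

```yaml
# Sketch only: class names and values are assumptions for illustration.
hints:
  - class: sbg:CPURequirement
    value: 1         # minimum number of CPU cores
  - class: sbg:MemRequirement
    value: 1000      # minimum RAM, in megabytes
```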

CWL v1.0

CWL v1.0 is the latest version of CWL and is widely accepted by the CWL community. Since the CWL v1.0 specification natively supports the custom extensions in the sbg:draft-2 CWL version, CWL v1.0 apps are also portable and executable in any other execution environment when using CWL v1.0-conformant executors such as the Rabix Executor.

Learn about CWL v1.0 improvements over sbg:draft-2.

Extensions in CWL v1.0

CWL v1.0 handles the custom sbg:draft-2 extensions listed above in the following way:

  • Resource hints - An integral part of the CWL v1.0 specification (http://www.commonwl.org/v1.0/CommandLineTool.html#ResourceRequirement), allowing you to specify basic hardware resource requirements. At the moment, the supported requirements are the number of CPU cores and the megabytes of RAM required for execution of an app.
  • Stage input - Implemented as InitialWorkDirRequirement, which covers the use case that was handled by the Stage Input extension in sbg:draft-2. The following example illustrates the use of Stage Input in sbg:draft-2 and of InitialWorkDirRequirement in CWL v1.0.

sbg:draft-2:


id: input
type:
  type: array
  items: File
sbg:stageInput: link

CWL v1.0:


inputs:
  input:
    type:
      type: array
      items: File

requirements:
  - class: InlineJavascriptRequirement
  - class: InitialWorkDirRequirement
    listing:
      - $(inputs.input)
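
Resource hints, the other extension listed above, map to ResourceRequirement in CWL v1.0. A minimal sketch follows; the numeric values are arbitrary examples, not recommendations.

```yaml
requirements:
  - class: ResourceRequirement
    coresMin: 2      # minimum number of CPU cores (example value)
    ramMin: 4096     # minimum RAM; the specification counts this in mebibytes
```

The same entry can be placed under hints instead of requirements if the executor should treat it as advisory rather than mandatory.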

CWL v1.0 support in Rabix

Not all CWL v1.0 features are currently supported in Rabix. Future implementations will address this. The following features are not supported in the current implementation:

  • Document preprocessing is not supported. Code from included external files will not be resolved within the supplied CWL document.
  • Directories are not available as an input type for an app.
  • Instance selection is done based on CPU and memory requirements. Storage space requirements are not taken into consideration when selecting computation instance(s) for a task.
  • File formats are not resolved based on ontology.

Since the current CWL v1.0 implementation does not support directories as inputs or outputs, you can organize your files and compile them into a tar archive to pass them as an input to your workflow.
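
One way to apply the tar archive workaround is to pass the archive as a single File input and unpack it inside the tool. The sketch below is only an illustration under stated assumptions: the input name (archive), the output name (extracted), and the *.txt pattern are hypothetical, and the archive is assumed to hold its files at the top level.

```yaml
cwlVersion: v1.0
class: CommandLineTool

# Unpack the archive into the tool's working directory.
baseCommand: ["tar", "-xf"]

inputs:
  archive:           # hypothetical name for the tar archive input
    type: File
    inputBinding:
      position: 1

outputs:
  extracted:         # hypothetical name; collects the unpacked files
    type:
      type: array
      items: File
    outputBinding:
      glob: "*.txt"  # assumed file pattern, adjust to your data
```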

Mixed CWL v1.0 and sbg:draft-2 apps

Rabix also supports the execution of workflows that contain both tools described using CWL v1.0 and tools described using sbg:draft-2. Such workflows are either CWL v1.0 workflows that contain sbg:draft-2 tool(s) or sbg:draft-2 workflows that contain CWL v1.0 tool(s). These workflows are currently fully editable and executable in Rabix Composer.

Key differences between sbg:draft-2 and CWL v1.0

See the table below for an overview of the currently available options for the two CWL versions. Learn more about supported platforms.

Option | sbg:draft-2 | CWL v1.0 | Mixed sbg:draft-2 and CWL v1.0

Can be executed on a supported platform

Editable on a supported platform

Fully portable to other execution environments

Can be added to a supported platform through the API

Can be added to a supported platform through the visual interface

Can be added to a supported platform via Rabix Composer

Can be edited in Rabix Composer

Resources