Authors: Carole Goble and the WorkflowHub Club, Department of Computer Science
In April 2020 the bioscience community ramped up activity to address the COVID-19 pandemic. This included large-scale processing of SARS-CoV-2 data using computational workflows and scripts, both for automated data analysis and for updating public data archives. Workflows are a popular form of specialist software that links different codes together into tool chains. Workflow systems handle the complexity of doing so: managing the data that flows between the codes and how the codes run on different computing platforms.
Led by Manchester, partners across Europe in the ELIXIR European Research Infrastructure for Life Science Data and the EC EOSC-Life cluster project launched WorkflowHub, an open registry where the community can share, find and reuse data processing pipelines. WorkflowHub is a service of ELIXIR-UK. By September 2020 WorkflowHub was established as a Hub for workflows from any system, any discipline and anywhere: 82 teams with members from 71 organisations across Europe, the USA and Australia had contributed 190 high-quality workflows (40 of them COVID-19 related) from 10 different kinds of workflow system. The most popular workflow has been downloaded more than 1550 times. The Hub is a registered service of the European Open Science Cloud and an example of a registry for FAIR (findable, accessible, interoperable, reusable) Research Objects.
Applying Open research practices
- The WorkflowHub infrastructure promotes the sharing and reuse of Open computational workflows.
- Workflows are linked to other open materials such as data, documents, lab protocols and electronic lab notebooks.
- The workflows are created and updated sustainably by teams who use social channels to connect with users.
- The Hub is one of the first registries in the world to implement the RO-Crate specification for FAIR Research Objects, which is used to package any kind of research output for exchange between digital systems.
- The Hub is built on the open source FAIRDOM-SEEK platform, whose development has been led by Manchester with partners in the UK, Germany, Belgium, Norway, the Netherlands and South Africa for over a decade.
- WorkflowHub was built entirely in the open by a community of over 50 people who co-created the registry. Development is led by Manchester, which also hosts the registry.
- WorkflowHub was developed virtually during the pandemic, with its previously planned development brought forward by one year. We adopted an agile development cycle that had buy-in from workflow system providers, workflow developers and users. We built on an open source software platform, practised open co-creation through the WorkflowHub Club and virtual meetings, used virtual communication channels such as Slack, and ran virtual hackathons (intense week-long collaborations).
- To encourage buy-in from workflow developers, the Hub registers workflows that can "stay home" in their native repositories. Registered workflows have DOIs, so they can be cited, credited and showcased.
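To illustrate roughly what RO-Crate packaging involves, the sketch below builds a minimal `ro-crate-metadata.json` for a single workflow file. It assumes RO-Crate 1.1 and loosely follows the shape of the Workflow RO-Crate profile, in which the workflow is the crate's `mainEntity` and is typed `ComputationalWorkflow`. It is not WorkflowHub's own code. The function name, file name, `#galaxy` language identifier and example values are hypothetical, chosen only for illustration.

```python
import json


def build_workflow_crate(name, description, date_published, license_url,
                         workflow_file, language_id, language_name):
    """Return a minimal RO-Crate 1.1 metadata document describing one workflow.

    Hypothetical helper for illustration; not part of WorkflowHub or any library.
    """
    graph = [
        # Metadata descriptor: declares that this JSON-LD file follows
        # RO-Crate 1.1 and describes the root of the crate ("./").
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root data entity: the package as a whole, pointing at its main workflow.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "mainEntity": {"@id": workflow_file},
            "hasPart": [{"@id": workflow_file}],
        },
        # Data entity for the workflow definition file itself.
        {
            "@id": workflow_file,
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
            "name": name,
            "programmingLanguage": {"@id": language_id},
        },
        # Contextual entity naming the workflow language (identifier is a placeholder).
        {
            "@id": language_id,
            "@type": "ComputerLanguage",
            "name": language_name,
        },
    ]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


# Hypothetical example values, for illustration only.
crate = build_workflow_crate(
    name="Example SARS-CoV-2 analysis workflow",
    description="Illustrative crate wrapping a single workflow file.",
    date_published="2020-09-01",
    license_url="https://spdx.org/licenses/MIT",
    workflow_file="workflow.ga",
    language_id="#galaxy",
    language_name="Galaxy",
)

# Print the JSON-LD that would be saved as ro-crate-metadata.json at the crate root.
print(json.dumps(crate, indent=2))
```

In a real crate, the workflow file and any test data would sit alongside this metadata file in the same directory or zip archive. The Workflow RO-Crate profile recommends its own identifiers for common workflow languages and may expect further properties beyond this minimal set, so the profile itself should be checked before relying on this shape.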
Benefits of using these open research practices
- Workflows are openly available for anyone to find and reuse, making data analysis know-how accessible from one place regardless of where each workflow is stored.
- Reproducibility in data analysis is improved.
- Citable, credited workflows build reward and reputation for their developers, and can be referenced in articles alongside their test data.
- Open development and virtual hackathon practices meant that many people could participate and contribute in a coordinated way, wherever they were. A core of developers at Manchester forms the kernel of the community.
Open software development practices, such as hackathons and virtual communication channels, are powerful ways to enable open research. They still depend on a resourced core of dedicated developers and community workers.