Computational reproducibility

What is computational reproducibility?

Computational reproducibility is the principle that results produced with research software, such as data analysis, can be reproduced using the same methods and input data. This relies on workflows which are transparent and produce results consistently.

Key factors

Factors which can lead to issues with computational reproducibility include:

Subtle differences in workflows, such as processes for excluding outliers, can lead to results which cannot be reproduced, even by the original experimenter.
Differences in computational environments. For example, undocumented dependencies preventing research software from being rerun
A lack of open software and open data best practice, preventing computations from being independently replicated.

Following best practice in research data management and the open sharing of research software are important steps towards ensuring computational reproducibility. Well documented code shared on a platforms such as GitHub will help others to reproduce results.

The use of either virtual machines or software containers can be used to address issues around dependencies, allowing the configuration of a computational environment to be shared alongside research software.

Computational reproducibility

Key factors

Further resources

Computational reproducibility

Tools and techniques