+4 votes

2 Answers

+2 votes
by (420 points)

There are many ways to create a reproducible computational workflow. In my view, it needs to include documentation of the code that is used to produce the results:

Documentation must be regarded as an integral part of the process of design and coding. A good programming language will encourage and assist the programmer to write clear, self-documenting code, and even perhaps to develop and display a pleasant style of writing. -- Charles Antony Richard Hoare

I focus on a concept called "literate programming", where your code and your description of it live in the same document. You can then publish this document, since it contains everything someone needs to run your analysis.

This concept is not bound to any tool or language, which allows great flexibility. Jupyter Notebook is widespread, especially when you are coding in Python or R (at least, those are the languages I commonly use with Jupyter notebooks). I would also suggest taking a look at org-babel, a module of org-mode (available through Emacs) that is highly extensible and flexible enough to be used with any programming language. There is also an introductory video on how to use org-babel.

Of course, there are also other approaches to reproducible results, such as packaging all the binaries, software and code that are used. But this is not my area of expertise.

+2 votes
by (190 points)

As lukascbossert already pointed out, one aspect of reproducibility is the understanding of what you have done.

First, let me state:
When trying to reproduce something, you should aim to fully automate your workflow.

Second:
Stick to the KISS principle.
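
To illustrate both points, here is a minimal sketch of a single entry point that runs every step of a workflow in a fixed order and stops on the first failure. The step names and commands are hypothetical stand-ins that only print a message; they are assumptions for illustration, and a plain bash script would serve the same purpose.

```python
import subprocess
import sys

# Hypothetical steps: each entry is (name, command). The commands here are
# stand-ins that only print a message; replace them with your real scripts.
STEPS = [
    ("preprocess", [sys.executable, "-c", "print('preprocessing raw data')"]),
    ("analyze", [sys.executable, "-c", "print('running analysis')"]),
    ("report", [sys.executable, "-c", "print('writing report')"]),
]


def run_workflow(steps):
    """Run all steps in order and return the names of the completed steps."""
    completed = []
    for name, command in steps:
        print(f"[{name}] starting", flush=True)
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a broken step cannot silently leave stale results behind.
        subprocess.run(command, check=True)
        completed.append(name)
    return completed


if __name__ == "__main__":
    run_workflow(STEPS)
```

With one command that regenerates everything, nobody has to remember the order of manual steps, which is the part that tends to be forgotten first.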

Some questions arise:

  • What is your "workflow"?
  • What kind of results do you want to reproduce?
    • Do the software versions play a meaningful role?
    • Does hardware play a role?
  • Do you have special requirements in terms of programs, packages, CPU/GPU power or enormous amounts of data?
  • What other tools are already in place? Is your workflow already scripted, or can it be scripted, with a stable tool like the bash shell (KISS)?
  • What is the desired level and ease of reproducibility?
    Bitwise reproducibility is nearly(?) impossible.
  • How much time and effort are you willing to invest?
  • Are you aiming to reproduce results in the future (e.g. in 10 years)?
  • Do you use third-party or closed-source software? Do you need licenses (or might you need them in the future)? Will you always be able to go back to a certain software version?
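
For the questions about software versions and hardware, one low-effort option is to store a snapshot of the environment next to the results. The following is a minimal sketch using only the Python standard library; the function name `snapshot_environment` and the package list are assumptions for illustration, not part of any established tool.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Collect interpreter, OS, machine and package version information."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version,
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list; use the packages your workflow depends on.
    info = snapshot_environment(["numpy", "pandas"])
    print(json.dumps(info, indent=2))
```

Such a snapshot does not make a result reproducible by itself, but it tells a later reader which versions to look for when a rerun gives different numbers.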
One way to approach your problem (based on some assumptions on the answers to the previous questions) would be:
  1. Automate your workflow.
  2. Build a container (e.g. docker/apptainer) with your workflow in it.
  3. Test that it works with GitLab/GitHub CI or on another machine.
  4. Archive and link to each other:
    1. your (documented) code
    2. results and intermediate results
    3. your image file (apptainer), or upload the image to Docker Hub
    4. (related scientific paper)
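
When archiving results and checking them on another machine, a checksum manifest can help to confirm that the archived files are the ones you produced. The sketch below is one possible approach, not an established tool: the function names are hypothetical, and since bitwise-identical reruns are rarely achievable, it is meant for verifying copies of archived files rather than freshly recomputed results.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file below root (relative POSIX path) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """Return manifest entries that are missing or differ below root."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

The manifest can be stored as a small JSON file alongside the code, the results and the container image, so that all archived parts can be checked together.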
You could also take a look at dedicated workflow tools like Snakemake, but I am no expert in that. My group does not use it, because it is yet another tool and not needed in our work (KISS).
NFDI4Ing is supported by DFG under project number 442146713