This lesson is in the early stages of development (Alpha version)

Reproducible analyses: Glossary

Key Points

Introduction
  • Workflow is the new data.

  • Data + Code + Environment + Workflow = Reproducible Analyses

  • Before reproducibility comes preproducibility

First example
  • Use reana-client, a rich command-line client, to run containerised workflows from your laptop on remote compute clouds

  • Before running an analysis remotely, check its correctness locally with the validate command

  • As always, when in doubt, use the --help command-line argument
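To illustrate why a local check before remote submission pays off, here is a toy pre-flight check in Python. The dict mirrors the inputs/workflow/outputs layout of a REANA serial reana.yaml, but the file names, the image tag and the check_spec helper are hypothetical. The check is NOT the logic of the real validate command; it only catches a few typical mistakes: a workflow with no steps, a step without a container environment, or a command that uses a ${parameter} that was never declared.

```python
import re

# Hypothetical workflow specification written as a Python dict. Its layout
# (inputs / workflow / outputs) follows a REANA serial reana.yaml, but the
# file names and the image tag are illustrative only.
SPEC = {
    "inputs": {
        "files": ["code/helloworld.py", "data/names.txt"],
        "parameters": {
            "helloworld": "code/helloworld.py",
            "inputfile": "data/names.txt",
            "outputfile": "results/greetings.txt",
        },
    },
    "workflow": {
        "type": "serial",
        "specification": {
            "steps": [
                {
                    "environment": "python:3.8-slim",
                    "commands": [
                        'python "${helloworld}" --inputfile "${inputfile}"'
                        ' --outputfile "${outputfile}"'
                    ],
                }
            ]
        },
    },
    "outputs": {"files": ["results/greetings.txt"]},
}

# Matches ${name} placeholders inside step commands.
PARAM_RE = re.compile(r"\$\{(\w+)\}")


def check_spec(spec):
    """Toy pre-flight check (NOT the real `reana-client validate` logic).

    Returns a list of problems: no steps, a step without a container
    environment, or a command using a parameter that was never declared.
    """
    problems = []
    params = spec.get("inputs", {}).get("parameters", {})
    steps = spec.get("workflow", {}).get("specification", {}).get("steps", [])
    if not steps:
        problems.append("workflow has no steps")
    for i, step in enumerate(steps):
        if "environment" not in step:
            problems.append(f"step {i}: missing container environment")
        for cmd in step.get("commands", []):
            for name in PARAM_RE.findall(cmd):
                if name not in params:
                    problems.append(f"step {i}: undeclared parameter ${{{name}}}")
    return problems


if __name__ == "__main__":
    print(check_spec(SPEC))  # []
```

Catching an undeclared parameter on your laptop takes milliseconds. The same mistake found only after the job reaches the remote cluster costs a full submit-and-wait cycle.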

Developing serial workflows
  • Develop workflows progressively; add steps as needed

  • When developing a workflow, stay on the same workspace

  • When developing bytecode-interpreted code, stay in the same container

  • Use smaller test data before scaling out

  • Use workflows as Continuous Integration; make atomic commits that always work
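The advice to grow a workflow step by step, reuse one workspace, and start from a small test sample can be pictured with a toy Python runner. Everything here is hypothetical and is not how REANA schedules steps: run_steps, the step functions, and the N_EVENTS knob are illustrative. In this toy runner, a step is skipped when its output file already exists in the workspace. As a result, re-running the growing workflow only executes the newly added step.

```python
import tempfile
from pathlib import Path

N_EVENTS = 10  # hypothetical small test sample; scale up once everything works


def run_steps(steps, workspace):
    """Toy runner: execute (name, output_file, func) steps in one workspace.

    A step whose output file already exists is skipped, so re-running a
    growing workflow only executes the newly added steps. Returns the names
    of the steps that actually ran.
    """
    workspace = Path(workspace)
    workspace.mkdir(parents=True, exist_ok=True)
    ran = []
    for name, output_file, func in steps:
        output = workspace / output_file
        if output.exists():
            continue  # reuse the result already in the workspace
        func(workspace, output)
        ran.append(name)
    return ran


def generate(workspace, output):
    """Step 1: write a small set of fake 'events' (just integers)."""
    output.write_text("\n".join(str(i) for i in range(N_EVENTS)))


def total(workspace, output):
    """Step 2: read the events produced by step 1 and sum them."""
    values = [int(v) for v in (workspace / "events.txt").read_text().split()]
    output.write_text(str(sum(values)))


if __name__ == "__main__":
    ws = tempfile.mkdtemp()
    steps = [("generate", "events.txt", generate)]
    print(run_steps(steps, ws))  # ['generate']
    steps.append(("total", "total.txt", total))  # grow the workflow
    print(run_steps(steps, ws))  # ['total']; 'generate' is reused
    print((Path(ws) / "total.txt").read_text())  # 45
```

The skip-if-output-exists rule is a deliberate simplification. If you change a step's code, delete its output so that the step runs again.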

HiggsToTauTau analysis: serial
  • Writing serial workflows is like chaining shell script commands
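A serial workflow behaves much like a shell script run with `set -e`. Commands run one after another in the same working directory, and the first failure stops the chain. The minimal Python sketch below reproduces that behaviour with subprocess.run(..., check=True). The two steps are hypothetical stand-ins for real analysis commands; they invoke the current Python interpreter so that the example runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_serial(commands, workdir):
    """Run each command in turn in `workdir`; stop at the first failure.

    subprocess.run(..., check=True) raises CalledProcessError on a non-zero
    exit status, which plays the role of `set -e` in a shell script.
    """
    for cmd in commands:
        subprocess.run(cmd, cwd=workdir, check=True)


# Two hypothetical steps: the second consumes the file written by the first.
STEP1 = [sys.executable, "-c", "open('a.txt', 'w').write('hello')"]
STEP2 = [sys.executable, "-c",
         "open('b.txt', 'w').write(open('a.txt').read().upper())"]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as wd:
        run_serial([STEP1, STEP2], wd)
        print(Path(wd, "b.txt").read_text())  # HELLO
```

Because the steps share a directory, each one communicates with the next only through files. This is the same contract that the steps of a serial workflow follow in a shared workspace.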

Coffee break
  • Refresh your mind

  • Discuss your experience

Developing parallel workflows
  • Computational analysis is a graph of inter-dependent steps

  • Fully declare inputs and outputs for each step

  • Use Scatter/Gather or Map/Reduce to avoid copy-paste coding
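Once every step fully declares its inputs and outputs, the dependency graph follows mechanically: a step depends on whichever steps produce the files it reads. The Python sketch below shows this with hypothetical steps and file names. It also uses a comprehension over a sample list as a stand-in for a scatter, instead of copy-pasting one skim step per sample, followed by a single merge step as the gather. It uses graphlib.TopologicalSorter, which requires Python 3.9 or newer, to group the steps into stages whose members could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical samples; one skim step per sample is generated, not copy-pasted.
SAMPLES = ["a", "b"]
STEPS = {
    f"skim_{s}": {"inputs": [f"{s}.root"], "outputs": [f"{s}_skim.root"]}
    for s in SAMPLES
}
# Gather: one step consumes the outputs of all scattered skims.
STEPS["merge"] = {"inputs": [f"{s}_skim.root" for s in SAMPLES],
                  "outputs": ["merged.root"]}
STEPS["plot"] = {"inputs": ["merged.root"], "outputs": ["plot.png"]}


def build_dag(steps):
    """Map each step to the set of steps that produce its inputs."""
    producer = {out: name for name, s in steps.items() for out in s["outputs"]}
    return {
        name: {producer[f] for f in s["inputs"] if f in producer}
        for name, s in steps.items()
    }


def execution_stages(steps):
    """Group steps into stages; steps in the same stage may run in parallel."""
    sorter = TopologicalSorter(build_dag(steps))
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    print(execution_stages(STEPS))
    # [['skim_a', 'skim_b'], ['merge'], ['plot']]
```

Adding a third sample only requires extending SAMPLES. The skim steps, the merge inputs and the resulting graph all update accordingly.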

HiggsToTauTau analysis: parallel
  • Use step dependencies to express main analysis stages

  • Use the scatter-gather paradigm in stages to massively parallelise DAG workflow execution

  • REANA usage scenarios remain the same regardless of workflow language details
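The scatter-gather pattern inside one stage can be sketched in a few lines of Python. This is a local analogy only. In REANA, each scattered item would become its own containerised job on the cluster; here, threads from concurrent.futures.ThreadPoolExecutor stand in for those jobs. The sample names and values are hypothetical. The process function handles one sample independently (the scatter), and gather combines the per-sample results (the gather).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical samples: each maps to a list of per-event values.
SAMPLES = {"sample_1": [1, 2, 3], "sample_2": [4, 5], "sample_3": [6]}


def process(item):
    """Scatter step: handle one sample on its own (here: sum its values)."""
    name, values = item
    return name, sum(values)


def gather(results):
    """Gather step: merge per-sample results and compute a grand total."""
    per_sample = dict(results)
    return per_sample, sum(per_sample.values())


def run_scatter_gather(samples, max_workers=4):
    # Threads stand in for the independent jobs a workflow engine would launch.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(process, samples.items()))
    return gather(results)


if __name__ == "__main__":
    print(run_scatter_gather(SAMPLES))
    # ({'sample_1': 6, 'sample_2': 9, 'sample_3': 6}, 21)
```

The process function knows nothing about the other samples. That independence is what allows a workflow engine to fan the work out as widely as the cluster permits.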

A glimpse on advanced topics
  • Workflow specification uses hints to hide implementation complexity

  • Use the kerberos: true clause to automatically trigger Kerberos token initialisation

  • Use the resources clause to access CVMFS repositories

  • Use the compute_backend hint in your workflow steps to dispatch jobs to various HPC/HTC backends

  • Use open/close commands to open and close interactive sessions on your workspace

  • Enable the REANA application on GitLab to run long-running tasks that would otherwise time out in GitLab CI
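Hints such as compute_backend and kerberos keep workflow steps short because anything a step does not state falls back to a default. The Python sketch below shows only that defaulting idea; the dispatch_plan helper and the step list are hypothetical, and REANA itself performs the real dispatching. The backend names (kubernetes as the default, htcondorcern, slurmcern) are the ones we believe REANA uses at CERN. Check the REANA documentation for the backends enabled on your cluster.

```python
# Assumed default backend; see the REANA docs for what your cluster uses.
DEFAULT_BACKEND = "kubernetes"

# Hypothetical serial steps carrying the hint keys named in the key points.
STEPS = [
    {"name": "skim", "compute_backend": "htcondorcern", "kerberos": True},
    {"name": "fit", "compute_backend": "slurmcern"},
    {"name": "plot"},
]


def dispatch_plan(steps, default=DEFAULT_BACKEND):
    """Return (step, backend, needs_kerberos) for each step.

    Missing hints fall back to defaults, so a step only has to state
    what differs from the usual case.
    """
    return [
        (s["name"], s.get("compute_backend", default), bool(s.get("kerberos", False)))
        for s in steps
    ]


if __name__ == "__main__":
    for row in dispatch_plan(STEPS):
        print(row)
    # ('skim', 'htcondorcern', True)
    # ('fit', 'slurmcern', False)
    # ('plot', 'kubernetes', False)
```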

Wrap-up
  • Experiment with containerised workflows to advance scientific reproducibility in your research

Glossary

reproducible analysis

computational workflows