Introduction
|
Workflow is the new data.
Data + Code + Environment + Workflow = Reproducible Analyses
Before reproducibility comes preproducibility
|
First example
|
Use the reana-client command-line tool to run containerised workflows from your laptop on remote compute clouds
Before running an analysis remotely, check its correctness locally with the validate command
As always, when in doubt, use the --help command-line argument; a minimal session is sketched below
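A typical session could look as follows; the workflow name my-analysis and the presence of a reana.yaml in the current directory are assumptions for illustration:

    $ reana-client validate            # check the specification locally first
    $ reana-client create -w my-analysis
    $ export REANA_WORKON=my-analysis  # make this workflow the default for later commands
    $ reana-client upload              # send input code and data to the remote workspace
    $ reana-client start
    $ reana-client status              # poll until the run has finished
    $ reana-client ls                  # inspect the remote workspace
    $ reana-client download            # fetch the declared output files

Each subcommand also accepts --help for a full description of its options.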
|
Developing serial workflows
|
Develop workflows progressively; add steps as needed (a minimal first version is sketched after this list)
When developing a workflow, stay in the same workspace
When developing bytecode-interpreted code, stay in the same container
Use smaller test data before scaling out
Use workflows as Continuous Integration; make atomic commits that always work
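For instance, a first version of a serial reana.yaml can contain a single step; the file names, container image, and parameter below are illustrative assumptions:

    inputs:
      files:
        - code/skim.py
        - data/events.csv
      parameters:
        cut: 0.5
    workflow:
      type: serial
      specification:
        steps:
          # first step; further steps can be appended as the analysis grows
          - name: skim
            environment: 'python:3.8'
            commands:
              - mkdir -p results && python code/skim.py --cut ${cut} --output results/skimmed.csv
    outputs:
      files:
        - results/skimmed.csv

Each new step can then be added as a separate atomic commit, so that every revision of the repository runs end to end.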
|
HiggsToTauTau analysis: serial
|
|
Coffee break
|
Refresh your mind
Discuss your experience
|
Developing parallel workflows
|
Computational analysis is a graph of inter-dependent steps
Fully declare inputs and outputs for each step
Use Scatter/Gather or Map/Reduce to avoid copy-paste coding, as sketched after this list
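For illustration, a scatter-gather pair of stages might look as follows in the Yadage language (one of the workflow languages REANA supports); the stage names and the steps.yaml references are hypothetical:

    stages:
      # scatter: run one skim job per input file
      - name: skim
        dependencies: [init]
        scheduler:
          scheduler_type: multistep-stage
          parameters:
            input: {step: init, output: files}
            output: '{workdir}/skimmed.root'
          scatter:
            method: zip
            parameters: [input]
          step: {$ref: 'steps.yaml#/skim'}
      # gather: merge the per-file outputs in a single job
      - name: merge
        dependencies: [skim]
        scheduler:
          scheduler_type: singlestep-stage
          parameters:
            inputs: {step: skim, output: output}
            merged: '{workdir}/merged.root'
          step: {$ref: 'steps.yaml#/merge'}

Because each stage fully declares its inputs and outputs, the engine can schedule all skim jobs in parallel and start merge only once they have all finished.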
|
HiggsToTauTau analysis: parallel
|
Use step dependencies to express main analysis stages
Use the scatter-gather paradigm in stages to massively parallelise DAG workflow execution
REANA usage scenarios remain the same regardless of workflow language details, as the sketch below shows
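For example, moving an analysis from the serial language to a DAG-capable one only changes the workflow section of reana.yaml; the file paths below are illustrative:

    workflow:
      type: yadage
      file: workflow/workflow.yaml

    # or, equivalently, for a CWL description:
    workflow:
      type: cwl
      file: workflow/workflow.cwl

The reana-client commands (validate, create, upload, start, status, download) are identical in both cases.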
|
A glimpse at advanced topics
|
The workflow specification uses hints to hide implementation complexity
Use the kerberos: true clause to automatically trigger Kerberos token initialisation
Use the resources clause to access CVMFS repositories
Use the compute_backend hint in your workflow steps to dispatch jobs to various HPC/HTC backends, as the sketch after this list shows
Use the open/close commands to open and close interactive sessions on your workspace
Enable the REANA application on GitLab to run long-running tasks that would time out in GitLab CI
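These hints plug directly into the step definitions; a combined sketch, assuming CERN HTCondor access, with an illustrative CVMFS repository and script name:

    workflow:
      type: serial
      resources:
        cvmfs:
          - sft.cern.ch           # mount this CVMFS repository in the job
      specification:
        steps:
          - name: fit
            environment: 'cern/cc7-base'
            kerberos: true        # initialise a Kerberos token for this step
            compute_backend: htcondorcern   # dispatch this job to HTCondor
            commands:
              - ./run_fit.sh

Interactive sessions follow the same client pattern, e.g. reana-client open -w my-analysis jupyter, and reana-client close -w my-analysis when finished.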
|
Wrap-up
|
|