January 2024
“Reproducible Analytical Pipelines (RAP) is a cross-government movement promoted by the Government Analysis Function. It is a way of doing analysis so that it meets these principles, making processes more open and robust, enabling better quality assurance, improving knowledge management and business continuity. This ultimately increases the quality and trustworthiness of our analytical publications and facilitates innovation and collaboration within and outside government.”1
We want to look back and be able to repeat our work easily and quickly.
What are the benefits?
Reusable functions
Testing framework
Coding standards
CI/CD (continuous integration and continuous delivery/deployment)
Semantic versioning (MAJOR.MINOR.PATCH e.g. 1.4.1 < 2.0.0)
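A minimal sketch of why semantic versions are compared part by part rather than as plain text. The helper names `parse_version` and `bump` are hypothetical, made up for this illustration, and the sketch assumes plain MAJOR.MINOR.PATCH strings with no pre-release or build suffixes.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def bump(version: str, part: str) -> str:
    """Return the next version after a 'major', 'minor' or 'patch' change."""
    major, minor, patch = parse_version(version)
    if part == "major":   # breaking change: reset minor and patch
        return f"{major + 1}.0.0"
    if part == "minor":   # backwards-compatible feature: reset patch
        return f"{major}.{minor + 1}.0"
    if part == "patch":   # backwards-compatible fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


# Tuples compare element by element, so 1.4.1 sorts before 2.0.0
# and 1.10.0 sorts after 1.9.0 (which plain string comparison gets wrong).
is_older = parse_version("1.4.1") < parse_version("2.0.0")
```

Comparing the integer tuples avoids the trap where the string "1.10.0" sorts before "1.9.0".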
A number of government articles promote open code and set out what should and should not be published.
Matt Upson wrote the original blog post on RAP and helped develop “the first RAP”.
The Government Digital Service (GDS) Data Science team continued to develop RAP before it moved across to the ONS/Analysis Function.
Matt Gregory put together the original RAP companion and an Introduction to RAP online course.
The original blog post on RAP took:
“inspiration from the fields of DevOps and reproducible research”1.
The principles and practices of reproducible research are superbly set out in the Turing Way book, from the Alan Turing Institute.
The Goldacre Review discusses RAP extensively.
Government RAP strategy is launched. The vision includes:
“Analytical teams in public sector organisations choose to deliver their analysis using the RAP principles by default.”
Government departments (MOD, DfE, ONS etc.) publish RAP implementation plans.
Government Analysis Function
analysisfunction.civilservice.gov.uk/support/reproducible-analytical-pipelines
NHS Digital
nhsdigital.github.io/rap-community-of-practice
Free book by Bruno Rodrigues
raps-with-r.dev
Some other resources are listed here
scc-pi.github.io/pinsheff/rap.html#further-resources-rap
End to end data flow:
Data model script: ingest -> combine -> clean -> process -> save
Analysis script: load -> analyse -> visualise -> publish / share
Modular functions
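The data model flow above can be sketched as a chain of small, modular functions, one per step. This is an illustrative sketch only: the function names, the `area` and `value` columns and the in-memory CSV text are assumptions made for the example, not the actual SCC pipeline, and the save and publish steps are left out.

```python
import csv
import io
from statistics import mean


def ingest(raw_csv: str) -> list[dict]:
    """Read CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def combine(*tables: list[dict]) -> list[dict]:
    """Stack rows from several ingested tables into one list."""
    return [row for table in tables for row in table]


def clean(rows: list[dict]) -> list[dict]:
    """Drop rows with a missing value and convert values to numbers."""
    cleaned = []
    for row in rows:
        value = (row.get("value") or "").strip()
        if value:
            cleaned.append({"area": row["area"].strip(), "value": float(value)})
    return cleaned


def process(rows: list[dict]) -> dict[str, float]:
    """Calculate the mean value for each area."""
    by_area: dict[str, list[float]] = {}
    for row in rows:
        by_area.setdefault(row["area"], []).append(row["value"])
    return {area: mean(values) for area, values in sorted(by_area.items())}


def run_data_model(*raw_sources: str) -> dict[str, float]:
    """ingest -> combine -> clean -> process, in that order."""
    tables = [ingest(source) for source in raw_sources]
    return process(clean(combine(*tables)))
```

Because each step is its own function, each can be tested and reused separately, and the analysis script only needs to load the output of `run_data_model`.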
GitHub benefits:
Hosting
Collaboration
Track changes
Rolling back
Branching to develop & test distinct features
Too many cooks?
we ended up with something too big & too complicated
it’s easier to add code than take it out
solutions: annotate, peer review, be more ruthless
Debugging is hard
A half-termly multi-page report developed by a former colleague:
Challenges:
Existing products:
Disassembly & understanding:
Blueprinting – examination & analysis of components:
Rebuild & New Build:
Data flow (Synapse Analytics)
Power BI doesn’t qualify as RAP
product - pipeline - reproducibility
Use R or Python instead of Stata or SPSS
Version control & SQL
Make use of public RAP resources
Pilot/try a RAP
The building blocks of a RAP:
... are useful in their own right; each will improve the auditability, speed and quality of your work.
Any interest in an SCC RAP User Group, to cover R, Python, SQL, version control etc.?
Made with revealjs
revealjs.com
Using quarto
quarto.org
Source shared on GitHub
at github.com/scc-pi/rapsheff
Rendered using GitHub Actions
quarto.org/docs/publishing/github-pages.html#github-action
Published on GitHub Pages
at scc-pi.github.io/rapsheff
Open source tools are:
Tracking the three Ws:
Who made Which change and Why?
git
“DevOps is a methodology in the software development and IT industry. Used as a set of practices and tools, DevOps integrates and automates the work of software development (Dev) and IT operations (Ops) as a means for improving and shortening the systems development life cycle.”1
Azure DevOps is a Microsoft product, a suite of tools that enable DevOps.
MLOps is the application of DevOps to Machine Learning. It is about automating the building, training, deployment, maintenance, and further development of models. RAP covers a broader range of data analysis output.
If we wanted to extend part of the Government RAP strategy vision so that:
“Analytical teams in Sheffield City Council deliver their analysis using the RAP principles by default.”
We would need:
RAP @SCC - Jan ’24