1 Introduction
It is good practice to use version control with data analysis scripts, SQL, R, Python etc. It facilitates re-use, collaboration, and better-quality analysis output. Version control is a key principle of the RAP (Reproducible Analytical Pipeline) approach that has been adopted in the last few years by Government and the NHS. The vision in the Government RAP strategy includes: “analytical teams in public sector organisations choose to deliver their analysis using the RAP principles by default”.
Git is the industry standard open-source distributed version control system. Azure DevOps and GitHub are two Git-based repository hosting services.
In the diagram on the right, the dashed line indicates where files are shared from a data analyst’s laptop to a web service like GitHub. It could be Azure DevOps instead of GitHub. Similarly, it could be a Python script instead of an R script, and a different IDE (Interactive Development Environment) like VS Code instead of RStudio. Stage, commit, push, and pull are all Git commands.
A Show and Tell of version control was provided for data analysts at Sheffield City Council in August 2023. A recording and the slide deck is available from the Council’s Data Network Sharepoint Site.
1.1 Council setup
The intention is for Azure DevOps to be our default for hosting version control repositories. This will enable us to securely share code with data analysts across the Council. Our adoption of Azure DevOps is on hold whilst a new data platform is established.
The diagram on the left is an illustration of the Council’s planned version control setup once both the Data Platform and Azure DevOps are established. It is based on the NHS Digital setup, but they use GitLab in place of Azure DevOps:
How to publish your code in the open - RAP Community of Practice (nhsdigital.github.io)
Making code open and transparent to everyone, not just other Council Officers, is considered best practice. The Government’s Digital Service Standard 12th principle states that all publicly funded code should be open, reusable, and available under appropriate licences. The main benefits of publishing your code are increased transparency, collaboration and knowledge sharing across local authorities, external users, and analysts. Knowing that code will be published will lead to overall code health increasing as analysts will take greater care in ensuring that best coding practices and standards are applied.
Making a repository publicly available will be done by publishing it on GitHub, but only if it passes a fit for publishing checklist.
To understand both the benefits of sharing code and how to manage the risks, we’ll be leaning heavily on resources from:
NHS RAP Community of Practice Government Analysis Function Guidance Hub
1.2 ToDo
- Some form of “Getting Started”, so subsequent content makes sense. Links to other resources and training?
- Once this guidance is moved to Azure DevOps, include Show & Tell URL.
- Re-write chapter when Azure DevOps available.