18 Postcodes & addresses
A common task in our work is spatial analysis of a dataset that contains addresses. For example, we might produce a map of certain incidents over the last year, or summarise the number of incidents by Ward.
18.2 Postcodes
If there’s no UPRN, getting a location via the postcode is usually the easiest option. It’s not as accurate as a UPRN or a full address, but often it’s accurate enough for what’s required.
Below is an example of using our add_pcd_vars() function that builds on the PostcodesioR package, which itself is a wrapper for the postcodes.io API. You pass it a data frame with a postcode column and it returns the data frame with new columns based on the postcode.
# Create a data frame with some example records
df <- tibble::tribble(
~name, ~postcode,
"SCC", "S1 2HH",
"Blades", "S2 4SU",
"Owls", "S6 1SW"
)
# Load the function we need
source("https://raw.githubusercontent.com/scc-pi/functions/main/Addresses.R")

# Add postcode related columns to our data frame
add_pcd_vars(df,
             pcd_name = "postcode",
             .admin_district = FALSE,
             .lat_long = TRUE,
             other_vars = c("admin_ward",
                            "msoa_code")) |>
  knitr::kable() # display them in a nice table
The admin_district column can be used to answer two questions: is the postcode invalid, and is the postcode in Sheffield? Expect a value of NA if the postcode is invalid and a value of Sheffield if the postcode is in Sheffield. These are common questions, so the function has an .admin_district argument.
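As a minimal sketch of how those two questions might be turned into flag columns, the example below uses a hand-written data frame standing in for the function's output. The rows, the new column names pcd_invalid and in_sheffield, and the use of base R are all assumptions for illustration.

```r
# Hand-written stand-in for the output of add_pcd_vars(df, .admin_district = TRUE);
# the rows are illustrative assumptions, not real API output
df_pcd <- data.frame(
  name             = c("Town Hall", "Mistyped", "Out of area"),
  admin_district   = c("Sheffield", NA, "Leeds"),
  stringsAsFactors = FALSE
)

# NA in admin_district means the postcode was not recognised
df_pcd$pcd_invalid <- is.na(df_pcd$admin_district)

# %in% returns FALSE (not NA) for missing values, so invalid postcodes
# are counted as "not in Sheffield" rather than left unknown
df_pcd$in_sheffield <- df_pcd$admin_district %in% "Sheffield"

print(df_pcd)
```

Using %in% rather than == is a deliberate choice here: == would propagate NA into the in_sheffield flag for invalid postcodes, which can silently drop rows in a later filter.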
Needing the coordinates of the postcode centroid is also common, so the function also has a .lat_long argument that, if TRUE, returns both a latitude column and a longitude column.
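Once the coordinates are available, one possible next step is converting the data frame to a spatial object with the sf package. This is a sketch under several assumptions: that the new columns are named latitude and longitude, that the coordinates are WGS84 (EPSG:4326), and that sf is installed. The coordinate values are approximate, hand-typed figures for illustration only.

```r
# Hand-written stand-in for the output of add_pcd_vars(df, .lat_long = TRUE);
# column names and coordinate values are assumptions for illustration
df_pcd <- data.frame(
  name      = c("SCC", "Blades"),
  postcode  = c("S1 2HH", "S2 4SU"),
  latitude  = c(53.380, 53.370),
  longitude = c(-1.470, -1.471)
)

# sf is an assumed dependency, so only run the conversion if it is installed
if (requireNamespace("sf", quietly = TRUE)) {
  # coords are given as x (longitude) then y (latitude); 4326 is WGS84
  pts <- sf::st_as_sf(df_pcd, coords = c("longitude", "latitude"), crs = 4326)

  # Reproject to British National Grid (EPSG:27700) so distances are in metres
  pts_bng <- sf::st_transform(pts, 27700)
  print(pts_bng)
}
```

A common mistake is passing the coordinates as latitude then longitude, which places the points in the wrong part of the world without raising an error.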
TODO: link to sf example & area stats
Other columns are added by including their names in a character vector passed to the other_vars argument. For example, other_vars = c("ccg_code", "lsoa_code"). The list of valid other_vars is defined at docs.ropensci.org/PostcodesioR/#examples.
The pcd_name argument is used to specify the name of the data frame’s postcode column when it isn’t postcode. For example, pcd_name = "pupil_pcode".
Sometimes the postcode in a dataset is not held in a separate column, but sits at the end of a combined address column. Depending on the format and quality of the combined address column, you could extract the postcode by taking the last two space-separated words of the address, like this:
# Create a data frame with some example records
df <- tibble::tribble(
~name, ~address,
"SCC", "Pinstone St, Sheffield City Centre, Sheffield S1 2HH",
"Blades", "Bramall Ln, Highfield, Sheffield S2 4SU",
"Owls", "76 Penistone Rd N, Sheffield S6 1SW"
)
# Extract the postcode from the address into a separate column
dplyr::mutate(df,
postcode = stringr::word(address, -2, -1, sep = "[:space:]")) |>
knitr::kable()
| name | address | postcode |
|---|---|---|
| SCC | Pinstone St, Sheffield City Centre, Sheffield S1 2HH | S1 2HH |
| Blades | Bramall Ln, Highfield, Sheffield S2 4SU | S2 4SU |
| Owls | 76 Penistone Rd N, Sheffield S6 1SW | S6 1SW |
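Taking the last two words only works when the postcode is the final thing in the address and contains a space. Where the address column is messier, a pattern match may be more forgiving. The sketch below uses base R and a simplified postcode pattern, which is an assumption for illustration: it does not cover every valid UK format and it does not check that a postcode actually exists. The variations in the example addresses (a trailing "UK", lower case, no postcode) are made up to show the behaviour.

```r
# Example addresses with made-up variations in format
addresses <- c(
  "Pinstone St, Sheffield City Centre, Sheffield S1 2HH",
  "Bramall Ln, Highfield, Sheffield S2 4SU, UK",
  "76 Penistone Rd N, Sheffield s6 1sw",
  "No postcode here"
)

# Simplified pattern: outward code (1-2 letters, a digit, optional letter or
# digit), optional whitespace, then inward code (a digit and 2 letters)
pcd_pattern <- "\\b[A-Z]{1,2}[0-9][A-Z0-9]?\\s*[0-9][A-Z]{2}\\b"

# Match against an upper-case copy so lower-case postcodes are found too
upper <- toupper(addresses)
m <- regexpr(pcd_pattern, upper, perl = TRUE)

# regexpr() returns -1 where there is no match; leave those rows as NA
postcode <- rep(NA_character_, length(addresses))
postcode[m > 0] <- regmatches(upper, m)

print(data.frame(address = addresses, postcode = postcode))
```

Rows left as NA can then be reviewed by hand rather than being given a wrong postcode.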
18.2.1 PAS postcode lookup
PAS, the Performance & Analysis Service in the People Portfolio, maintains a postcode lookup table that performs the same role as the add_pcd_vars() function.
TODO: Does it also include Sheffield specific variables e.g. which of the 100 neighbourhoods is the postcode in?