18 Postcodes & addresses

A common use case in our work is spatial analysis of a dataset that contains addresses: for example, mapping certain incidents over the last year, or summarising the number of incidents by Ward.

18.1 UPRN

TODO

18.2 Postcodes

If there’s no UPRN, getting a location via the postcode is usually the easiest option. It’s not as accurate as a UPRN or a full address, but often it’s accurate enough for what’s required.

Below is an example of using our add_pcd_vars() function that builds on the PostcodesioR package, which itself is a wrapper for the postcodes.io API. You pass it a data frame with a postcode column and it returns the data frame with new columns based on the postcode.

# Create a data frame with some example records
df <- tibble::tribble(
  ~name,    ~postcode,
  "SCC",    "S1 2HH",
  "Blades", "S2 4SU",
  "Owls",   "S6 1SW"
)

# Load the function we need
source("https://raw.githubusercontent.com/scc-pi/functions/main/Addresses.R")

# Add postcode related columns to our data frame
add_pcd_vars(df,
             pcd_name = "postcode",
             .admin_district = FALSE,
             .lat_long = TRUE,
             other_vars = c("admin_ward",
                            "msoa_code")) |>
  knitr::kable() # display them in a nice table
The admin_district column answers two common questions: is the postcode invalid, and is the postcode in Sheffield? An invalid postcode has an admin_district of NA, and a Sheffield postcode has an admin_district of Sheffield. Because these questions come up so often, the function has an .admin_district argument.
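
The two checks can be sketched in base R as follows. The data frame is hand-written to stand in for the function's output (it is not real API output), and the pcd_valid and in_sheffield column names are made up for this illustration.

```r
# Hand-written stand-in for a data frame that already has an admin_district
# column; the rows are illustrative, not real lookup results
result <- data.frame(
  name           = c("Record 1", "Record 2", "Record 3"),
  admin_district = c("Sheffield", NA, "Leeds"),
  stringsAsFactors = FALSE
)

# An NA admin_district means the postcode was not recognised
result$pcd_valid <- !is.na(result$admin_district)

# Only valid postcodes can be in Sheffield; FALSE & NA evaluates to FALSE
result$in_sheffield <- result$pcd_valid & result$admin_district == "Sheffield"

result
```

Keeping the two flags separate makes it easy to report invalid postcodes and out-of-area postcodes as different data quality issues.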

The coordinates of the postcode centroid are also commonly needed, so the function has a .lat_long argument which, when TRUE, adds both a latitude column and a longitude column.
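
Once latitude and longitude columns are present, they can be turned into point geometries with the sf package. This is a minimal sketch: the coordinates are approximate, hand-typed values for illustration only, and it assumes the coordinates are WGS 84 (EPSG:4326).

```r
# Approximate, hand-typed centroids for illustration only; in practice these
# columns would come from the postcode lookup
df_coords <- data.frame(
  name      = c("SCC", "Blades", "Owls"),
  postcode  = c("S1 2HH", "S2 4SU", "S6 1SW"),
  latitude  = c(53.380, 53.370, 53.411),
  longitude = c(-1.470, -1.471, -1.500)
)

# Longitude is x and latitude is y; EPSG:4326 is WGS 84 (an assumption here)
points_wgs84 <- sf::st_as_sf(df_coords,
                             coords = c("longitude", "latitude"),
                             crs = 4326)

# Reproject to British National Grid (EPSG:27700) so that units are metres
points_bng <- sf::st_transform(points_wgs84, 27700)

points_bng
```

Reprojecting to British National Grid is optional, but it is convenient when the points are later combined with boundary data held in that coordinate reference system.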

TODO: link to sf example & area stats

Other columns are added by including their names in a character vector passed to the other_vars argument. For example, other_vars=c("ccg_code","lsoa_code"). The valid other_vars values are listed at docs.ropensci.org/PostcodesioR/#examples.

The pcd_name function argument is used to specify the name of the data frame’s postcode column when it isn’t postcode. For example, pcd_name="pupil_pcode".

Sometimes the postcode is not held in a separate column, but sits at the end of a combined address column. Depending on the format and quality of the combined address column, you could extract the postcode like this:

# Create a data frame with some example records
df <- tibble::tribble(
  ~name,     ~address,
  "SCC",     "Pinstone St, Sheffield City Centre, Sheffield S1 2HH",
  "Blades",  "Bramall Ln, Highfield, Sheffield S2 4SU",
  "Owls",    "76 Penistone Rd N, Sheffield S6 1SW"
)

# Extract the postcode from the address into a separate column
dplyr::mutate(df, 
              postcode = stringr::word(address, -2, -1, sep = "[:space:]")) |>
  knitr::kable()

|name   |address                                              |postcode |
|:------|:----------------------------------------------------|:--------|
|SCC    |Pinstone St, Sheffield City Centre, Sheffield S1 2HH |S1 2HH   |
|Blades |Bramall Ln, Highfield, Sheffield S2 4SU              |S2 4SU   |
|Owls   |76 Penistone Rd N, Sheffield S6 1SW                  |S6 1SW   |
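
Taking the last two words only works when the postcode contains a space and nothing follows it. Where the address column is less tidy, a pattern match may cope better. The sketch below uses base R and a deliberately simplified postcode pattern (it does not cover every valid UK postcode format); extract_postcode() is a hypothetical helper written for this example, not part of our functions repo.

```r
# Hypothetical helper: pull a postcode-like string from the end of an address
extract_postcode <- function(address) {
  # Simplified pattern: outward code, optional spaces, inward code, end of text
  pattern <- "[A-Z]{1,2}[0-9][A-Z0-9]?[[:space:]]*[0-9][A-Z]{2}[[:space:]]*$"
  x <- toupper(address)
  m <- regexpr(pattern, x)

  # Start with NA everywhere, then fill in the addresses that matched
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m > 0
  out[hit] <- trimws(regmatches(x, m))

  # Normalise to a single space before the three-character inward code
  sub("[[:space:]]*([0-9][A-Z]{2})$", " \\1", out)
}

extract_postcode(c(
  "Pinstone St, Sheffield City Centre, Sheffield S1 2HH",
  "Bramall Ln, Highfield, Sheffield s24su ",
  "Town Hall, Sheffield"
))
```

Addresses without a recognisable postcode come back as NA, so they can be filtered out or reviewed by hand before the lookup.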

18.2.1 PAS postcode lookup

PAS, the Performance & Analysis Service in the People Portfolio, maintains a postcode lookup table that performs the same role as the add_pcd_vars() function.
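
Using a lookup table generally means joining it to your records on the postcode. The sketch below shows the idea in base R. The lookup data frame, its area column and all of its values are placeholders, because the structure of the PAS table is not described here; pcd_key() is a hypothetical helper that strips spaces and upper-cases postcodes so that differently formatted postcodes still match.

```r
# Placeholder lookup table; the real PAS table's columns may differ
lookup <- data.frame(
  postcode = c("S1 2HH", "S2 4SU", "S6 1SW"),
  area     = c("Area A", "Area B", "Area C"),
  stringsAsFactors = FALSE
)

# Example records with inconsistently formatted postcodes
records <- data.frame(
  name     = c("SCC", "Blades", "Owls", "Unmatched"),
  postcode = c("s1 2hh", "S24SU", "S6  1SW", "X1 1XX"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a join key that ignores case and spacing
pcd_key <- function(x) gsub("[[:space:]]", "", toupper(x))

records$pcd_key <- pcd_key(records$postcode)
lookup$pcd_key  <- pcd_key(lookup$postcode)

# Left join: keep every record, with NA where the postcode is not in the lookup
joined <- merge(records,
                lookup[, c("pcd_key", "area")],
                by = "pcd_key",
                all.x = TRUE)

joined
```

Records that find no match keep an NA in the joined columns, which plays the same role as an NA admin_district for spotting invalid postcodes.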

TODO: Does it also include Sheffield specific variables e.g. which of the 100 neighbourhoods is the postcode in?

18.2.2 Postcode boundaries

TODO: OPEN DATA GB POSTCODE UNIT BOUNDARIES

18.2.3 Split postcodes

TODO

18.3 Addresses

Anne Tetley has provided the following guidance (on Portal):

TODO
- Portal LLPG Cascade Locator
- OS Places ESRI integration