Exploratory spatial data analysis is probably the most important part of any spatial epidemiological analysis. After collecting and organising your data it allows you to readily identify errors, explore relationships between variables and generate hypotheses. In this activity you will use the statistics package R to calculate standardised morbidity ratios and the standard error of the standardised morbidity ratio estimates for the Scottish lip cancer data set. Having done this, you will import the computed values into a Quantum GIS project and join them to a map attribute table allowing you to more fully explore the spatial distribution of lip cancer risk. By the end of this exercise you should be able to:
The data for this exercise can be found in the SCOTLAND subfolder of the DATA folder on the course CD-ROM. Copy the contents of the SCOTLAND folder to the working folder on your computer.
| File name | What is it needed for? |
| SC_districts-BNG.SHP, .SHX, .DBF | ESRI shape file set for Scotland. |
| SC_lip_cancer.XLSX | Lip cancer data set (MS Excel format). |
| SC_lip_cancer.CSV, .CSVT | Lip cancer data set (CSV format). |
| SC_lip_cancer.TXT | Lip cancer data set (TXT format). |
Originally described by IARC (1985) the lip cancer data set provides the following information for each Scottish district:
setwd("D:\\TEMP")
library(rgdal); library(epiR); library(maptools)
Read in the Scottish lip cancer data file. Districts are numbered consecutively from 1 to 56:
sclip.df <- read.table("SC_lip_cancer.csv", header = TRUE, sep = ",")
head(sclip.df)
Read in the Scottish district shape file:
scpol.shp <- readOGR(dsn = getwd(), layer = "SC_districts-BNG")
Bind data from sclip.df to scpol.shp:
scpol.shp$obs <- sclip.df[match(scpol.shp$ID, sclip.df$id),3]
scpol.shp$pop <- sclip.df[match(scpol.shp$ID, sclip.df$id),4]
scpol.shp$pag <- sclip.df[match(scpol.shp$ID, sclip.df$id),5]
For each district, divide the count of lip cancer cases by the vector listing the number of person-years at risk. This provides us with an estimate of the incidence rate of lip cancer in each district (that is, the number of cases of lip cancer per person-year at risk). Calculate the standard error of the incidence rate, the district-level lip cancer standardised mortality ratio and the standard error of the district-level lip cancer standardised mortality ratio:
scpol.shp$inc <- scpol.shp$obs / scpol.shp$pop
scpol.shp$inc.se <- sqrt((scpol.shp$inc * (1 - scpol.shp$inc))/(scpol.shp$pop))
sum.obs <- sum(scpol.shp$obs)
sum.pop <- sum(scpol.shp$pop)
mean <- sum.obs / sum.pop
scpol.shp$exp <- scpol.shp$pop * mean
scpol.shp$smr <- scpol.shp$obs / scpol.shp$exp
scpol.shp$smr.se <- sqrt(scpol.shp$obs) / scpol.shp$exp
Export scpol.shp as a shape file called SC_districts_inc-BNG.SHP:
writeOGR(obj = scpol.shp, dsn = getwd(), layer = "SC_districts_inc-BNG", driver = "ESRI Shapefile")
Finally, write the data as a map file that can be read by Mondrian. Mondrian provides are flexible (operating system independent) platform for dynamic exploratory spatial data analysis:
sp2Mondrian(scpol.shp, file = "SC_districts_inc-BNG.txt")
This creates two files: SC_districts_inc-BNG.TXT and MAP_SC_districts_inc-BNG.txt.
Start Quantum GIS. Add two copies of SC_districts_inc-BNG as layers to a new project. Right click on the layer name of the first copy of SC_districts_inc-BNG and select Properties. Click on the tab titled Style to set the colour scheme of a choropleth map of lip cancer standardised morbidity ratio. Rename the layer to `SMR'. Use the second copy of SC_districts_inc-BNG to show the standard error of the standardised morbidity ratios. Rename this layers `SE_SMR'.
![\TeX \begin{figure}[h]
\centering
\includegraphics[width=15.00cm]{../images/ESDA_01_QGIS.png}
\caption{Choropleth map of lip cancer standardised morbidity ratios.}
\label{fig:ESDA_01}
\end{figure}](../images/ESDA_01_QGIS.png)
Start Mondrian and click on FILE - OPEN. Browse to find the file SC_districts_inc-BNG.TXT. Next click PLOT - MAP. In the main Mondrian window select the variables smr and pag (hold the CTL button down to select two variables at once) then click PLOT - SCATTERPLOT. A scatterplot of smr as a function of prop.ag will be displayed, as shown in Figure 2. Next click PLOT - MAP. Make SMR the variable of interest using a green linear colour scheme. Now we'll do some dynamic exploratory spatial data analysis. With the scatterplot active hold the SHIFT button down and select points in the upper right hand quadrant of the plot (that is, districts with high SMR estimates and relatively high proportions of the population working in outdoor industry, Figure 2). Those districts that you select on the scatterplot will also be selected on the map.
Try the same thing with GeoDa. Start GeoDa and select the SC_districts_inc-BNG shape file when asked to Connect to a Data Source. Create a choropleth map of lip cancer SMR. See -MAP -NATURAL BREAKS MAP, using (to start with) 5 natural breaks. Set SMR as the variable to plot. Select -EXPLORE -BOX PLOT and organise your screen so the window showing the box and whisker plot is next to the window showing the choropleth map of Scotland. Use your mouse to `draw' a box around an area of the box and whisker plot. When sections of the box and whisker plot are selected, the corresponding districts on the choropleth map will be highlighted. Experiment with the other features included in GeoDa (e.g. the cartogram function).
![\TeX \begin{figure}[h]
\centering
\includegraphics[width=15.00cm]{../images/ESDA_02_QGIS.png}
\caption{Choropleth map of lip cancer standardised morbidity ratios in Mondrian.}
\label{fig:ESDA_02}
\end{figure}](../images/ESDA_02_QGIS.png)