::p_load(tmap, sf, performance, knitr,
pacman ggpubr, tidyverse)
In-class Exercise 4: Calibrating Spatial Interaction Models with R
Overview
This in-class exercise is a continuation of Hands-on Exercise 3, In-class Exercise 3 and In-class Exercise 4: Preparing Spatial Interaction Modelling Variables. We will continue our journey of calibrating Spatial Interaction Models by using propulsiveness and attractiveness variables prepared in earlier in-class exercise.
Getting Started
For the purpose of this exercise, five r packages will be used. They are:
sf for importing, integrating, processing and transforming geospatial data.
tidyverse for importing, integrating, wrangling and visualising data.
tmap for plotting cartographicquality thematic maps.
performance for computing model comparison matrices such as rmse.
ggpubr for creating publication quality statistical graphics.
The Data
This exercise is a continuation of Hands-on Exercise 3 and In-class Exercise 4: Preparing Spatial Interaction Modelling Variables. The following data will be used:
flow_data_tidy.rds, weekday morning peak passenger flows at planning subzone level.
mpsz.rds, URA Master Plan 2019 Planning Subzone boundary in simple feature tibble data frame format.
<- read_rds("data/rds/flow_data_tidy.rds") flow_data
glimpse(flow_data)
Rows: 14,734
Columns: 13
$ ORIGIN_SZ <chr> "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01", "AMS…
$ DESTIN_SZ <chr> "AMSZ01", "AMSZ02", "AMSZ03", "AMSZ04", "AMSZ05", "AMS…
$ MORNING_PEAK <dbl> 1998, 8289, 8971, 2252, 6136, 2148, 1620, 1925, 1773, …
$ dist <dbl> 50.0000, 810.4491, 1360.9294, 840.4432, 1076.7916, 805…
$ ORIGIN_AGE7_12 <dbl> 310, 310, 310, 310, 310, 310, 310, 310, 310, 310, 310,…
$ ORIGIN_AGE13_24 <dbl> 710, 710, 710, 710, 710, 710, 710, 710, 710, 710, 710,…
$ ORIGIN_AGE25_64 <dbl> 2780, 2780, 2780, 2780, 2780, 2780, 2780, 2780, 2780, …
$ DESTIN_AGE7_12 <dbl> 310.00, 1140.00, 1010.00, 980.00, 810.00, 1050.00, 420…
$ DESTIN_AGE13_24 <dbl> 710.00, 2770.00, 2650.00, 2000.00, 1920.00, 2390.00, 1…
$ DESTIN_AGE25_64 <dbl> 2780.00, 15700.00, 14240.00, 11320.00, 9650.00, 12460.…
$ SCHOOL_COUNT <dbl> 0.99, 2.00, 2.00, 1.00, 3.00, 2.00, 0.99, 0.99, 3.00, …
$ RETAIL_COUNT <dbl> 1.00, 0.99, 6.00, 0.99, 0.99, 0.99, 1.00, 117.00, 0.99…
$ geometry <LINESTRING [m]> LINESTRING (29501.77 39419...., LINESTRING …
Notice that this sf tibble data.frame includes two additional fields namely: SCHOOL_COUNT and BUSINESS_COUNT. Both of them will be used as attractiveness variables when calibrating origin constrained SIM.
The code chunk below is used to display the first five columns and rows of flow_data.
kable(head(flow_data[, 1:5], n = 5))
ORIGIN_SZ | DESTIN_SZ | MORNING_PEAK | dist | ORIGIN_AGE7_12 | geometry |
---|---|---|---|---|---|
AMSZ01 | AMSZ01 | 1998 | 50.0000 | 310 | LINESTRING (29501.77 39419…. |
AMSZ01 | AMSZ02 | 8289 | 810.4491 | 310 | LINESTRING (29501.77 39419…. |
AMSZ01 | AMSZ03 | 8971 | 1360.9294 | 310 | LINESTRING (29501.77 39419…. |
AMSZ01 | AMSZ04 | 2252 | 840.4432 | 310 | LINESTRING (29501.77 39419…. |
AMSZ01 | AMSZ05 | 6136 | 1076.7916 | 310 | LINESTRING (29501.77 39419…. |
Preparing inter-zonal flow data
In general, we will calibrate separate Spatial Interaction Models for inter- and intra-zonal flows. In this hands-on exercise, we will focus our attention on inter-zonal flow. Hence, we need to exclude the intra-zonal flow from flow_data.
First, two new columns called FlowNoIntra and offset will be created by using the code chunk below.
$FlowNoIntra <- ifelse(
flow_data$ORIGIN_SZ == flow_data$DESTIN_SZ,
flow_data0, flow_data$TRIPS)
Error in ans[npos] <- rep(no, length.out = len)[npos]: replacement has length zero
$offset <- ifelse(
flow_data$ORIGIN_SZ == flow_data$DESTIN_SZ,
flow_data0.000001, 1)
According to the syntax used to derive values in FlowNoIntra field, all intra-zonal flow will be given a value of 0 or else the original flow values will be inserted.
Next, inter-zonal flow will be selected from flow_data and save into a new output data.frame called inter_zonal_flow by using the code chunk below.
<- flow_data %>%
inter_zonal_flow filter(FlowNoIntra > 0)
Error in `stopifnot()`:
ℹ In argument: `FlowNoIntra > 0`.
Caused by error:
! object 'FlowNoIntra' not found
You are ready to calibrate the Spatial Interaction Models now.
Calibrating Spatial Interaction Models
In this section, we will focus on calibrating an origin constrained SIM and a doubly constrained by using flow_data prepared. They complement what you have learned in Hands-on Exercise 3.
Origin- (Production-) constrained Model
Figure below shows the general formula of the origin-constrained model.
Code chunk below shows the calibration of the model by using glm()
of R and flow_data.
<- glm(formula = TRIPS ~
orcSIM_Poisson +
ORIGIN_SZ log(SCHOOL_COUNT) +
log(BUSINESS_COUNT) +
log(DIST) - 1,
family = poisson(link = "log"),
data = inter_zonal_flow,
na.action = na.exclude)
Error in eval(mf, parent.frame()): object 'inter_zonal_flow' not found
summary(orcSIM_Poisson)
Error in eval(expr, envir, enclos): object 'orcSIM_Poisson' not found
Goodness of fit
In statistical modelling, the next question we would like to answer is how well the proportion of variance in the dependent variable (i.e. TRIPS) that can be explained by the explanatory variables.
In order to provide answer to this question, R-squared statistics will be used. However, R-squared is not an output of glm()
. Hence we will write a function called CalcRSquared
by using the code chunk below.
<- function(observed, estimated){
CalcRSquared <- cor(observed, estimated)
r <- r^2
R2
R2 }
Now, we can examine how the constraints hold for destinations this time.
CalcRSquared(orcSIM_Poisson$data$TRIPS, orcSIM_Poisson$fitted.values)
Error in eval(expr, envir, enclos): object 'orcSIM_Poisson' not found
With reference to the R-Squared above, we can conclude that the model accounts for about 44% of the variation of flows in the systems. Not bad, but not brilliant either.
Doubly constrained model
In this section, we will fit a doubly constrained SIM by using the general formula shown below:
and the code chunk used is shown below.
<- glm(formula = TRIPS ~
dbcSIM_Poisson +
ORIGIN_SZ +
DESTIN_SZ log(DIST),
family = poisson(link = "log"),
data = inter_zonal_flow,
na.action = na.exclude)
Error in eval(mf, parent.frame()): object 'inter_zonal_flow' not found
summary(dbcSIM_Poisson)
Error in eval(expr, envir, enclos): object 'dbcSIM_Poisson' not found
Next, let us examine how well the proportion of variance in the dependent variable (i.e. TRIPS) that can be explained by the explanatory variables.
CalcRSquared(dbcSIM_Poisson$data$TRIPS,
$fitted.values) dbcSIM_Poisson
Error in eval(expr, envir, enclos): object 'dbcSIM_Poisson' not found
Notice that there is a relatively greater improvement in the R-Squared value.
Model comparison
Statistical measures
Another useful model performance measure for continuous dependent variable is Root Mean Squared Error. In this sub-section, you will learn how to use compare_performance()
of performance package
First of all, let us create a list called model_list by using the code chunk below.
<- list(
model_list Origin_Constrained = orcSIM_Poisson,
Doubly_Constrained = dbcSIM_Poisson)
Error in eval(expr, envir, enclos): object 'orcSIM_Poisson' not found
Next, we will compute the RMSE of all the models in model_list file by using the code chunk below.
compare_performance(model_list,
metrics = "RMSE")
Error in eval(expr, envir, enclos): object 'model_list' not found
The print above reveals that doubly constrained SIM is the best model among the two SIMs because it has the smallest RMSE value of 1906.694.
Visualising fitted values
In this section, you will learn how to visualise the observed values and the fitted values.
Firstly we will extract the fitted values from Origin-constrained Model by using the code chunk below.
<- as.data.frame(orcSIM_Poisson$fitted.values) %>%
df round(digits = 0)
Error in eval(expr, envir, enclos): object 'orcSIM_Poisson' not found
Next, we will append the fitted values into inter_zonal_flow data frame by using the code chunk below.
<- inter_zonal_flow %>%
inter_zonal_flow cbind(df) %>%
rename(orcTRIPS = "orcSIM_Poisson.fitted.values")
Error in eval(expr, envir, enclos): object 'inter_zonal_flow' not found
<- as.data.frame(dbcSIM_Poisson$fitted.values) %>%
df round(digits = 0)
Error in eval(expr, envir, enclos): object 'dbcSIM_Poisson' not found
<- inter_zonal_flow %>%
inter_zonal_flow cbind(df) %>%
rename(dbcTRIPS = "dbcSIM_Poisson.fitted.values")
Error in eval(expr, envir, enclos): object 'inter_zonal_flow' not found
Next, two scatterplots will be created by using geom_point()
and other appropriate functions of ggplot2 package.
<- ggplot(data = inter_zonal_flow,
orc_p aes(x = orcTRIPS,
y = TRIPS)) +
geom_point() +
geom_smooth(method = lm) +
coord_cartesian(xlim=c(0,150000),
ylim=c(0,150000))
Error in eval(expr, envir, enclos): object 'inter_zonal_flow' not found
<- ggplot(data = inter_zonal_flow,
dbc_p aes(x = dbcTRIPS,
y = TRIPS)) +
geom_point() +
geom_smooth(method = lm) +
coord_cartesian(xlim=c(0,150000),
ylim=c(0,150000))
Error in eval(expr, envir, enclos): object 'inter_zonal_flow' not found
Now, we will put all the graphs into a single visual for better comparison by using the code chunk below.
ggarrange(orc_p, dbc_p,
ncol = 2,
nrow = 1)
Error in eval(expr, envir, enclos): object 'orc_p' not found