In-class Exercise 1

Author

QIU RUILIU

Published

November 18, 2023

Modified

November 20, 2023

Tasks

The specific task of this in-class exercise are as follows:

  • to import Passenger Volume by Origin Destination Bus Stops data set downloaded from LTA DataMall in to RStudio environment,

  • to import geospatial data in ESRI shapefile format into sf data frame format,

  • to perform data wrangling by using appropriate functions from tidyverse and sf pakcges, and

  • to visualise the distribution of passenger trip by using tmap methods and functions.

Getting started

The code chunk below uses p_load() of pacman package to check if the required packages have been installed on the computer. If they are, the packages will be launched. The packages used are:

  • tmap: for thematic mapping

  • sf: for geospatial data wrangling

  • tidyverse: for non-spatial data wrangling

pacman::p_load(tmap, sf, tidyverse, knitr)

Import Passenger Volume by Origin-Destination Bus Stops

The code chunk below uses the read_csv() function of readr package to import the csv file into R and save it as a R dataframe called odbus.

odbus <- read_csv("data/aspatial/origin_destination_bus_202308.csv")

A quick check of odbus tibble data frame shows that the values in OROGIN_PT_CODE and DESTINATON_PT_CODE are in numeric data type.

glimpse(odbus)
Rows: 5,709,512
Columns: 7
$ YEAR_MONTH          <chr> "2023-08", "2023-08", "2023-08", "2023-08", "2023-…
$ DAY_TYPE            <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 17, 7, 17, 14, 10, 10,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <chr> "04168", "04168", "80119", "80119", "44069", "4406…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "1722…
$ TOTAL_TRIPS         <dbl> 7, 2, 3, 10, 5, 4, 3, 22, 3, 3, 7, 1, 3, 1, 3, 1, …

ORIGIN_PT_CODE and DESTINATION_PT_CODE are numeric variables that are categorical in nature. As such, they should be transformed to factor so that R treats them as a grouping variable.

odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE)

Notice that both of them are in factor data type now.

glimpse(odbus)
Rows: 5,709,512
Columns: 7
$ YEAR_MONTH          <chr> "2023-08", "2023-08", "2023-08", "2023-08", "2023-…
$ DAY_TYPE            <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 17, 7, 17, 14, 10, 10,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <fct> 04168, 04168, 80119, 80119, 44069, 44069, 20281, 2…
$ DESTINATION_PT_CODE <fct> 10051, 10051, 90079, 90079, 17229, 17229, 20141, 2…
$ TOTAL_TRIPS         <dbl> 7, 2, 3, 10, 5, 4, 3, 22, 3, 3, 7, 1, 3, 1, 3, 1, …

Extract Commuting Flow data

The code chunk below extracts commuting flows on weekday during the rush hour (7am to 9am).

origin7_9 <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 7 &
           TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

It should look similar to the data table below.

kable(head(origin7_9))
ORIGIN_PT_CODE TRIPS
01012 1617
01013 813
01019 1620
01029 2383
01039 2727
01059 1415

We will save the output in rds format for future used.

write_rds(origin7_9, "data/rds/origin7_9.rds")

The code chunk below will be used to import the save origin7_9.rds into R environment.

origin7_9 <- read_rds("data/rds/origin7_9.rds")

Working with Geospatial Data

Geospatial data is adopted to enrich analysis.

Import Bus Stop Locations

The code chunk below uses the st_read() function of sf package to import BusStop shapefile into R as a simple feature data frame called BusStop. As BusStop uses svy21 projected coordinate system, the crs is set to 3414.

busstop <- st_read(dsn = "data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `D:\KathyChiu77\ISSS624\In-class Ex\In-class Ex1\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21

The structure of busstop sf tibble data frame should look as below.

glimpse(busstop)
Rows: 5,161
Columns: 4
$ BUS_STOP_N <chr> "22069", "32071", "44331", "96081", "11561", "66191", "2338…
$ BUS_ROOF_N <chr> "B06", "B23", "B01", "B05", "B05", "B03", "B02A", "B02", "B…
$ LOC_DESC   <chr> "OPP CEVA LOGISTICS", "AFT TRACK 13", "BLK 239", "GRACE IND…
$ geometry   <POINT [m]> POINT (13576.31 32883.65), POINT (13228.59 44206.38),…

Import Planning Subzone data

The code chunk below uses the st_read() function of sf package to import MPSZ-2019 shapefile into R as a simple feature data frame called mpsz. To ensure we can use mpsz together with BusStopmpsz is reprojected to the svy21 projected coordinate system (crs=3413).

mpsz <- st_read(dsn = "data/geospatial",
                   layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `D:\KathyChiu77\ISSS624\In-class Ex\In-class Ex1\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

The structure of mpsz sf tibble data frame should look as below.

glimpse(mpsz)
Rows: 332
Columns: 7
$ SUBZONE_N  <chr> "MARINA EAST", "INSTITUTION HILL", "ROBERTSON QUAY", "JURON…
$ SUBZONE_C  <chr> "MESZ01", "RVSZ05", "SRSZ01", "WISZ01", "MUSZ02", "MPSZ05",…
$ PLN_AREA_N <chr> "MARINA EAST", "RIVER VALLEY", "SINGAPORE RIVER", "WESTERN …
$ PLN_AREA_C <chr> "ME", "RV", "SR", "WI", "MU", "MP", "WI", "WI", "SI", "SI",…
$ REGION_N   <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGION", "WEST…
$ REGION_C   <chr> "CR", "CR", "CR", "WR", "CR", "CR", "WR", "WR", "CR", "CR",…
$ geometry   <MULTIPOLYGON [m]> MULTIPOLYGON (((33222.98 29..., MULTIPOLYGON (…

Geospatial data wrangling

Combining Busstop and mpsz

Code chunk below populates the planning subzone code (i.e. SUBZONE_C) of mpsz sf data frame into busstop sf data frame.

busstop_mpsz <- st_intersection(busstop, mpsz) %>%
  select(BUS_STOP_N, SUBZONE_C) %>%
  st_drop_geometry()

Before moving to the next step, it is wise to save the output into rds format.

write_rds(busstop_mpsz, "data/rds/busstop_mpsz.csv")  
origin_data <- left_join(origin7_9 , busstop_mpsz,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_SZ = SUBZONE_C)

Before continue, it is a good practice for us to check for duplicating records.

duplicate <- origin_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

If duplicated records are found, the code chunk below will be used to retain the unique records.

origin_data <- unique(origin_data)

It will be a good practice to confirm if the duplicating records issue has been addressed fully.

mpsz_origtrip <- left_join(mpsz, 
                           origin_data,
                           by = c("SUBZONE_C" = "ORIGIN_SZ"))

Choropleth Visualisation

Prepare a choropleth map showing the distribution of passenger trips at planning sub-zone.

tm_shape(mpsz_origtrip)+
  tm_fill("TRIPS", 
          style = "quantile", 
          palette = "Blues",
          title = "Passenger trips") +
  tm_layout(main.title = "Passenger trips generated at planning sub-zone level",
            main.title.position = "center",
            main.title.size = 1.2,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar() +
  tm_grid(alpha =0.2) +
  tm_credits("Source: Planning Sub-zone boundary from Urban Redevelopment Authorithy (URA)\n and Population data from Department of Statistics DOS", 
             position = c("left", "bottom"))