building height
: This data only covered the city centre (two 10kmx10km grid areas: NS56, NS66). This spatial extent was not suitable to cover the entire glasgow city boundary.N44-N47
, N54-N57
, N64-N67
, N74-N77
, and N84-N87
Data Zones
(LSOA) and Immediate Zones
(MSOA).library(tidyverse) # data wrangling and plotting
library(sf) # Manipulating spatial files
library(tmap) # Dealing with maps
library(spgwr) # GWRs
As the data were all pre-cleaned, the csv files were all put into the same column key called Name
.
strava <- read_csv("Cleaned Files/strava.csv") # response
green <- read_csv("Cleaned Files/green.csv") # predictors
ptai <- read_csv("Cleaned Files/ptai.csv") # predictors
buildings <- read_csv("Cleaned Files/buildings.csv") # predictors
shp <- read_sf("Cleaned Files/Glasgow_IZ.shp") # shapefile
# merge all
strava %>%
left_join(green, by = "Name") %>%
left_join(ptai, by = "Name") %>%
left_join(buildings, by = "Name") -> glasgow_df
The Figure@ref(fig:correlation) shows the correlation between Strava 2017 and Strava 2018. The correlation is `0.9975779`, which means it is nearly identical - not what I expected!
plot(glasgow_df$ride17, glasgow_df$ride18)
Strava 2017 vs Strava 2018
cor(glasgow_df$ride17, glasgow_df$ride18)
## [1] 0.9975779
To get an immediate understanding of the variables, the best thing is to map the variables and get summary tables. First, let us look at the summary of the variables Table@ref(tab:summary-table).
glasgow_df %>% summary()
## Name ride17 ride18 green
## Length:136 Min. : 1760 Min. : 2275 Min. : 1.320
## Class :character 1st Qu.: 32800 1st Qu.: 33606 1st Qu.: 5.780
## Mode :character Median : 79292 Median : 81615 Median : 8.068
## Mean : 128697 Mean : 135287 Mean : 8.837
## 3rd Qu.: 153040 3rd Qu.: 168255 3rd Qu.:11.670
## Max. :1312075 Max. :1421505 Max. :20.664
## PTAI height
## Min. : 245.8 Min. : 5.667
## 1st Qu.: 542.8 1st Qu.: 7.358
## Median : 808.5 Median : 8.098
## Mean :1022.2 Mean : 9.029
## 3rd Qu.:1213.8 3rd Qu.: 9.566
## Max. :4990.8 Max. :21.133
Here we transform the data to a longer format glasgow_df_long
using the pivot_longer
function. This is to directly execute ggplot with facet wrapping. After the data transformation, we then merge the shapefile with the integrated data frame that is gl_shp
.
glasgow_df %>%
rename(Strava2017 = ride17,
Strava2018 = ride18) %>%
pivot_longer(!Name,
names_to = "Type",
values_to = "Value") -> glasgow_df_long
shp %>%
left_join(glasgow_df_long, by = "Name") -> gl_shp
Lets look at the strava data first. Here we see that during 2017 and 2018 people including the City Centre South, Laurieston and Tradeston, City Centre East, Finnieston and Kelvinhaugh, and Calton and Gallowgate were identified as the most reported areas.
# Strava Users
gl_shp %>%
filter(Type %in% c("Strava2017", "Strava2018")) %>%
mutate(Value2 = cut(Value,
breaks = c(0, 50000, 100000, 150000, 200000, 300000, +Inf),
labels = c("0-50", "50-100", "100-150", "150-200", "200-30", ">300"))) %>%
tm_shape() +
tm_polygons("Value2", title = "Strava ('000s)", palette="-RdBu") +
tm_facets(by = "Type", free.coords = F, free.scales = F, ncol = 2) -> gl_strava
gl_strava
The figures below show that the per cent of the greenness (by Immediate Zones) gradually tends to decrease as it goes outside the city centre. The average is 8% across the whole area but the lowest is situated in the city centre and the city south.
The height of the buildings were concentrated around the city centre. The City Centre South was the highest at 21.1% followed by City Centre East and City Centre West.
PTAI (Public Transport Availability Indicators) also tend to more clustered in the city centre (>3000) and around the major bus routes (>2000) while the north and the east were relatively lower (<1000).
# Other variables
gl_shp %>%
filter(Type %in% c("green", "PTAI", "height")) %>%
tm_shape() +
tm_polygons("Value", title = "", palette="-RdBu") +
tm_facets(by = "Type", free.coords = F, free.scales = T, ncol = 3) +
tm_layout(legend.position = c("right", "top"),
title.position = c('right', 'top')) -> gl_variable
gl_variable
model17 <- lm(log(ride17) ~ green + PTAI + height, data = glasgow_df)
summary(model17)
##
## Call:
## lm(formula = log(ride17) ~ green + PTAI + height, data = glasgow_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.78035 -0.59604 0.08002 0.72427 1.76984
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.4216433 0.4704823 22.151 < 2e-16 ***
## green -0.0656807 0.0232332 -2.827 0.00543 **
## PTAI 0.0003435 0.0001738 1.976 0.05021 .
## height 0.1055714 0.0503079 2.099 0.03777 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.01 on 132 degrees of freedom
## Multiple R-squared: 0.3153, Adjusted R-squared: 0.2997
## F-statistic: 20.26 on 3 and 132 DF, p-value: 7.342e-11
residuals(model17)
## 1 2 3 4 5 6
## 0.135919942 0.670526943 1.304312489 0.644729034 1.310050040 -2.170126101
## 7 8 9 10 11 12
## -0.616559434 0.186769318 -0.992017762 0.395836924 0.382950246 -0.237382366
## 13 14 15 16 17 18
## 0.261310638 0.746033800 -0.028220730 0.104845381 1.243101580 -0.191884841
## 19 20 21 22 23 24
## 0.717018818 1.250844748 -1.012700849 0.603650453 0.650193984 0.641932450
## 25 26 27 28 29 30
## 0.417989745 -1.270145620 -1.663419395 -0.461354261 0.107555031 -2.372796941
## 31 32 33 34 35 36
## -0.037508281 -0.074528380 -0.359106739 0.868173407 -0.144176683 1.327318558
## 37 38 39 40 41 42
## 0.775932529 0.568731380 -0.259055184 1.149408266 -0.485250263 0.216155911
## 43 44 45 46 47 48
## -0.767564993 -0.796870162 -0.520444416 -0.250865175 -2.780353678 -1.024349128
## 49 50 51 52 53 54
## 1.754635369 -2.435174049 -1.395130000 -0.586207215 1.287896849 0.037091493
## 55 56 57 58 59 60
## 0.088020682 0.391203952 -0.238610053 -0.276728072 0.459131120 -2.667679226
## 61 62 63 64 65 66
## -2.536940688 -0.591850870 1.335127075 -0.065398596 -1.274398121 -1.343570940
## 67 68 69 70 71 72
## 0.390423472 -0.608590419 0.843423877 1.431849596 1.267265991 -0.197921445
## 73 74 75 76 77 78
## 0.813957256 0.815116030 -0.808387981 -0.200014442 1.190940720 1.076184237
## 79 80 81 82 83 84
## -1.043505139 -0.511448027 -0.711094619 -0.956795760 -0.983510977 1.677010192
## 85 86 87 88 89 90
## 0.001904391 1.769843643 1.341548964 -0.396210381 -1.800924705 0.030742474
## 91 92 93 94 95 96
## 1.425707450 -0.195669084 0.254578002 0.640165410 0.991057030 0.207139287
## 97 98 99 100 101 102
## -0.514297919 0.586451723 -1.257783873 -1.106506884 1.242315480 -1.444023674
## 103 104 105 106 107 108
## -0.625451113 -1.198823994 -0.865268566 1.546974649 0.450987252 1.163708165
## 109 110 111 112 113 114
## -0.342306469 0.089096060 0.493365219 1.072947660 0.489509480 0.464487910
## 115 116 117 118 119 120
## 1.133359294 -0.670206320 0.072023334 -0.489271064 -0.725716621 -1.413417010
## 121 122 123 124 125 126
## 1.182463268 -0.196395703 0.052870528 0.673666748 -0.083751979 0.190938856
## 127 128 129 130 131 132
## 1.476760642 -1.962769408 0.764908618 0.190832993 0.864277834 -0.404320866
## 133 134 135 136
## 0.194651293 0.916943145 -0.324950757 0.450908084
exp(coef(model17)["green"])
## green
## 0.9364298
exp(coef(model17)["PTAI"])
## PTAI
## 1.000344
exp(coef(model17)["height"])
## height
## 1.111345
model18 <- lm(log(ride18) ~ green + PTAI + height, data = glasgow_df)
summary(model18)
##
## Call:
## lm(formula = log(ride18) ~ green + PTAI + height, data = glasgow_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.64293 -0.68402 0.06562 0.70549 1.76134
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.2875367 0.4715255 21.818 <2e-16 ***
## green -0.0576020 0.0232847 -2.474 0.0146 *
## PTAI 0.0003667 0.0001742 2.105 0.0372 *
## height 0.1134229 0.0504195 2.250 0.0261 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.012 on 132 degrees of freedom
## Multiple R-squared: 0.3204, Adjusted R-squared: 0.305
## F-statistic: 20.75 on 3 and 132 DF, p-value: 4.485e-11
residuals(model18)
## 1 2 3 4 5 6
## 0.14269549 0.70133233 1.25228045 0.57175425 1.32685895 -2.06094239
## 7 8 9 10 11 12
## -0.69197530 0.27568227 -0.91100410 0.53839889 0.48091551 -0.22270837
## 13 14 15 16 17 18
## 0.34883614 0.72936131 -0.02824141 -0.08431092 1.22492455 -0.16192034
## 19 20 21 22 23 24
## 0.69529060 1.25668680 -1.11264819 0.61570390 0.66997249 0.67447194
## 25 26 27 28 29 30
## 0.39601051 -1.36664472 -1.64117100 -0.52012037 0.26263258 -2.48349256
## 31 32 33 34 35 36
## -0.12959215 -0.14479454 -0.39948072 0.85085124 -0.04477760 1.29672144
## 37 38 39 40 41 42
## 0.79802979 0.58854589 -0.20556453 1.15890213 -0.58598927 0.33485400
## 43 44 45 46 47 48
## -0.70670609 -0.72872088 -0.62026026 -0.30106989 -2.64293199 -1.01550880
## 49 50 51 52 53 54
## 1.75092521 -2.13491868 -1.16803803 -0.83349604 1.40045222 0.09665470
## 55 56 57 58 59 60
## 0.05602764 0.47714511 -0.21347598 -0.16992831 0.47016156 -2.18988703
## 61 62 63 64 65 66
## -2.07668593 -0.41256081 1.32300157 -0.17461704 -1.17608792 -1.57761833
## 67 68 69 70 71 72
## 0.49000214 -0.46384759 0.87540220 1.42239081 1.25604181 -0.17645867
## 73 74 75 76 77 78
## 0.71795108 0.76202870 -0.90171341 -0.27773839 1.17915340 1.03685111
## 79 80 81 82 83 84
## -1.26220185 -0.86914869 -0.72584218 -0.83264698 -0.88685603 1.72108964
## 85 86 87 88 89 90
## -0.08281366 1.76133757 1.34409734 -0.19948692 -2.19123284 0.03364852
## 91 92 93 94 95 96
## 1.43134382 -0.14909499 0.18651506 0.82565358 0.98438225 0.14966778
## 97 98 99 100 101 102
## -0.46244627 0.55842696 -1.40142898 -1.31669698 1.12401910 -1.53919326
## 103 104 105 106 107 108
## -0.68136626 -1.07743307 -0.89341158 1.55339901 0.48170046 1.09049743
## 109 110 111 112 113 114
## -0.38345752 0.21119879 0.36696025 1.01608276 0.64728469 0.60464044
## 115 116 117 118 119 120
## 1.09255862 -0.76782073 0.22620488 -0.48588075 -0.78941882 -1.76100279
## 121 122 123 124 125 126
## 1.10745587 -0.25255957 -0.20474737 0.50112109 -0.04163603 0.07522218
## 127 128 129 130 131 132
## 1.62153263 -2.30606518 0.67548535 0.22161371 1.00915950 -0.34409869
## 133 134 135 136
## 0.17304265 0.95836567 -0.21875499 0.62478122
exp(coef(model18)["green"])
## green
## 0.9440256
exp(coef(model18)["PTAI"])
## PTAI
## 1.000367
exp(coef(model18)["height"])
## height
## 1.120105
shp %>%
left_join(glasgow_df, by = "Name") %>%
bind_cols(
tibble(Residuals18 = residuals(model18),
Residuals17 = residuals(model17))) -> glasgow_gwr
plot(glasgow_gwr["Residuals17"]) # randomly distributed
plot(glasgow_gwr["Residuals18"])
glasgow_sp <- as(glasgow_gwr, "Spatial")
gwr.bandwidth <-gwr.sel(log(ride17) ~ green + PTAI + height, glasgow_sp) #estimated optimal bandwidth
## Bandwidth: 8649.304 CV score: 142.7482
## Bandwidth: 13980.9 CV score: 142.8954
## Bandwidth: 5354.199 CV score: 138.9338
## Bandwidth: 3317.712 CV score: 127.933
## Bandwidth: 2059.094 CV score: 115.1526
## Bandwidth: 1281.226 CV score: 116.9604
## Bandwidth: 1859.807 CV score: 113.5978
## Bandwidth: 1736.559 CV score: 113.0917
## Bandwidth: 1562.637 CV score: 113.305
## Bandwidth: 1683.767 CV score: 113.0271
## Bandwidth: 1679.924 CV score: 113.0265
## Bandwidth: 1677.647 CV score: 113.0264
## Bandwidth: 1677.809 CV score: 113.0264
## Bandwidth: 1677.813 CV score: 113.0264
## Bandwidth: 1677.813 CV score: 113.0264
## Bandwidth: 1677.812 CV score: 113.0264
## Bandwidth: 1677.811 CV score: 113.0264
## Bandwidth: 1677.812 CV score: 113.0264
## Bandwidth: 1677.812 CV score: 113.0264
## Bandwidth: 1677.812 CV score: 113.0264
## Bandwidth: 1677.812 CV score: 113.0264
gwr.bandwidth
## [1] 1677.812
gwr.fit1<-gwr(log(ride17) ~ green + PTAI + height,
data = glasgow_sp,
bandwidth = gwr.bandwidth,
se.fit=T,
hatmatrix=T)
## Warning in proj4string(data): CRS object has comment, which is lost in output; in tests, see
## https://cran.r-project.org/web/packages/sp/vignettes/CRS_warnings.html
gwr.fit1
## Call:
## gwr(formula = log(ride17) ~ green + PTAI + height, data = glasgow_sp,
## bandwidth = gwr.bandwidth, hatmatrix = T, se.fit = T)
## Kernel function: gwr.Gauss
## Fixed bandwidth: 1677.812
## Summary of GWR coefficient estimates at data points:
## Min. 1st Qu. Median 3rd Qu. Max.
## X.Intercept. 6.98465774 9.73352953 10.77907541 11.49108338 14.03347332
## green -0.23998838 -0.07149230 -0.02740574 0.01209931 0.13576117
## PTAI -0.00083649 0.00015632 0.00040764 0.00058934 0.00451235
## height -0.54505680 -0.02654695 0.05253565 0.12528696 0.27437353
## Global
## X.Intercept. 10.4216
## green -0.0657
## PTAI 0.0003
## height 0.1056
## Number of data points: 136
## Effective number of parameters (residual: 2traceS - traceS'S): 42.71778
## Effective degrees of freedom (residual: 2traceS - traceS'S): 93.28222
## Sigma (residual: 2traceS - traceS'S): 0.8286924
## Effective number of parameters (model: traceS): 31.87474
## Effective degrees of freedom (model: traceS): 104.1253
## Sigma (model: traceS): 0.7843587
## Sigma (ML): 0.6863148
## AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): 371.1238
## AIC (GWR p. 96, eq. 4.22): 315.4401
## Residual sum of squares: 64.0598
## Quasi-global R2: 0.6741362
results <-as.data.frame(gwr.fit1$SDF)
names(results)
## [1] "sum.w" "X.Intercept." "green"
## [4] "PTAI" "height" "X.Intercept._se"
## [7] "green_se" "PTAI_se" "height_se"
## [10] "gwr.e" "pred" "pred.se"
## [13] "localR2" "X.Intercept._se_EDF" "green_se_EDF"
## [16] "PTAI_se_EDF" "height_se_EDF" "pred.se.1"
glasgow_gwr %>%
select(-c(green, PTAI, height)) %>%
bind_cols(results) -> gwr_results
qtm(gwr_results, fill = "localR2")
qtm(gwr_results, fill = "green")
## Variable(s) "green" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
qtm(gwr_results, fill = "PTAI")
## Variable(s) "PTAI" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
qtm(gwr_results, fill = "height")
## Variable(s) "height" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.