Home Page - http://dmcglinn.github.io/quant_methods/ GitHub Repo - https://github.com/dmcglinn/quant_methods ### Source Code Link https://raw.githubusercontent.com/dmcglinn/quant_methods/gh-pages/lessons/shapefiles_and_rasters.R
This lesson covers how to map and work with geospatial data in R. First let’s load the relevant libraries
# install.packages(c("maps","sf", "raster", "ggplot2", "leaflet"))
library(maps) # convenient pkg for maps of the world, state, and county
library(sf) # spatial features package that is helpful working with polygons
## Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
library(raster) # raster data class for working with grid data
## Loading required package: sp
library(ggplot2) # has some mapping functions (geom_sf)
library(leaflet) # for interactive maps
First, make some maps. The “maps” package has databases=“world”,“usa”,“state”, or “county”. (There are also a few for foreign countries like France and Italy.) In each database, you can plot any region or group of regions.
world <- map(database="world")
names(world)
## [1] "x" "y" "range" "names"
head(world$names)
## [1] "Aruba" "Afghanistan" "Angola" "Angola:Cabinda"
## [5] "Anguilla" "Albania"
Names in world database can all be plot as a region. For example:
map(database="world", regions=c("Cambodia", "Thailand", "Vietnam", "Laos"))
map(database="state", regions=c("virginia", "north carolina", "south carolina"))
You can add cities using map.cities()
function. Note: if
the map is getting cropped on the edges close and clear the plotting
window using dev.off()
then manually widen the plotting
window and try the map()
function again - it should not
crop the map if the window is big enough.
Let’s look at a county map:
nc_county <- map(database="county",regions="north carolina")
head(nc_county$names)
## [1] "north carolina,alamance" "north carolina,alexander"
## [3] "north carolina,alleghany" "north carolina,anson"
## [5] "north carolina,ashe" "north carolina,avery"
triangle <- map(database="county",
regions=c("north carolina,orange","north carolina,wake","north carolina,durham"),
fill=T, plot=F)
triangle$names
## [1] "north carolina,durham" "north carolina,orange" "north carolina,wake"
We can turn a map into a spatial polygons object and fill it with data, turning it into a spatial polygon dataframe.
tri_sf <- st_as_sf(triangle)
plot(tri_sf, axes=T)
add data to make spatial polygons data frame
population <- c(266132, 132272, 892409)
tri_sf$pop <- population
plot(tri_sf['pop'])
We can also use ggplot to produce a similar graphic with less hideous default colors
ggplot(tri_sf) +
geom_sf(aes(fill = pop))
If your data are in different projections, you need to change the projection so that they are all in the same coordinate reference system.
world <- map(database="world", fill=T, plot=F)
world_longlat <- st_as_sf(world)
# Use function spTransform to transform to Mercator projection
world_merc <- st_transform(world_longlat, crs=st_crs("+proj=merc"))
# Lambert Azimuthal Equal Area
world_laea <- st_transform(world_longlat, crs=st_crs("+proj=laea"))
# sinusoidal
world_sinusoidal <- st_transform(world_longlat,
crs=st_crs("+proj=sinu"))
plot the four together to see the difference. Mercator projection distorts area far from the equator. LAEA is accurately represents area but not angles. Sinusoidal is equal area and conserves distances along parallels.
par(mfrow=c(1,1))
plot(world_longlat)
plot(world_merc)
plot(world_laea)
plot(world_sinusoidal)
A four page cheat sheet about coordinate reference systems (CRSs), including projections, datums, and coordinate systems, and the use of these in R: https://www.nceas.ucsb.edu/sites/default/files/2020-04/OverviewCoordinateReferenceSystems.pdf A fun projection link: http://xkcd.com/977/
OK, some slightly more complicated stuff. Data from MODIS fire detentions in the US in 2021. metadata and data link for MODIS fire data available at: https://fsapps.nwcg.gov/afm/data/fireptdata/modisfire_2021_conus.htm
To download the data use:
download.file('https://fsapps.nwcg.gov/afm/data/fireptdata/modis_fire_2021_365_conus_shapefile.zip',
destfile = './data/modis_fire_2021_365_conus_shapefile.zip')
unzip('./data/modis_fire_2021_365_conus_shapefile.zip', exdir = './data/')
read in data
fire2021 <- sf::st_read(dsn = './data/modis_fire_2021_365_conus.shp')
## Reading layer `modis_fire_2021_365_conus' from data source
## `/home/mcglinndj/quant_methods/data/modis_fire_2021_365_conus.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 242079 features and 16 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -124.6386 ymin: 24.726 xmax: -67.388 ymax: 49.72
## Geodetic CRS: NAD83
the shape file is read in as a data.frame with spatial attributes
class(fire2021)
## [1] "sf" "data.frame"
names(fire2021) # provides names of data table
## [1] "AREA" "PERIMETER" "FIRE_" "FIRE_ID" "LAT" "LONG"
## [7] "DATE" "JULIAN" "GMT" "TEMP" "SPIX" "TPIX"
## [13] "SRC" "SAT_SRC" "CONF" "FRP" "geometry"
dim(fire2021) # dimensions of data table (1 row per spatial feature)
## [1] 242079 17
st_crs(fire2021) # retrieves coordinate reference system
## Coordinate Reference System:
## User input: NAD83
## wkt:
## GEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]]
make a map of Julian day of each fire
plot(fire2021["JULIAN"], pch = '.')
alternatively use ggplot (a bit slower option in this case)
ggplot(fire2021) +
geom_sf(aes(col = JULIAN), pch = '.') +
theme_minimal()
get summary statistics for each attribute just like we would for a dataframe
summary(fire2021)
## AREA PERIMETER FIRE_ FIRE_ID LAT
## Min. :0 Min. :0 Min. : 1 Min. : 124 Min. :19.44
## 1st Qu.:0 1st Qu.:0 1st Qu.: 60520 1st Qu.:259314 1st Qu.:34.35
## Median :0 Median :0 Median :121040 Median :462191 Median :40.01
## Mean :0 Mean :0 Mean :121040 Mean :480743 Mean :39.15
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:181560 3rd Qu.:750844 3rd Qu.:43.23
## Max. :0 Max. :0 Max. :242079 Max. :922322 Max. :49.72
## LONG DATE JULIAN GMT
## Min. :-124.64 Min. :2021-01-01 Min. : 1.0 Min. : 241
## 1st Qu.:-120.57 1st Qu.:2021-04-26 1st Qu.:116.0 1st Qu.: 954
## Median :-113.13 Median :2021-07-27 Median :208.0 Median :1817
## Mean :-106.60 Mean :2021-07-09 Mean :190.5 Mean :1518
## 3rd Qu.: -93.12 3rd Qu.:2021-08-30 3rd Qu.:242.0 3rd Qu.:1930
## Max. : -67.39 Max. :2021-12-31 Max. :365.0 Max. :2214
## TEMP SPIX TPIX SRC
## Min. :300.0 Min. :1.000 Min. :1.000 Length:242079
## 1st Qu.:312.5 1st Qu.:1.100 1st Qu.:1.000 Class :character
## Median :321.5 Median :1.300 Median :1.100 Mode :character
## Mean :329.1 Mean :1.621 Mean :1.216
## 3rd Qu.:337.1 3rd Qu.:1.900 3rd Qu.:1.300
## Max. :508.8 Max. :4.800 Max. :2.000
## SAT_SRC CONF FRP geometry
## Length:242079 Min. : 0.00 Min. : 0.00 POINT :242079
## Class :character 1st Qu.: 55.00 1st Qu.: 14.50 epsg:4269 : 0
## Mode :character Median : 73.00 Median : 30.40 +proj=long...: 0
## Mean : 70.99 Mean : 92.17
## 3rd Qu.: 93.00 3rd Qu.: 72.10
## Max. :100.00 Max. :10433.30
We can also easily subset the object. Let’s select fires that occurred in the last 6 months (days 182 to 365) of 2021
fire <- subset(fire2021, JULIAN > 182)
class(fire)
## [1] "sf" "data.frame"
names(fire)
## [1] "AREA" "PERIMETER" "FIRE_" "FIRE_ID" "LAT" "LONG"
## [7] "DATE" "JULIAN" "GMT" "TEMP" "SPIX" "TPIX"
## [13] "SRC" "SAT_SRC" "CONF" "FRP" "geometry"
dim(fire)
## [1] 159875 17
summary(fire)
## AREA PERIMETER FIRE_ FIRE_ID LAT
## Min. :0 Min. :0 Min. : 3 Min. :164409 Min. :19.44
## 1st Qu.:0 1st Qu.:0 1st Qu.: 44674 1st Qu.:373536 1st Qu.:37.86
## Median :0 Median :0 Median : 94868 Median :649520 Median :40.78
## Mean :0 Mean :0 Mean :100282 Mean :586477 Mean :40.88
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:146410 3rd Qu.:821217 3rd Qu.:45.61
## Max. :0 Max. :0 Max. :242078 Max. :922322 Max. :49.46
## LONG DATE JULIAN GMT
## Min. :-124.64 Min. :2021-07-02 Min. :183.0 Min. : 245
## 1st Qu.:-121.18 1st Qu.:2021-07-29 1st Qu.:210.0 1st Qu.: 629
## Median :-118.85 Median :2021-08-18 Median :230.0 Median :1741
## Mean :-112.31 Mean :2021-08-27 Mean :239.4 Mean :1414
## 3rd Qu.:-107.59 3rd Qu.:2021-09-13 3rd Qu.:256.0 3rd Qu.:1934
## Max. : -67.56 Max. :2021-12-31 Max. :365.0 Max. :2214
## TEMP SPIX TPIX SRC
## Min. :300.0 Min. :1.000 Min. :1.000 Length:159875
## 1st Qu.:313.7 1st Qu.:1.100 1st Qu.:1.000 Class :character
## Median :324.5 Median :1.300 Median :1.100 Mode :character
## Mean :332.5 Mean :1.646 Mean :1.224
## 3rd Qu.:341.6 3rd Qu.:1.900 3rd Qu.:1.300
## Max. :508.8 Max. :4.800 Max. :2.000
## SAT_SRC CONF FRP geometry
## Length:159875 Min. : 0.00 Min. : 0.0 POINT :159875
## Class :character 1st Qu.: 56.00 1st Qu.: 17.2 epsg:4269 : 0
## Mode :character Median : 78.00 Median : 37.1 +proj=long...: 0
## Mean : 73.28 Mean : 113.5
## 3rd Qu.: 99.00 3rd Qu.: 88.6
## Max. :100.00 Max. :10433.3
plot(fire['JULIAN'], cex = 0.25)
It is a little more difficult to do a geographicaly defined subset. For example, let’s select the fires in SC. First we’ll identify which state each fire occurred in, then subset the ones from SC.
USA <- map(database='state', plot=FALSE, fill=TRUE)
names(USA)
## [1] "x" "y" "range" "names"
USA$names
## [1] "alabama" "arizona"
## [3] "arkansas" "california"
## [5] "colorado" "connecticut"
## [7] "delaware" "district of columbia"
## [9] "florida" "georgia"
## [11] "idaho" "illinois"
## [13] "indiana" "iowa"
## [15] "kansas" "kentucky"
## [17] "louisiana" "maine"
## [19] "maryland" "massachusetts:martha's vineyard"
## [21] "massachusetts:main" "massachusetts:nantucket"
## [23] "michigan:north" "michigan:south"
## [25] "minnesota" "mississippi"
## [27] "missouri" "montana"
## [29] "nebraska" "nevada"
## [31] "new hampshire" "new jersey"
## [33] "new mexico" "new york:manhattan"
## [35] "new york:main" "new york:staten island"
## [37] "new york:long island" "north carolina:knotts"
## [39] "north carolina:main" "north carolina:spit"
## [41] "north dakota" "ohio"
## [43] "oklahoma" "oregon"
## [45] "pennsylvania" "rhode island"
## [47] "south carolina" "south dakota"
## [49] "tennessee" "texas"
## [51] "utah" "vermont"
## [53] "virginia:chesapeake" "virginia:chincoteague"
## [55] "virginia:main" "washington:san juan island"
## [57] "washington:lopez island" "washington:orcas island"
## [59] "washington:whidbey island" "washington:main"
## [61] "west virginia" "wisconsin"
## [63] "wyoming"
USA_sf <- st_as_sf(USA)
Let’s make sure projection of USA polygon is same as fire points
USA_sf <- st_transform(USA_sf, st_crs(fire2021))
Now we overlay performs a “point in polygon” operation–meaning that it will return us a vector giving the index of which polygon in USA_sp each point in fire is. We then index that to names(USA_sp) to get the name of the state for that index.
sf_use_s2(FALSE)
## Spherical geometry (s2) switched off
fire_state <- st_intersection(fire2021, USA_sf)
## although coordinates are longitude/latitude, st_intersection assumes that they
## are planar
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
Now let’s subset the ones from SC. We have to use grep because there are actually three polygons for SC.
firesc <- fire_state[grep("south carolina", fire_state$ID), ]
dim(firesc)
## [1] 3247 18
We’ll setup a legend for the map of fires by temperature.
addLegendToSFPlot <- function(values = c(0, 1), labels = c("Low", "High"),
palette = c("blue", "red"), ...){
# Get the axis limits and calculate size
axisLimits <- par()$usr
xLength <- axisLimits[2] - axisLimits[1]
yLength <- axisLimits[4] - axisLimits[3]
# Define the colour palette
colourPalette <- leaflet::colorNumeric(palette, range(values))
# Add the legend
plotrix::color.legend(xl=axisLimits[2] - 0.1*xLength, xr=axisLimits[2],
yb=axisLimits[3], yt=axisLimits[3] + 0.1 * yLength,
legend = labels, rect.col = colourPalette(values),
gradient="y", ...)
}
First make an SC spatial polygon to put around the fire points.
SC <- map(database='state', regions='south carolina', fill=T,
plot=F)
SC_st <- st_as_sf(SC)
SC_st <- st_transform(SC_st, crs=st_crs(firesc))
cuts <- cut(firesc$JULIAN, 10)
colors <- heat.colors(10)[as.numeric(cuts)]
plot(st_geometry(SC_st))
plot(st_geometry(firesc['JULIAN']), add = TRUE,
col = colors, pch = 19, cex = 0.5)
addLegendToSFPlot(values = seq(from = 183, to = 363, length.out = 10),
labels = c("Low", "", "", "", "Medium", "", "", "", "", "High"),
palette = heat.colors(10))
Here’s where ggplot shines as it makes it easy to combine multiple maps
ggplot() +
geom_sf(data = SC_st) + # add in SC polygon
geom_sf(data = firesc, aes(col = JULIAN), cex = 0.25) # add in fire data
For the last step, let’s export this as a KML (readable by google earth) using write OGR and plot the locations of fires in SC in google earth. The first argument is the object we want to export, the second is the filename (by default it will go in our working directory), the layer we want to export, and the file format.
write_sf(firesc, "firesctemp.kml", driver="kml")
Rasters are grids of data. A common data grid to work with is climate data. You can download an Rdata file of bioclim climate data here:
## https://www.dropbox.com/s/gafxazc9575nf3j/bioclim_10m.Rdata?dl=0
let’s load load and plot the data
load('./data/bioclim_10m.Rdata')
bioStack
## class : RasterBrick
## dimensions : 900, 2160, 1944000, 19 (nrow, ncol, ncell, nlayers)
## resolution : 0.1666667, 0.1666667 (x, y)
## extent : -180, 180, -60, 90 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +ellps=WGS84
## source : memory
## names : mat, mdr, iso, tseas, tmax, tmin, tar, twetq, tdryq, twarmq, tcoldq, ap, pwet, pdry, pseas, ...
## min values : -26.9, 0.9, 0.8, 7.2, -5.9, -54.7, 5.3, -25.1, -45.0, -9.7, -48.8, 0.0, 0.0, 0.0, 0.0, ...
## max values : 31.4, 21.1, 9.5, 2267.3, 48.9, 25.8, 72.5, 37.5, 36.4, 38.0, 28.9, 9916.0, 2088.0, 652.0, 261.0, ...
class(bioStack)
## [1] "RasterBrick"
## attr(,"package")
## [1] "raster"
names(bioStack)
## [1] "mat" "mdr" "iso" "tseas" "tmax" "tmin" "tar" "twetq"
## [9] "tdryq" "twarmq" "tcoldq" "ap" "pwet" "pdry" "pseas" "pwetq"
## [17] "pdryq" "pwarmq" "pcoldq"
projection(bioStack) # this is unprojected latlong like the fire data
## [1] "+proj=longlat +ellps=WGS84"
plot(bioStack, "mat")
Let’s extract the historical climate data at each of our fire locations
fire_climate <- extract(bioStack, firesc)
class(fire_climate)
## [1] "matrix" "array"
head(fire_climate)
## mat mdr iso tseas tmax tmin tar twetq tdryq twarmq tcoldq ap pwet
## [1,] 15.0 13.7 4.0 738.4 31.6 -2.1 33.7 6.8 15.5 24.3 5.2 1300 131
## [2,] 15.0 13.7 4.0 738.4 31.6 -2.1 33.7 6.8 15.5 24.3 5.2 1300 131
## [3,] 15.1 13.6 4.0 740.1 31.6 -2.1 33.7 6.8 15.6 24.4 5.2 1267 128
## [4,] 15.0 13.7 4.0 738.4 31.6 -2.1 33.7 6.8 15.5 24.3 5.2 1300 131
## [5,] 15.2 13.0 3.9 746.9 31.5 -1.8 33.3 6.8 15.7 24.6 5.3 1209 123
## [6,] 15.3 13.0 3.9 745.1 31.6 -1.7 33.3 7.0 10.6 24.7 5.5 1220 126
## pdry pseas pwetq pdryq pwarmq pcoldq
## [1,] 91 9 346 297 343 319
## [2,] 91 9 346 297 343 319
## [3,] 88 10 338 286 338 312
## [4,] 91 9 346 297 343 319
## [5,] 83 11 328 270 325 303
## [6,] 84 11 336 272 330 308
nrow(fire_climate)
## [1] 3247
merge the two datasets
firesc <- cbind(firesc, fire_climate)
head(firesc)
## Simple feature collection with 6 features and 36 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -81.758 ymin: 35.166 xmax: -81.239 ymax: 35.19
## Geodetic CRS: NAD83
## AREA PERIMETER FIRE_ FIRE_ID LAT LONG DATE JULIAN GMT
## 173259 0 0 173259 160772 35.190 -81.758 2021-06-30 181 1824
## 173260 0 0 173260 160771 35.190 -81.758 2021-06-30 181 1830
## 173303 0 0 173303 71672 35.183 -81.638 2021-04-05 95 1905
## 173316 0 0 173316 631657 35.181 -81.747 2021-07-06 187 1610
## 173409 0 0 173409 544371 35.167 -81.239 2021-04-13 103 1630
## 173418 0 0 173418 544370 35.166 -81.239 2021-04-13 103 1640
## TEMP SPIX TPIX SRC SAT_SRC CONF FRP ID mat mdr iso
## 173259 320.5 1.1 1.0 ssec A 56 11.2 south carolina 15.0 13.7 4.0
## 173260 320.5 1.1 1.0 gsfc A 56 8.6 south carolina 15.0 13.7 4.0
## 173303 308.7 2.1 1.4 gsfc A 49 11.7 south carolina 15.1 13.6 4.0
## 173316 318.9 1.0 1.0 gsfc T 28 6.2 south carolina 15.0 13.7 4.0
## 173409 311.1 1.7 1.3 ssec T 46 24.2 south carolina 15.2 13.0 3.9
## 173418 311.1 1.7 1.3 gsfc T 67 17.1 south carolina 15.3 13.0 3.9
## tseas tmax tmin tar twetq tdryq twarmq tcoldq ap pwet pdry pseas
## 173259 738.4 31.6 -2.1 33.7 6.8 15.5 24.3 5.2 1300 131 91 9
## 173260 738.4 31.6 -2.1 33.7 6.8 15.5 24.3 5.2 1300 131 91 9
## 173303 740.1 31.6 -2.1 33.7 6.8 15.6 24.4 5.2 1267 128 88 10
## 173316 738.4 31.6 -2.1 33.7 6.8 15.5 24.3 5.2 1300 131 91 9
## 173409 746.9 31.5 -1.8 33.3 6.8 15.7 24.6 5.3 1209 123 83 11
## 173418 745.1 31.6 -1.7 33.3 7.0 10.6 24.7 5.5 1220 126 84 11
## pwetq pdryq pwarmq pcoldq geometry
## 173259 346 297 343 319 POINT (-81.758 35.19)
## 173260 346 297 343 319 POINT (-81.758 35.19)
## 173303 338 286 338 312 POINT (-81.638 35.183)
## 173316 346 297 343 319 POINT (-81.747 35.181)
## 173409 328 270 325 303 POINT (-81.239 35.167)
## 173418 336 272 330 308 POINT (-81.239 35.166)
# Fire temperatures in deg K at fire locations
ggplot() +
geom_sf(data = SC_st) +
geom_sf(data = firesc, aes(col = TEMP), cex = 0.25)
# historical annual precip
ggplot() +
geom_sf(data = SC_st) +
geom_sf(data = firesc, aes(col = ap), cex = 0.25)
# relationship between annual precip and temp
plot(TEMP ~ mat, data=firesc, xlab = 'Annual precip', ylab = 'Fire temp (K)')
These maps are cool but they are static. Let’s make an interactive
map using the package leaflet
A quick demo of
leaflet
can be found here: https://rstudio.github.io/leaflet
mapStates <- map("state", fill = TRUE, plot = FALSE)
leaflet(data = mapStates) %>% addTiles() %>%
addPolygons(fillColor = topo.colors(10, alpha = NULL), stroke = FALSE)
we can add various provider tiles (i.e., maps) with additional data features
m <- leaflet(data = firesc) %>% addTiles() %>%
addCircleMarkers(radius = 2, label = ~as.character(firesc$TEMP))
## Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
## Need '+proj=longlat +datum=WGS84'
m %>% addProviderTiles(providers$Esri.NatGeoWorldMap)
it is possible to vary point radius and color based upon data fields
m <- leaflet(data = firesc) %>% addTiles() %>%
addCircleMarkers(radius = ~(TEMP/max(TEMP)), label = ~as.character(firesc$TEMP))
## Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
## Need '+proj=longlat +datum=WGS84'
m
m <- leaflet(data = firesc) %>% addTiles() %>%
addCircleMarkers(radius = 2, color = ~TEMP,
label = ~as.character(firesc$TEMP))
## Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
## Need '+proj=longlat +datum=WGS84'
m