Beyond Basic R - Mapping
Basic mapping in R with the maps and ggmap package
There are many different R packages for dealing with spatial data. The main distinctions between them involve the types of data they work with — raster or vector — and the sophistication of the analyses they can do. Raster data can be thought of as pixels, similar to an image, while vector data consists of points, lines, or polygons. Spatial data manipulation can be quite complex, but creating some basic plots can be done with just a few commands. In this post, we will show simple examples of raster and vector spatial data for plotting a watershed and gage locations, and link to some other more complex examples.
First, let’s download an example shapefile (a polygon) of a HUC8 from western Pennsylvania, using the
sbtools package to access ScienceBase. The
st_read function from the
sf (for “simple features”) package reads in the shapefile. We will be using
sf throughout these examples to manipulate the points and polygon for the gages and HUC. Then we’ll retrieve gages with discharge from this watershed using the
dataRetrieval package. Both
sbtools are covered in our USGS Packages curriculum.
library(sbtools) library(dataRetrieval) library(sf) item_file_download(sb_id = "5a83025ce4b00f54eb32956b", names = "huc8_05010007_example.zip", destinations = "huc8_05010007_example.zip", overwrite_file = TRUE)
##  "huc8_05010007_example.zip"
unzip('huc8_05010007_example.zip', overwrite = TRUE) #note st_read takes the folder, not a particular file huc_poly <- st_read('huc8_05010007_example')
## Reading layer `wbdhu8_alb_simp' from data source `D:\LADData\RCode\owi-blog\content\huc8_05010007_example' using driver `ESRI Shapefile' ## Simple feature collection with 1 feature and 9 fields ## geometry type: POLYGON ## dimension: XY ## bbox: xmin: -79.45512 ymin: 39.91875 xmax: -78.55573 ymax: 40.77377 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs
##  "sf" "data.frame"
## Classes 'sf' and 'data.frame': 1 obs. of 10 variables: ## $ REGION : Factor w/ 1 level "Ohio Region": 1 ## $ SUBREGION : Factor w/ 1 level "Allegheny": 1 ## $ BASIN : Factor w/ 1 level "Allegheny": 1 ## $ SUBBASIN : Factor w/ 1 level "Conemaugh": 1 ## $ HUC_2 : Factor w/ 1 level "05": 1 ## $ HUC_4 : Factor w/ 1 level "0501": 1 ## $ HUC_6 : Factor w/ 1 level "050100": 1 ## $ HUC_8 : Factor w/ 1 level "05010007": 1 ## $ HU_8_STATE: Factor w/ 1 level "PA": 1 ## $ geometry :sfc_POLYGON of length 1; first list element: List of 1 ## ..$ : num [1:256, 1:2] -79 -79 -79 -79 -79 ... ## ..- attr(*, "class")= chr "XY" "POLYGON" "sfg" ## - attr(*, "sf_column")= chr "geometry" ## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA ## ..- attr(*, "names")= chr "REGION" "SUBREGION" "BASIN" "SUBBASIN" ...
## Geometry set for 1 feature ## geometry type: POLYGON ## dimension: XY ## bbox: xmin: -79.45512 ymin: 39.91875 xmax: -78.55573 ymax: 40.77377 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs
## xmin ymin xmax ymax ## -79.45512 39.91875 -78.55573 40.77377
## Coordinate Reference System: ## EPSG: 4326 ## proj4string: "+proj=longlat +datum=WGS84 +no_defs"
huc_gages <- whatNWISdata(huc = "05010007", parameterCd = "00060", service="uv") head(huc_gages)
## agency_cd site_no station_nm ## 197 USGS 03039925 North Fork Bens Creek at North Fork Reservoir, PA ## 591 USGS 03040000 Stonycreek River at Ferndale, PA ## 621 USGS 03041029 Conemaugh River at Minersville, PA ## 731 USGS 03041500 Conemaugh River at Seward, PA ## 824 USGS 03042000 Blacklick Creek at Josephine, PA ## 964 USGS 03042280 Yellow Creek near Homer City, PA ## site_tp_cd dec_lat_va dec_long_va coord_acy_cd dec_coord_datum_cd ## 197 ST 40.26619 -79.01669 S NAD83 ## 591 ST 40.28563 -78.92058 S NAD83 ## 621 ST 40.34139 -78.92611 S NAD83 ## 731 ST 40.41924 -79.02614 S NAD83 ## 824 ST 40.47701 -79.18670 S NAD83 ## 964 ST 40.57257 -79.10337 S NAD83 ## alt_va alt_acy_va alt_datum_cd huc_cd data_type_cd parm_cd stat_cd ## 197 <NA> <NA> <NA> 05010007 uv 00060 <NA> ## 591 1184.06 .01 NGVD29 05010007 uv 00060 <NA> ## 621 1140 .01 NGVD29 05010007 uv 00060 <NA> ## 731 1076.01 .01 NGVD29 05010007 uv 00060 <NA> ## 824 954.46 .01 NAVD88 05010007 uv 00060 <NA> ## 964 1140 .01 NGVD29 05010007 uv 00060 <NA> ## ts_id loc_web_ds medium_grp_cd parm_grp_cd srs_id access_cd ## 197 221326 <NA> wat <NA> 1645423 0 ## 591 122456 <NA> wat <NA> 1645423 0 ## 621 122464 <NA> wat <NA> 1645423 0 ## 731 122468 <NA> wat <NA> 1645423 0 ## 824 122471 <NA> wat <NA> 1645423 0 ## 964 122477 <NA> wat <NA> 1645423 0 ## begin_date end_date count_nu ## 197 1987-10-03 1998-09-26 4011 ## 591 1991-10-01 2018-08-16 9816 ## 621 2001-12-13 2018-08-16 6090 ## 731 1991-10-01 2018-08-16 9816 ## 824 1991-10-15 2018-08-16 9802 ## 964 1991-10-01 2018-08-16 9816
huc_poly object is a new type of object that we haven’t seen — it has classes
sf as well as
data.frame. Looking inside the object with the
str command, we can see it is structured very much like a
data.frame with several factor columns, except for the
geometry column, which is of type
sf provides various functions to extract useful information from this kind object, generally prefixed with
st_geometry extracts the entire geometry part of the object;
st_bbox extracts the bounding box from the geometry; and
st_crs extracts the coordinate reference system. There are others, but we will use these three to get the parts of the
sf object we need for plotting.
Now that we understand this new object, let’s make some maps.
Raster map example
For the raster map, we will use the
ggmap package to create a road and satellite basemaps for the HUC. Since the basemaps that
ggmap uses are quite detailed, they are too large to include with the package and must be retrieved from the web with the
get_map function. For the
location argument, we are getting the bbox from the
st_bbox returns the bbox in the format we need, except for the names, which we add with
ggmap function is analogous to the
ggplot function in the
ggplot2 package that you have likely already used. It creates the base map, which we can then add things to. Many of the commands used here are from the
ggplot2 package (
ggmap imports them), and others could be used to further customize this map.
ggmap is probably not a good choice if you need your data to be in a particular projection. Compared to base plotting, it provides simplicity at the cost of control.
library(ggmap) bbox <- setNames(st_bbox(huc_poly), c("left", "bottom", "right", "top")) #setting zoom to 9 gives us a bit of padding around the bounding box basemap_streets <- get_map(maptype = "roadmap", location = bbox, zoom = 9) basemap_satellite <- get_map(maptype = "satellite", location = bbox, zoom = 9) street_map <- ggmap(basemap_streets) satellite_map <- ggmap(basemap_satellite) print(street_map)
Now we can start adding to our maps. First, we convert the
huc_gages data.frame to an
sf object using
st_as_sf, assigning it the same coordinate reference system as
ggplot functions like
geom_text add to your base map.
huc_gages_sf <- st_as_sf(huc_gages, coords = c("dec_long_va", "dec_lat_va"), crs = st_crs(huc_poly), remove = FALSE) satellite_map + geom_sf(data = huc_poly, inherit.aes = FALSE, color = "white", fill = NA) + geom_sf(data = huc_gages_sf, inherit.aes = FALSE, color = "red") + geom_text(data = huc_gages_sf, aes(label = site_no, x = dec_long_va, y = dec_lat_va), hjust = 0, size=2.5, nudge_x = 0.02, col = "yellow")
street_map + geom_sf(data = huc_poly, inherit.aes = FALSE, color = "black", fill = NA) + geom_sf(data = huc_gages_sf, inherit.aes = FALSE, color = "red") + geom_text(data = huc_gages_sf, aes(label = site_no, x = dec_long_va, y = dec_lat_va), hjust = 0, size=2.5, nudge_x = 0.02, col = "black")
Vector map example
If we don’t want any raster background to our maps, we can use base plotting and the
maps package. This style of map can be nicer for insets or large scale maps that would be cluttered with a raster background. The
maps package provides easily accessible political boundary maps, that can be overlaid with other shapefiles. As with regular base plotting, you can think of creating maps like painting — every layer has to go on in the right order. You can’t remove things without starting over. Fortunately, you can start over with just a few keystrokes since you are scripting your plot! Run these commands one by one to see the map take shape — first we create a blank state map, then add county lines in white, then the HUC boundary, then the gage points, and then the legend. Note that we use the
st_geometry function inside of the plot command so that we only plot the
huc_gages_sf geometry, and not the other information in their data frames.
library(maps) map(database = 'state', regions = 'Pennsylvania', col = "tan", fill = TRUE, border = NA) #this draws all PA counties since the regions argument uses partial matching map(database = 'county', regions = 'Pennsylvania', col = "white", fill = FALSE, add = TRUE) plot(st_geometry(huc_poly), col = NA, add = TRUE) plot(st_geometry(huc_gages_sf), add = TRUE, col = "red", pch = 19, cex = 0.7) legend("bottomright", legend = c("Gage", "Subbasin boundary"), pch = c(19,NA), lty = c(NA, 1), col = c("red", "black"), title = "Legend") title("Conemaugh Subbasin")
Similarly, we can create a map zoomed in to the HUC polygon. Note that we set the x and y limits of the map by extracting the limits of the
bbox object we created earlier. We can use the names
right, etc. because
bbox is a named vector.
map(database = 'county', regions = 'Pennsylvania', col = "lightgray", xlim = bbox[c('left', 'right')], ylim = bbox[c('bottom', 'top')]) plot(st_geometry(huc_poly), col = "dodgerblue", border = NA, add = TRUE) box() plot(st_geometry(huc_gages_sf), add = TRUE, col = "red", pch = 19, cex = 0.7) legend("bottomleft", legend = c("Gage", "Conemaugh subbasin"), pch = c(19,NA), lty = c(NA, 1), col = c("red", "dodgerblue"), title = "Legend", lwd = c(1,15), bg = "wheat") text(x = huc_gages$dec_long_va, y = huc_gages$dec_lat_va, labels = huc_gages$site_no, pos = 4, cex = 0.7)
Other packages and examples
Like plotting in R, there are endless intricacies to making maps, and we are only really scratching the surface here. Some other packages that you may find useful for certain applications include:
- raster: For working with your own raster data
- sp: The original workhorse package for handling spatial data.
sfis largely replacing it, but you will see it a lot when Googling things.
- geoknife: A USGS package that utilizes the Geo Data Portal for processing gridded data. Covered in the packages curriculum.
- inlmisc: Another USGS package for creating high-level graphics, demonstrated in this post by Jason Fisher.
Also, check out our additional topics in R page for links to some other tutorials.
Beyond Basic R - Version Control with Git
August 24, 2018
Depending on how new you are to software development and/or R programming, you may have heard people mention version control, Git, or GitHub. Version control refers to the idea of tracking changes to files through time and various contributors.
Beyond Basic R - Plotting with ggplot2 and Multiple Plots in One Figure
August 9, 2018
R can create almost any plot imaginable and as with most things in R if you don’t know where to start, try Google. The Introduction to R curriculum summarizes some of the most used plots, but cannot begin to expose people to the breadth of plot options that exist.
Beyond Basic R - Data Munging
August 1, 2018
What we couldn’t cover In the data cleaning portion of our Intro to R class, we cover a variety of common data manipulation tasks. Most of these were achieved using the package dplyr, including removing or retaining certain columns (select), filtering out rows by column condition (filter), creating new columns (mutate), renaming columns (rename), grouping data by categorical variables (group_by), and summarizing data (summarize).
Beyond Basic R - Introduction and Best Practices
July 30, 2018
We queried more than 60 people who have taken the USGS Introduction to R class over the last two years to understand what other skills and techniques are desired, but not covered in the course.
The Hydro Network-Linked Data Index
November 2, 2020
Introduction updated 11-2-2020 after updates described here. The Hydro Network-Linked Data Index (NLDI) is a system that can index data to NHDPlus V2 catchments and offers a search service to discover indexed information.