Skip to main content

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.

Charting 'tidycensus' data with R

This blog shows several different ways to visualize data from the `tidycensus` package for R.

Date Posted June 24, 2025 Last Updated June 24, 2025
Author Althea Archer
Elmera Azadpour
Sharmin Siddiqui
Hayley Corson-Dosch
Reading Time 24 minutes Share

In January, 2025, the organizers of the tidytuesday challenge highlighted data that were featured in a previous blog post and data visualization website . Some of us in the USGS Vizlab wanted to participate by creating a series of data visualizations showing these data, specifically the metric “households lacking plumbing.” This blog highlights our data visualizations inspired by the tidytuesday challenge as well as the code we used to create them, based on our previous software release on GitHub .

Data downloading and preparation

We used functions to download the data from the tidycensus package in R at the county level for 2022 and 2023. The functions are available from an open-access software release on GitHub .

These are the U.S. Census Bureau variables we used in these data visualizations (see full citations below):

  • B01003_001: total population per county
  • B25049_004: households lacking plumbing
  • B25049_002: total housing units (both owner- and renter-occupied)
  • B25049_003: households with plumbing
  • B19013_001: median household income

We also downloaded complete state and county files from the tigris package for mapping purposes.

Code to download the data

# Load libraries
library(tidycensus)
library(sf) 
library(janitor) 
library(tidyverse)
library(rmapshaper)
library(tigris)

# Helper functions -----
#   From Software release: 
#     https://github.com/DOI-USGS/vulnerability-indicators/blob/main/2_process/src/data_utils.R
get_census_data <- function(geography, var_names, year, proj, survey_var) {
  df <- get_acs(
    geography = geography,
    variable = var_names,
    year = year,
    geometry = TRUE,
    survey = survey_var) |>
    clean_names() |>
    st_transform(proj) |>
    mutate(year = year)
  
  return(df) 
}

# Grab relevant variables 
vars <- c("B25049_002", "B25049_004", "B19013_001", "B25049_003", "B01003_001")

# Pull data for 2023 and 2022 for all US counties ------
plumbing_data_2023 <- get_census_data(
  geography = 'county', 
  var_names = vars, 
  year = "2023", 
  proj = "EPSG:5070", 
  survey_var = "acs1"
) |>
  mutate(
    variable_long = case_when(
      variable == "B25049_002" ~ "total_housing",
      variable == "B25049_004" ~ "plumbing",
      variable == "B19013_001" ~ "median_household_income",
      variable == "B25049_003" ~ "units_having_plumbing",
      variable == "B01003_001" ~ "total_pop",
      .default = NA_character_  
    )
  ) |> 
  select(geoid, name, variable_long, estimate, geometry, year) |> 
  pivot_wider(
    names_from = variable_long,
    values_from = estimate
  ) |> 
  mutate(
    percent_lacking_plumbing = (plumbing / total_housing) * 100,
    percent_having_plumbing = (units_having_plumbing / total_housing) * 100
  )

plumbing_data_2022 <- get_census_data(
  geography = 'county', 
  var_names = vars, 
  year = "2022", 
  proj = "EPSG:5070", 
  survey_var = "acs1"
) |>
  mutate(
    variable_long = case_when(
      variable == "B25049_002" ~ "total_housing",
      variable == "B25049_004" ~ "plumbing",
      variable == "B19013_001" ~ "median_household_income",
      variable == "B25049_003" ~ "units_having_plumbing",
      variable == "B01003_001" ~ "total_pop",
      .default = NA_character_  
    )
  ) |> 
  select(geoid, name, variable_long, estimate, geometry, year) |> 
  pivot_wider(
    names_from = variable_long,
    values_from = estimate
  ) |> 
  mutate(
    percent_lacking_plumbing = (plumbing / total_housing) * 100,
    percent_having_plumbing = (units_having_plumbing / total_housing) * 100
  )

# Pull data by the state level

state_plumbing_data_2023 <- get_census_data(
  geography = 'state', 
  var_names = vars, 
  year = "2023", 
  proj = "EPSG:5070", 
  survey_var = "acs1"
) |>
  mutate(
    variable_long = case_when(
      variable == "B25049_002" ~ "total_housing",
      variable == "B25049_004" ~ "plumbing",
      variable == "B19013_001" ~ "median_household_income",
      variable == "B25049_003" ~ "units_having_plumbing",
      variable == "B01003_001" ~ "total_pop",
      .default = NA_character_  
    )
  ) |> 
  select(geoid, name, variable_long, estimate, geometry, year) |> 
  pivot_wider(
    names_from = variable_long,
    values_from = estimate
  ) |> 
  mutate(
    percent_lacking_plumbing = (plumbing / total_housing) * 100,
    percent_having_plumbing = (units_having_plumbing / total_housing) * 100
  )

counties_sf <-   # download a generalized (1:500k) counties file   
  tigris::counties(cb = TRUE) |>   
  # set projection   
  sf::st_transform("EPSG:5070") |>   
  # standardize column names   
  janitor::clean_names() |>     
  # simplify spatial data   
  rmapshaper::ms_simplify(keep = 0.2) |> 
  filter(! stusps %in% c("AK", "HI", "PR", "VI", "MP", "GU", "AS"))

state_sf <- 
  # download a generalized (1:500k) states file
  tigris::states(cb = TRUE) |> 
  # set projection
  st_transform("EPSG:5070") |> 
  # standardize column names
  clean_names() |> 
  # simplify spatial data
  rmapshaper::ms_simplify(keep = 0.2) |> 
  filter(! stusps %in% c("AK", "HI", "PR", "VI", "MP", "GU", "AS"))

Setting up the data visualization framework

We set up a few consistent settings for all of the following data visualizations, such as standard typefaces, colors, and a canvas and margin for plotting with cowplot . See our “Jazz up your ggplot” blog post for more cowplot-related plotting tips and tricks.

Code to set up our data visualizations

library(showtext)
library(sysfonts)
library(grid)
library(magick)
library(cowplot)

# Load custom font
font_legend <- "Source Sans Pro"
sysfonts::font_add_google(font_legend)
showtext::showtext_opts(dpi = 300, regular.wt = 200, bold.wt = 700)
showtext::showtext_auto(enable = TRUE)

# Define colors
background_color = "white"
font_color = "black"

# The background canvas for your viz
canvas <- grid::rectGrob(
  x = 0, y = 0, 
  width = 9, height = 9,
  gp = grid::gpar(fill = background_color, alpha = 1, col = background_color)
)

# margin for plotting
margin = 0.04

# Load in USGS logo (also a white logo available)
usgs_logo <- magick::image_read("usgs_logo_black.png") 

Note: After the cowplot viz are built in the examples below, we then save the visualizations to png using ggsave, such as: ggsave(filename = "xxx.png", width = 9, height = 9, dpi = 300, units = "in", bg = "white")

Line chart example

This line chart is a simple but beautiful way to compare the rates of plumbing access between 2022 and 2023 for California counties. The top 5 counties in 2023 are highlighted with color and labels, which are nice graphing tricks to improve readability. The colors were from the RColorBrewer package and the labels were placed with the ggrepel package.

Line chart showing the percent of households lacking plumbing in California counties from 2022 to 2023. Five counties are highlighted, which had the top percent in 2023, which were Humbolt, Placer, Kings, Mendocino, and Lake County.

Chart by Elmera Azadpour

Code to reproduce this figure

library(RColorBrewer)
library(ggrepel)
library(scales)

# Get CA data only
plumbing_data_2022_ca <- plumbing_data_2022 |> 
  filter(grepl(", California$", name)) 

plumbing_data_2023_ca <- plumbing_data_2023 |> 
  filter(grepl(", California$", name)) 

# Combine 2022 and 2023 data and drop geometry
combined_data <- bind_rows(plumbing_data_2022_ca, plumbing_data_2023_ca) |> 
  st_drop_geometry()

# Pivot to long format for slope chart
long_data_ca <- combined_data |> 
  select(name, year, percent_lacking_plumbing) |> 
  pivot_wider(names_from = year, values_from = percent_lacking_plumbing) |> 
  pivot_longer(cols = c(`2022`, `2023`), names_to = "year", values_to = "percent_lacking_plumbing") |> 
  mutate(name = str_remove(name, ",\\s.*"))

# Grab the top 5 counties in 2023
top_5_names <- long_data_ca |> 
  filter(year == 2023) |> 
  arrange(desc(percent_lacking_plumbing)) |> 
  slice(1:5) |> 
  pull(name)

# Add a variable to distinguish top 5 counties
long_data_ca <- long_data_ca |> 
  mutate(
    top_5_flag = ifelse(name %in% top_5_names, name, "Other")
  ) 

# Generate a color palette for the top 5 and others
top_5_colors <- RColorBrewer::brewer.pal(5, "Set1")  
neutral_color <- "gray80"              # gray color for other counties
palette <- c(setNames(top_5_colors, top_5_names), Other = neutral_color)

# Main plot
(main_plot <- ggplot() +
    # Add gray lines and points first for non-top 5 names
    geom_line(
      data = long_data_ca |> filter(!(name %in% top_5_names)),
      aes(x = year, y = percent_lacking_plumbing, group = name),
      linewidth = 1,
      color = "gray80"
    ) +
    geom_point(
      data = long_data_ca |> filter(!(name %in% top_5_names)),
      aes(x = year, y = percent_lacking_plumbing),
      size = 4,
      color = "gray80"
    ) +
    # Add lines and points for top 5 names
    geom_line(
      data = long_data_ca |> filter(name %in% top_5_names),
      aes(x = year, y = percent_lacking_plumbing, group = name, color = top_5_flag),
      linewidth = 1
    ) +
    geom_point(
      data = long_data_ca |> filter(name %in% top_5_names),
      aes(x = year, y = percent_lacking_plumbing, color = top_5_flag),
      size = 4
    ) +
    # Add labels for top 5 names
    ggrepel::geom_text_repel(
      data = long_data_ca |> filter(year == 2023 & name %in% top_5_names),
      aes(x = year, y = percent_lacking_plumbing, label = name, color = top_5_flag),
      size = 5,
      nudge_x = 0.35,
      show.legend = FALSE
    ) +
    scale_color_manual(values = palette) +
    scale_y_continuous(labels = scales::percent_format(scale = 1)) +
    scale_x_discrete(expand = c(0.08, 0)) +
    labs(
      x = NULL,
      y = NULL
    ) +
    theme_minimal() +
    theme(
      legend.position = "none", 
      axis.text.x = element_text(size = 16, family = font_legend), 
      axis.text.y = element_text(size = 16, family = font_legend), 
      plot.margin = margin(t = 20, r = 20, b = 20, l = 20, unit = "pt")
    )
)

ggdraw(ylim = c(0,1), # 0-1 scale makes it easy to place viz items on canvas
       xlim = c(0,1)) +
  # a background  
  draw_grob(canvas,
            x = 0, y = 1,
            height = 9, width = 9,
            hjust = 0, vjust = 1) +
  # the main plot 
  draw_plot(main_plot,
            x = margin - 0.03,
            y = margin + 0.065,
            height = 1 - (4*margin),
            width = 1 - (margin/2)) +
  # explainer text (Edit the author name only)
  draw_label("Elmera Azadpour, USGS\n Data available from tidycensus https://walker-data.com/tidycensus/",
             x = 1 - margin,   
             y = margin,
             size = 12,
             hjust = 1,
             vjust = 0,
             color = font_color,
             fontfamily = font_legend)+
  # Title
  draw_label("Percent Lacking Plumbing in California",
             x = margin,
             y = 1-margin,
             hjust = 0,
             vjust = 1,
             lineheight = 0.75,
             color = font_color,
             size = 28,
             fontfamily = font_legend,
             fontface = "bold") +
  # Add logo
  draw_image(usgs_logo, 
             x = margin,
             y = margin,
             width = 0.15, 
             hjust = 0, vjust = 0, 
             halign = 0, valign = 0)
             

Bubble map example

This bubble map visualizes the change in the percent of households lacking plumbing facilities by county from 2022 to 2023. The dots represent both the direction (color) and magnitude (size) of change in the percent of households lacking plumbing from 2022 to 2023. Labels paired with curved arrows highlight two example counties and provide guidance for how to read the chart.

Map of the lower 48 United States showing the change in the percent of households lacking plumbing from 2022 to 2023. Each county is represented with a bubble, which is scaled to the magnitude of change and colored for whether that county had an increase (red) or decrease (blue) in the percent between the two years. The map does not show any real patterns of areas that had increases versus decreases, with the two colors being well mixed across the U.S.A.

Chart by Elmera Azadpour

Code to reproduce this figure

# Get lower 48 data for 2023 
plumbing_data_2023_df <- plumbing_data_2023 |> 
  st_drop_geometry() |> 
  filter(!str_detect(name, "Hawaii|Alaska|Puerto Rico"))

# Get lower 48 data for 2022
plumbing_data_2022_df <-  plumbing_data_2022 |>
  st_drop_geometry() |> 
  filter(!str_detect(name, "Hawaii|Alaska|Puerto Rico"))

# Join 2023 and 2022 data
merged_data <- full_join(plumbing_data_2023_df, plumbing_data_2022_df, 
                         by = "geoid", 
                         suffix = c("_2023", "_2022"))

merged_data <- merged_data |> 
  # Calculate whether the difference between 
  # percent_lacking_plumbing_2023 and percent_lacking_plumbing_2022 
  # is positive or negative
  mutate(change_direction = ifelse(percent_lacking_plumbing_2023 - 
    percent_lacking_plumbing_2022 >= 0, "Increase", "Decrease"),
    # Calculate the absolute value of the difference between 
    # percent_lacking_plumbing_2023 and percent_lacking_plumbing_2022
    change_magnitude = abs(percent_lacking_plumbing_2023 -
          percent_lacking_plumbing_2022))

merged_data_with_geometry <- plumbing_data_2023 |> 
  # Retain only the geoid and geometry columns
  select(geoid, geometry) |>  
  # Then join back with merge_data
  right_join(merged_data, by = "geoid") 

# Calculate centroids for each geometry
merged_data_with_geometry <- merged_data_with_geometry |> 
  mutate(centroid = st_centroid(geometry))

# Separate centroids for plotting bubbles
centroids <- merged_data_with_geometry |> 
  select(geoid, name_2023,change_direction,change_magnitude, centroid) |> 
  st_as_sf(sf_column_name = "centroid") |> 
  drop_na()

# Edit centroid breaks and labels for legend
centroids <- centroids |> 
  mutate(
    magnitude_category = cut(
      change_magnitude,
      breaks = c(0, 0.1, 0.3, 0.5, Inf), # Four bins
      labels = c("≤0.1", "0.1–0.3", "0.3–0.5", ">0.5"), # Corresponding labels
      include.lowest = TRUE
    )
  )

# Main plot
(main_plot <- ggplot() +
    geom_sf(
      color = "white", 
      linewidth = 0.05 
    ) +  
    geom_sf(
      data = state_sf,
      fill = NA, 
      color = "black",  
      linewidth = 0.2, 
      linetype = "solid" 
    ) +  
    geom_sf(
      data = counties_sf, 
      fill = NA, 
      color = "grey70", 
      linewidth = 0.1, 
      linetype = "solid" 
    ) +
    geom_sf(
      data = centroids,
      aes(
        geometry = centroid,
        size = magnitude_category, 
        color = change_direction
      ),
      alpha = 0.6
    ) +
    scale_color_manual(
      values = c("Increase" = "#B24425", "Decrease" = "#176966"),
      name = "Direction"
    )  +
    scale_size_manual(
      name = "Magnitude",
      values = c(2, 4, 6, 8), # adjust bubble sizes to match categories
      labels = c("≤0.1", "0.1–0.3", "0.3–0.5", ">0.5") 
    ) +
    theme_void() 
)
# Custom arrows for callouts
plot_tx <- ggplot() +
  theme_void() +
  geom_curve(data =  data.frame(x = 13, y = 5),
             aes(x = 13, y = 5,
                 xend = 12, yend = 8),
             arrow = grid::arrow(length = unit(0.5, 'lines')), 
             curvature = -0.2,  
             angle = 90, 
             ncp = 10,   
             color = 'black')

plot_nm <- ggplot() +
  theme_void() +
  geom_curve(data = data.frame(x = 13, y = 5),
             aes(x = 13, y = 5,
                 xend = 14, yend = 8), 
             arrow = grid::arrow(length = unit(0.5, 'lines')), 
             curvature = 0.2, 
             angle = 90, 
             ncp = 10, 
             color = 'black')

# Custom colored horizontal lines to highlight increase/decrease in labels
increase_line <- ggplot() +
  theme_void() +
  geom_segment(aes(x = 0, y = 0, xend = 2, yend = 0), 
               color = "#B24425", 
               size = 1,
               alpha = 0.6) 

decrease_line <- ggplot() +
  theme_void() +
  geom_segment(aes(x = 0, y = 0, xend = 2, yend = 0), 
               color = "#176966", 
               size = 1,
               alpha = 0.6) 

# Customize legend a bit more
legends <- cowplot::get_plot_component(
  main_plot +
    theme(legend.direction = "horizontal") +
    guides(color = guide_legend(override.aes = list(size = 8),
                                order = 1),
           size = guide_legend(override.aes = list(fill = "gray60",
                               color = "black",
                               alpha = 0.5)),
           order = 2) +
    theme(legend.title = element_text(family = font_legend, face = "bold", size = 16),
          legend.text = element_text(family = font_legend, size = 14)),
  "guide-box", return_all = TRUE)

ggdraw(ylim = c(0,1), # 0-1 scale makes it easy to place viz items on canvas
       xlim = c(0,1)) +
  # a background  
  draw_grob(canvas,
            x = 0, y = 1,
            height = 9, width = 9,
            hjust = 0, vjust = 1) +
  # the main plot 
  draw_plot(main_plot +  theme(legend.position = "none"),
            x = margin - 0.06,
            y = margin + 0.02,
            height = 1 - (5*margin),
            width = 1.05) +
  # Add legends
  draw_plot(legends[[1]], .442, .72, .115, .15)  +
  # explainer text (Edit the author name only)
  draw_label("Elmera Azadpour, USGS\n Data available from tidycensus https://walker-data.com/tidycensus/",           x = 1 - margin,   
             y = margin,
             size = 12,
             hjust = 1,
             vjust = 0,
             color = font_color,
             fontfamily = font_legend)+
  # Title
  draw_label("Change in Percent of Households Lacking\nPlumbing Facilities by County (2023–2022)",
             x = margin,
             y = 1-margin,
             hjust = 0,
             vjust = 1,
             lineheight = 0.75,
             color = font_color,
             size = 28,
             fontfamily = font_legend,
             fontface = "bold") +
  # Add logo 
  draw_image(usgs_logo, 
             x = margin,
             y = margin,
             width = 0.15, 
             hjust = 0, vjust = 0, 
             halign = 0, valign = 0)+
  # Add arrow for Orange County, Texas (increase) 
  draw_plot(plot_tx,
            x = margin + 0.52,
            y = margin + 0.14,
            height = margin + 0.03,
            width = margin + 0.02) +
  draw_label("Orange County, TX\nexperienced a 2.3%\n increase in households\n lacking plumbing facilities",
             fontfamily = font_legend,
             x =  margin + 0.66,   
             y = margin + 0.11,
             size = 12,
             color = "black",
             lineheight = 1.1) + 
  # "Underline" increase text
  draw_plot(increase_line,
            x = margin + 0.57,
            y = margin + 0.051,
            height = margin + 0.03,
            width = margin + 0.032) +
  # Add arrow for McKinley County, New Mexico (decrease)
  draw_plot(plot_nm,
            x = margin + 0.18,
            y = margin + 0.235,
            height = margin + 0.08,
            width = margin + 0.03) +
  draw_label("McKinley County, NM\nexperienced a 0.93%\n decrease in households\n lacking plumbing facilities",
             fontfamily = font_legend,
             x =  margin + 0.09,   
             y = margin + 0.2,
             size = 12,
             color = "black",
             lineheight = 1.1) + 
  # "Underline" decrease text
  draw_plot(decrease_line,
            x = margin - 0.001,
            y = margin + 0.141,
            height = margin + 0.03,
            width = margin + 0.034)

Cartogram example

Here, a cartogram distorts the scale of the states so that those with a large number of households lacking plumbing are larger than they otherwise would based on area alone, creating a warped look to familiar features. The color ramp also represents the households lacking plumbing in 2023, with darker colors in states with higher levels. There is an inset map with geographically-accurate states to provide a comparison against the warped shapes of the cartogram. The package cartogram was used to create this map.

Map of the lower 48 United States showing the total number of households that lack plumbing by state in 2023. The main map is a cartogram, so that the size of the state is distorted and scaled by the total population. The states are also colored with blues that get darker the more households lack plumbing. This makes Arizona, for example, larger than it normally would be based on size alone. There is also an inset map that shows the same underlying data as a state map without distortion, but with the same blue color ramp as the cartogram.
Code to reproduce this figure
library(cartogram)

# select only the conterminuous (lower 48) states and add USPS code
conus <- state_plumbing_data_2023 |> 
  mutate(stusps = state.abb[match(name, state.name)]) |>
  filter(! stusps %in% c("AK", "AS", "HI", "PR", "VI", "MP", "GU"),
         ! name %in% "Puerto Rico")

# Use the cartogram package to create a continuous cartogram, 
#   weighted by households lacking plumbing
carto_state <- cartogram::cartogram_cont(conus,
                                         weight = "plumbing")

# Plot the cartogram, with legend
(cartogram_plot <- ggplot(carto_state) + 
    geom_sf(aes(fill = plumbing)) +
    scico::scale_fill_scico(palette = "oslo", direction = -1) +
    labs(fill = "Households lacking plumbing") +
    theme_void() +
    theme(legend.position = "bottom",
          legend.key.width=unit(2, "cm")))

# Plot an inset map that is not distorted
(conus_plot <- ggplot(conus) + 
    geom_sf(aes(fill = plumbing)) +
    scico::scale_fill_scico(palette = "oslo", direction = -1) +
    theme_void() +
    theme(legend.position = "none"))

ggdraw(ylim = c(0,1), # 0-1 scale makes it easy to place viz items on canvas
       xlim = c(0,1)) +
  # a background  
  draw_grob(canvas,
            x = 0, y = 1,
            height = 9, width = 9,
            hjust = 0, vjust = 1) +
  # the main plot 
  draw_plot(cartogram_plot,
            x = 0.025, 
            y = 0.15,
            width = 0.95,
            height = 0.7) +  
  # the inset plot 
  draw_plot(conus_plot,
            x = 0.06, 
            y = 0.18,
            width = 0.2,
            height = 0.25) +  
  # explainer text (Edit the author name only)
  draw_label("Althea Archer, USGS\n Data available from tidycensus https://walker-data.com/tidycensus/",
             x = 1 - margin,   
             y = margin,
             size = 12,
             hjust = 1,
             vjust = 0,
             color = font_color,
             fontfamily = font_legend) +
  # Title
  draw_label("The lack of access to plumbing in 2023",
             x = margin,
             y = 1-margin,
             hjust = 0,
             vjust = 1,
             lineheight = 0.75,
             color = font_color,
             size = 28,
             fontfamily = font_legend,
             fontface = "bold") +
  draw_label("Over 200,000 households lack access to plumbing in the lower 48 United States.",
             x = margin,   
             y = 1 - 0.1,
             size = 12,
             hjust = 0,
             vjust = 1,
             fontfamily = font_legend,
             color = font_color) +
  # Add logo 
  draw_image(usgs_logo, 
             x = margin,
             y = margin,
             width = 0.15, 
             hjust = 0, vjust = 0, 
             halign = 0, valign = 0)

Geofaceted area plot example

Geofacets can be used to create a series of mini charts that are aligned to be consistent with geographic features. This geofacet has a mini-area plot for each state that represents the number of households with plumbing in 2022 compared to 2023. The blue areas are households from all the counties that had a decrease in the number of households from 2022 to 2023, and the red areas are the households from counties with an increase. The package we used to arrange mini charts like this was the geofacet package.

Map showing one mini-area chart for each state of the U.S.A. The area charts show the total number of households lacking plumbing in 2022 versus 2023. The counties that comprise the area chart that saw an increase in the number of households lacking plumbing are colored red, whereas the others are colored blue.
Code to reproduce this figure
library(geofacet)

# Join 2022 and 2023 data together
joined <- plumbing_data_2022 |>
  sf::st_drop_geometry() |>
  select(-year) |>
  rename_at(vars(-name, -geoid), function(x) paste0(x, "_22")) |>
  inner_join(plumbing_data_2023 |> 
              sf::st_drop_geometry() |> 
              select(-name, -year) |>
              rename_at(vars(-geoid), function(x) paste0(x, "_23")),
            by = "geoid") |>
  # calculate trends from 2022 to 2023
  mutate(n_change = plumbing_23 - plumbing_22,
         pct_change = percent_lacking_plumbing_23 - percent_lacking_plumbing_22,
         n_change_cat = case_when(n_change > 0 ~ "Increasing",
                                  n_change == 0 ~ "No Change",
                                  n_change < 0 ~ "Decreasing")) |>
  mutate(state_id = substr(geoid, 1, 2))

# get grid for geofacet from the Flow Tiles open software repository
#  https://github.com/DOI-USGS/flow-tiles/blob/3f871a7ca393bd26f2a8e15a3fbd5621eb02e8f1/src/plot_cartogram.R#L2
usa_grid <- readr::read_csv("usa_grid.csv")|> 
  filter(! code %in% c("PR", "HI", "AK"))

# Join data with the full counties dataset
joined_all <- joined |>
  left_join(counties_sf |> sf::st_drop_geometry() |> select(stusps, state_name, geoid), 
            by = "geoid") 

# back to long for plotting
long_join <- joined_all |>
  select(-name, -state_id) |>
  select(n_change_cat, stusps, state_name, geoid, plumbing_22, plumbing_23) |> 
  pivot_longer(cols = c(plumbing_22, plumbing_23),
               names_to = "year",
               values_to = "households")

# Summarise by state for plotting
state_summary <- long_join |> 
  filter(! is.na(stusps), stusps != "DC") |>
  group_by(stusps, n_change_cat, year) |>
  summarise(households = sum(households, na.rm = TRUE)) |>
  mutate(year_num = case_when(year == "plumbing_22" ~ 2022,
                              year == "plumbing_23" ~ 2023)) |>
  mutate(cat_factor = case_when(n_change_cat == "Increasing" ~ "C",
                                n_change_cat == "Decreasing" ~ "A",
                                TRUE ~ "B"))

(main_plot <- ggplot(data = state_summary,
                     aes(y = households, x = year_num, fill = cat_factor)) +
    geom_area() + 
    scale_fill_manual(values = c("#538CC6", "#538CC6", "#872341")) +
    scale_x_continuous(breaks = c(2022, 2023), labels = c("2022", "2023"),
                       name = "Comparison from 2022 (left) to 2023 (right)") +
    scale_y_continuous(breaks = c(0, 10000, 20000), labels = c("0k", "10k", "20k"),
                       name = "Houses without plumbing") + 
    geofacet::facet_geo( ~ stusps, grid = usa_grid, move_axes = TRUE) +
    theme_minimal() +
    theme(legend.position = "none",
          panel.background = element_blank(),
          panel.grid = element_blank(),
          axis.text.y = element_text(size = 7),
          axis.text.x = element_blank(),
          axis.title = element_text(size = 10)))

ggdraw(ylim = c(0,1), # 0-1 scale makes it easy to place viz items on canvas
       xlim = c(0,1)) +
  # a background  
  draw_grob(canvas,
            x = 0, y = 1,
            height = 9, width = 9,
            hjust = 0, vjust = 1) +
  # the main plot 
  draw_plot(main_plot,
            x = 0.025, 
            y = 0.15,
            width = 0.95,
            height = 0.7) +  
  # explainer text (Edit the author name only)
  draw_label("Althea Archer, USGS\n Data available from tidycensus https://walker-data.com/tidycensus/",
             x = 1 - margin,   
             y = margin,
             size = 12,
             hjust = 1,
             vjust = 0,
             color = font_color,
             fontfamily = font_legend) +
  # Title
  draw_label("Change in households with plumbing 2022-2023",
             x = margin,
             y = 1-margin,
             hjust = 0,
             vjust = 1,
             lineheight = 0.75,
             color = font_color,
             size = 28,
             fontfamily = font_legend,
             fontface = "bold") +
  draw_label("Counties in each state are colored based on if they had an",
             x = margin,   
             y = 1 - 0.1,
             size = 12,
             hjust = 0,
             vjust = 1,
             fontfamily = font_legend,
             color = font_color) +
  draw_label("increase or decrease in the number of households lacking plumbing.",
             x = margin,   
             y = 1 - 0.12,
             size = 12,
             hjust = 0,
             vjust = 1,
             fontfamily = font_legend,
             color = font_color) +
  # "Underline" text
  draw_line(x = c(0.04, 0.10),
            y = c(1 - 0.14, 1 - 0.14),
            color = "#872341",
            linewidth = 1.4) +
  draw_line(x = c(0.14, 0.20),
            y = c(1 - 0.14, 1 - 0.14),
            color = "#538CC6",
            linewidth = 1.4) +
  # Add logo 
  draw_image(usgs_logo, 
             x = margin,
             y = margin,
             width = 0.15, 
             hjust = 0, vjust = 0, 
             halign = 0, valign = 0)

Rainfall plot example

This plot uses a paired half-rainfall and half-density plot to show the variation in households with plumbing by region. The plot of variation is paired with an inset map to show where those regions are located. Variation can be easily and beautifully plotted with the use of the ggdist package, like we did here.

Rainfall plot showing variation in each county's percent of households lacking plumbing by region, including Southwest, Southeast, South Central, Northwest, Northeast, North Central, and the Midwest.
Code to reproduce this figure

library(ggdist)

# Join county data with water data 
county_df <- counties_sf |>
  sf::st_drop_geometry() |>
  left_join(plumbing_data_2023 |> select(geoid, year, total_housing, plumbing, 
                                       percent_lacking_plumbing),
            by = "geoid") |>
  # add regions
  mutate("CASC" = case_when(state_name %in% c("Minnesota", "Iowa", "Missouri", "Wisconsin", "Illinois", "Indiana", "Michigan", "Ohio") ~ "Midwest", 
                            state_name %in% c("Montana", "Wyoming", "Colorado", "North Dakota", "South Dakota", "Nebraska", "Kansas") ~ "North Central",                           
                            state_name %in% c("Maine", "New Hampshire", "Vermont", "Massachusetts", "Connecticut", "Rhode Island",  "New York", "New Jersey", "Pennsylvania", "Delaware", "Maryland", "West Virginia","Virginia", "Kentucky", "District of Columbia") ~ "Northeast", 
                            state_name %in% c("Washington", "Oregon", "Idaho") ~ "Northwest",                           
                            state_name %in% c("New Mexico", "Texas", "Oklahoma", "Louisiana") ~ "South Central", 
                            state_name %in% c("North Carolina", "South Carolina", "Georgia", "Alabama", "Mississippi", "Florida", "Tennessee",                                     "Arkansas") ~ "Southeast",  
                            state_name %in% c("Arizona", "California", "Utah", "Nevada") ~ "Southwest", TRUE ~ "not sorted")) |>
  # remove missing data
  filter(! is.na(year))

# join regions to state data
state_region_sf <- state_sf |> 
  mutate("CASC" = case_when(name %in% c("Minnesota", "Iowa", "Missouri", "Wisconsin", "Illinois", "Indiana", "Michigan", "Ohio") ~ "Midwest", 
                            name %in% c("Montana", "Wyoming", "Colorado", "North Dakota", "South Dakota", "Nebraska", "Kansas") ~ "North Central",                           
                            name %in% c("Maine", "New Hampshire", "Vermont", "Massachusetts", "Connecticut", "Rhode Island",  "New York", "New Jersey", "Pennsylvania", "Delaware", "Maryland", "West Virginia","Virginia", "Kentucky", "District of Columbia") ~ "Northeast", 
                            name %in% c("Washington", "Oregon", "Idaho") ~ "Northwest",                           
                            name %in% c("New Mexico", "Texas", "Oklahoma", "Louisiana") ~ "South Central", 
                            name %in% c("North Carolina", "South Carolina", "Georgia", "Alabama", "Mississippi", "Florida", "Tennessee",                                     "Arkansas") ~ "Southeast",  
                            name %in% c("Arizona", "California", "Utah", "Nevada") ~ "Southwest", TRUE ~ "not sorted"))

# State map for inset
(map <- ggplot(state_region_sf) +   
    geom_sf(aes(fill = CASC), color = "white", alpha = 0.85) +   
    scale_fill_manual(values = pal_hue()(7))+
    theme_void()+
    theme(legend.position = 'none')
)

(main_plot <- ggplot(county_df,
                     aes(y = CASC, 
                         x = percent_lacking_plumbing,
                         show.legend = FALSE)) +
    ggdist::stat_dotsinterval(fill = 'black',
                              side = "bottom", quantiles = 30)+
    ggdist::stat_halfeye(mapping = aes(fill = CASC))+
    ylab('Climate Adaptation Science Center (CASC) Region')+
    xlab('Percent of households per county lacking plumbing')+
    theme_void()+
    theme(legend.position = 'none', 
          axis.title.x = element_text(size = 14),
          axis.text = element_text(size = 14, hjust = 1)) 
)

ggdraw(ylim = c(0,1), # 0-1 scale makes it easy to place viz items on canvas
       xlim = c(0,1)) +
  # a background  
  draw_grob(canvas,
            x = 0, y = 1,
            height = 9, width = 9,
            hjust = 0, vjust = 1) +
  # the main plot 
  draw_plot(main_plot,
            x = margin,
            y = 0.170,
            height = 0.775,
            width = 1) +
  ## inset map
  draw_plot(map,
            x = 0.15,
            y = 0.25,
            height = 0.28) +
  # explainer text (Edit the author name only)
  draw_label("Sharmin Siddiqui, USGS\n Data available from tidycensus https://walker-data.com/tidycensus/",
             x = 1 - margin,   
             y = margin,
             size = 12,
             hjust = 1,
             vjust = 0,
             color = font_color,
             fontfamily = font_legend) +
  # Title
  draw_label("Regional differences in access to plumbing, 2023",
             x = margin,
             y = 1-margin,
             hjust = 0,
             vjust = 1,
             lineheight = 0.75,
             color = font_color,
             size = 28,
             fontfamily = font_legend,
             fontface = "bold") +
  # Add logo 
  draw_image(usgs_logo, 
             x = margin,
             y = margin,
             width = 0.15, 
             hjust = 0, vjust = 0, 
             halign = 0, valign = 0)

Grid chart example

This chart uses an overall grid-layout to compare income versus plumbing access. The chart is divided into counties that saw an increase or decrease in median household income (left versus right) and into counties that saw an increase or decrease in the percent of households with plumbing (top versus bottom). Each county is then shown with a square, scaled to the total population in that county in 2023. The map author was inspired by this Datawrapper visualization.

Chart divided into counties that saw an increase or decrease in median household income (left versus right) and into counties that saw an increase or decrease in the percent of households with plumbing (top versus bottom). Each county is then shown with a square, scaled to the total population in that county in 2023. Coconino County Arizona is called out as having more plumbing access and higher income, whereas Davic County, Utah is called out as having less plumbing access, even with higher income. Cochise County, Arizona is called out as having less plumbing access and lower income.
Code to reproduce this figure
plumbing_df <- plumbing_data_2022 |>
  inner_join(sf::st_drop_geometry(plumbing_data_2023), 
             join_by("geoid", "name"), 
             suffix = c("_2022", "_2023")) |>
  mutate(
    change_in_having_per = 
      percent_having_plumbing_2023 - percent_having_plumbing_2022,
    change_in_income = median_household_income_2023 - median_household_income_2022,
    category = case_when(
      (change_in_income > 0 & change_in_having_per >= 0) ~ "a",
      (change_in_income > 0 & change_in_having_per < 0) ~ "b",
      (change_in_income < 0 & change_in_having_per >= 0) ~ "c",
      TRUE ~ "d"
    )) |>
  filter(!is.na(change_in_having_per))


# get geoids for SW U.S.
focal_geoids <- counties_sf |>
  filter(state_name %in% c("Arizona", "California", "Utah", "Nevada")) |>
  pull(geoid)

# subset data to SW U.S.
data_subset <- plumbing_df |>
  filter(geoid %in% focal_geoids) |>
  arrange(desc(total_pop_2023))

# get data subsets for annotations
# 49011 Davis County, Utah - lower right
# 04003 Cochise County, Arizona - lower left
# 04005 Coconino County, Arizona - upper right
lower_right <- filter(data_subset, geoid == "49011")
lower_left <- filter(data_subset, geoid == "04003")
upper_right <- filter(data_subset, geoid == "04005")
label_size <- 4

main_plot <- ggplot(data_subset) +
  geom_point(
    aes(x = change_in_income, 
        y = change_in_having_per,
        fill = category,
        alpha = 0.2,
        size = total_pop_2023),
    color = "white",
    pch = 22) +
  geom_point(
    data = lower_right,
    aes(x = change_in_income, 
        y = change_in_having_per,
        fill = category,
        size = total_pop_2023),
    color = "black",
    pch = 22
  ) +
  geom_point(
    data = lower_left,
    aes(x = change_in_income, 
        y = change_in_having_per,
        fill = category,
        size = total_pop_2023),
    color = "black",
    pch = 22
  ) +
  geom_point(
    data = upper_right,
    aes(x = change_in_income, 
        y = change_in_having_per,
        fill = category,
        size = total_pop_2023),
    color = "black",
    pch = 22
  ) +
  scale_fill_manual(values = c("steelblue2", "tomato3", "mediumaquamarine", "orange"),
                    aesthetics = c("fill", "color")) +
  scale_x_continuous(expand = c(0.02, 0.02)) +
  scale_y_continuous(expand = c(0.01, 0.01)) +
  scale_size_continuous(range = c(1.5, 15)) +
  theme_void() +
  geom_hline(yintercept = 0, linetype = "dotted", color = "grey80", 
             linewidth = 0.5) +
  geom_vline(xintercept = 0, linetype = "dotted", color = "grey80", 
             linewidth = 0.5) +
  geom_text(data = head(data_subset, 1), 
            aes(x = max(data_subset$change_in_income, na.rm = TRUE) * 0.05, 
                y = min(data_subset$change_in_having_per, na.rm = TRUE) * 1.1, 
                label = "MORE INCOME →"),
            hjust = 0,
            color = "#6E6E6E") +
  geom_text(data = head(data_subset, 1), 
            aes(x = max(data_subset$change_in_income, na.rm = TRUE) * -0.05, 
                y = min(data_subset$change_in_having_per, na.rm = TRUE) * 1.1, 
                label = "← LESS INCOME"),
            hjust = 1,
            color = "#6E6E6E") +
  geom_text(data = head(data_subset, 1), 
            aes(x = min(data_subset$change_in_income, na.rm = TRUE) * 1.1, 
                y = min(data_subset$change_in_having_per, na.rm = TRUE) * 0.16, 
                label = "LESS PLUMBING"),
            hjust = 0,
            vjust = 1,
            color = "#6E6E6E") +
  geom_text(data = head(data_subset, 1), 
            aes(x = min(data_subset$change_in_income, na.rm = TRUE) * 1.1, 
                y = max(data_subset$change_in_having_per, na.rm = TRUE) * 0.1, 
                label = "MORE PLUMBING"),
            hjust = 0,
            vjust = 0,
            color = "#6E6E6E") +
  geom_text(data = head(data_subset, 1), 
            aes(x = min(data_subset$change_in_income, na.rm = TRUE) * 1.09, 
                y = max(data_subset$change_in_having_per, na.rm = TRUE) * 0.18, 
                label = "↑"),
            hjust = 0,
            vjust = 0,
            color = "#6E6E6E") +
  geom_text(data = head(data_subset, 1), 
            aes(x = min(data_subset$change_in_income, na.rm = TRUE) * 1.09, 
                y = min(data_subset$change_in_having_per, na.rm = TRUE) * 0.25, 
                label = "↓"),
            hjust = 0,
            vjust = 1,
            color = "#6E6E6E") +
  geom_text(data = lower_right,
            aes(
              x = change_in_income,
              y = change_in_having_per * 1.06,
              label = paste(name, paste0("pop. ", formatC(total_housing_2023, format = "d", big.mark=",")), sep = "\n")),
            size = label_size,
            vjust = 1) +
  geom_text(data = lower_left,
            aes(
              x = change_in_income,
              y = change_in_having_per * 1.18,
              label = paste(name, paste0("pop. ", formatC(total_housing_2023, format = "d", big.mark=",")), sep = "\n")),
            size = label_size,
            vjust = 1) +
  geom_text(data = upper_right,
            aes(
              x = change_in_income,
              y = change_in_having_per * 0.965,
              label = paste(name, paste0("pop. ", formatC(total_housing_2023, format = "d", big.mark=",")), sep = "\n")),
            size = label_size,
            vjust = 1) +
  theme(
    legend.position = "none"
  )

map <- ggplot() +
  geom_sf(data = state_sf, color = "white", fill = "grey90") +
  geom_sf(data = filter(state_sf, 
                        name %in% c("Arizona", "California", "Utah", "Nevada")), 
          color = "white", fill = "grey30") +
  theme_void()

ggdraw(ylim = c(0,1), # 0-1 scale makes it easy to place viz items on canvas
       xlim = c(0,1)) +
  # a background  
  draw_grob(canvas,
            x = 0, y = 1,
            height = 9, width = 9,
            hjust = 0, vjust = 1) +
  # the main plot 
  draw_plot(main_plot,
            x = margin,
            y = margin * 3,
            height = 1 - (8 * margin),
            width = 1 - (2 * margin)) +
  draw_plot(map,
            x = margin,
            y = margin * 6,
            width = 0.2) +
  # explainer text (Edit the author name only)
  draw_label("Hayley Corson-Dosch, USGS\n Data available from tidycensus https://walker-data.com/tidycensus/",
             x = 1 - margin,   
             y = margin,
             size = 12,
             hjust = 1,
             vjust = 0,
             color = font_color,
             fontfamily = font_legend) +
  # Title
  draw_label("Household income versus plumbing access\nfrom 2022 to 2023",
             x = margin,
             y = 1-margin,
             hjust = 0,
             vjust = 1,
             lineheight = 0.75,
             color = font_color,
             size = 28,
             fontfamily = font_legend,
             fontface = "bold") +
  # annotations
  draw_label("Change in median income and access to plumbing from 2022 to 2023 for counties in the Southwest U.S.",
             x = margin,   
             y = 1 - 0.12,
             size = 12,
             hjust = 0,
             vjust = 1,
             fontfamily = font_legend,
             color = font_color) +
  draw_label("The size of each square is scaled to the population of each county.",
             x = margin,   
             y = 1 - 0.14,
             size = 12,
             hjust = 0,
             vjust = 1,
             fontfamily = font_legend,
             color = font_color) +
  # Add logo 
  draw_image(usgs_logo, 
             x = margin,
             y = margin,
             width = 0.15, 
             hjust = 0, vjust = 0, 
             halign = 0, valign = 0)
             

Additional resources

View tidycensus developers’, Kyle Walker and Matt Herman, website to learn more about additional package functionality and documentation. Check out the book “Analyzing US Census Data: Methods, Maps, and Models in R ”, by Kyle Walker, for additional information on wrangling, modeling, and analyzing U.S. Census data. View our site to learn more about related research.

References

Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Share:

Related Posts

  • Mapping water insecurity in R with tidycensus

    December 9, 2024

    Banner that displays three choropleth maps displaying percent hispanic, median gross rent, and average household size using 2022 U.S. Census Bureau Data.

    Water insecurity can be influenced by number of social vulnerability indicators—from demographic characteristics to living conditions and socioeconomic status —that vary spatially across the U.S. This blog shows how the tidycensus package for R can be used to access U.S. Census Bureau data, including the American Community Surveys, as featured in the “Unequal Access to Water ” data visualization from the USGS Vizlab. It offers reproducible code examples demonstrating use of tidycensus for easy exploration and visualization of social vulnerability indicators in the Western U.S.

  • Reproducible Data Science in R: Flexible functions using tidy evaluation

    December 17, 2024

    A smiling raindrop in a hardhat works on a pipe assembly line beneath the blog series title, Reproducible Data Science with R and to the left of the blog post title, Flexible functions using tidy evaluation.

    Overview

    This blog post is part of a series that works up from functional programming foundations through the use of the targets R package to create efficient, reproducible data workflows.

  • Reproducible Data Science in R: Writing functions that work for you

    May 14, 2024

    A smiling raindrop in a hardhat works on a pipe assembly line beneath the blog series title, Reproducible Data Science with R and to the left of the blog post title, Writing functions that work for you.

    Overview

    This blog post is part of a series that works up from functional programming foundations through the use of the targets R package to create efficient, reproducible data workflows.

  • Origin and development of a Snowflake Map

    January 11, 2023

    The result

    It’s been a snowy winter, so let’s make a snow cover map! This blog outlines the process of how I made a snowflake hex map of the contiguous U.S. The final product shows whiter snowflakes where snow cover was higher, which was generally in the northern states and along the main mountain ranges. I used a few interesting tricks and packages (e.g., ggimage , magick ) to make this map and overlay the snowflake shape. Follow the steps below to see how I made it.

  • The Hydro Network-Linked Data Index

    November 2, 2020

    Introduction

    updated 11-2-2020 after updates described here .

    updated 9-20-2024 when the NLDI moved from labs.waterdata.usgs.gov to api.water.usgs.gov/nldi/

    The Hydro Network-Linked Data Index (NLDI) is a system that can index data to NHDPlus V2 catchments and offers a search service to discover indexed information. Data linked to the NLDI includes active NWIS stream gages , water quality portal sites , and outlets of HUC12 watersheds . The NLDI is a core product of the Internet of Water and is being developed as an open source project. .