Charting 'tidycensus' data with R
This blog shows several different ways to visualize data from the `tidycensus` package for R.
What's on this page
In January, 2025, the organizers of the tidytuesday challenge highlighted data that were featured in a previous blog post and data visualization website . Some of us in the USGS Vizlab wanted to participate by creating a series of data visualizations showing these data, specifically the metric “households lacking plumbing.” This blog highlights our data visualizations inspired by the tidytuesday challenge as well as the code we used to create them, based on our previous software release on GitHub .
Data downloading and preparation
We used functions to download the data from the tidycensus package
in R at the county level for 2022 and 2023. The functions are available from an open-access software release on GitHub
.
These are the U.S. Census Bureau variables we used in these data visualizations (see full citations below):
- B01003_001: total population per county
- B25049_004: households lacking plumbing
- B25049_002: total housing units (both owner- and renter-occupied)
- B25049_003: households with plumbing
- B19013_001: median household income
We also downloaded complete state and county files from the tigris package
for mapping purposes.
Code to download the data
# Load libraries
library(tidycensus)
library(sf) 
library(janitor) 
library(tidyverse)
library(rmapshaper)
library(tigris)
# Helper functions -----
#   From Software release: 
#     https://github.com/DOI-USGS/vulnerability-indicators/blob/main/2_process/src/data_utils.R
get_census_data <- function(geography, var_names, year, proj, survey_var) {
  df <- get_acs(
    geography = geography,
    variable = var_names,
    year = year,
    geometry = TRUE,
    survey = survey_var) |>
    clean_names() |>
    st_transform(proj) |>
    mutate(year = year)
  
  return(df) 
}
# Grab relevant variables 
vars <- c("B25049_002", "B25049_004", "B19013_001", "B25049_003", "B01003_001")
# Pull data for 2023 and 2022 for all US counties ------
plumbing_data_2023 <- get_census_data(
  geography = 'county', 
  var_names = vars, 
  year = "2023", 
  proj = "EPSG:5070", 
  survey_var = "acs1"
) |>
  mutate(
    variable_long = case_when(
      variable == "B25049_002" ~ "total_housing",
      variable == "B25049_004" ~ "plumbing",
      variable == "B19013_001" ~ "median_household_income",
      variable == "B25049_003" ~ "units_having_plumbing",
      variable == "B01003_001" ~ "total_pop",
      .default = NA_character_  
    )
  ) |> 
  select(geoid, name, variable_long, estimate, geometry, year) |> 
  pivot_wider(
    names_from = variable_long,
    values_from = estimate
  ) |> 
  mutate(
    percent_lacking_plumbing = (plumbing / total_housing) * 100,
    percent_having_plumbing = (units_having_plumbing / total_housing) * 100
  )
plumbing_data_2022 <- get_census_data(
  geography = 'county', 
  var_names = vars, 
  year = "2022", 
  proj = "EPSG:5070", 
  survey_var = "acs1"
) |>
  mutate(
    variable_long = case_when(
      variable == "B25049_002" ~ "total_housing",
      variable == "B25049_004" ~ "plumbing",
      variable == "B19013_001" ~ "median_household_income",
      variable == "B25049_003" ~ "units_having_plumbing",
      variable == "B01003_001" ~ "total_pop",
      .default = NA_character_  
    )
  ) |> 
  select(geoid, name, variable_long, estimate, geometry, year) |> 
  pivot_wider(
    names_from = variable_long,
    values_from = estimate
  ) |> 
  mutate(
    percent_lacking_plumbing = (plumbing / total_housing) * 100,
    percent_having_plumbing = (units_having_plumbing / total_housing) * 100
  )
# Pull data by the state level
state_plumbing_data_2023 <- get_census_data(
  geography = 'state', 
  var_names = vars, 
  year = "2023", 
  proj = "EPSG:5070", 
  survey_var = "acs1"
) |>
  mutate(
    variable_long = case_when(
      variable == "B25049_002" ~ "total_housing",
      variable == "B25049_004" ~ "plumbing",
      variable == "B19013_001" ~ "median_household_income",
      variable == "B25049_003" ~ "units_having_plumbing",
      variable == "B01003_001" ~ "total_pop",
      .default = NA_character_  
    )
  ) |> 
  select(geoid, name, variable_long, estimate, geometry, year) |> 
  pivot_wider(
    names_from = variable_long,
    values_from = estimate
  ) |> 
  mutate(
    percent_lacking_plumbing = (plumbing / total_housing) * 100,
    percent_having_plumbing = (units_having_plumbing / total_housing) * 100
  )
counties_sf <-   # download a generalized (1:500k) counties file   
  tigris::counties(cb = TRUE) |>   
  # set projection   
  sf::st_transform("EPSG:5070") |>   
  # standardize column names   
  janitor::clean_names() |>     
  # simplify spatial data   
  rmapshaper::ms_simplify(keep = 0.2) |> 
  filter(! stusps %in% c("AK", "HI", "PR", "VI", "MP", "GU", "AS"))
state_sf <- 
  # download a generalized (1:500k) states file
  tigris::states(cb = TRUE) |> 
  # set projection
  st_transform("EPSG:5070") |> 
  # standardize column names
  clean_names() |> 
  # simplify spatial data
  rmapshaper::ms_simplify(keep = 0.2) |> 
  filter(! stusps %in% c("AK", "HI", "PR", "VI", "MP", "GU", "AS"))
Setting up the data visualization framework
We set up a few consistent settings for all of the following data visualizations, such as standard typefaces, colors, and a canvas and margin for plotting with cowplot
. See our “Jazz up your ggplot” blog post
for more cowplot-related plotting tips and tricks.
Code to set up our data visualizations
library(showtext)
library(sysfonts)
library(grid)
library(magick)
library(cowplot)
# Load custom font
font_legend <- "Source Sans Pro"
sysfonts::font_add_google(font_legend)
showtext::showtext_opts(dpi = 300, regular.wt = 200, bold.wt = 700)
showtext::showtext_auto(enable = TRUE)
# Define colors
background_color = "white"
font_color = "black"
# The background canvas for your viz
canvas <- grid::rectGrob(
  x = 0, y = 0, 
  width = 9, height = 9,
  gp = grid::gpar(fill = background_color, alpha = 1, col = background_color)
)
# margin for plotting
margin = 0.04
# Load in USGS logo (also a white logo available)
usgs_logo <- magick::image_read("usgs_logo_black.png") 
Note: After the cowplot viz are built in the examples below, we then save the visualizations to png using ggsave, such as:
ggsave(filename = "xxx.png", width = 9, height = 9, dpi = 300, units = "in", bg = "white")
Line chart example
This line chart is a simple but beautiful way to compare the rates of plumbing access between 2022 and 2023 for California counties. The top 5 counties in 2023 are highlighted with color and labels, which are nice graphing tricks to improve readability. The colors were from the RColorBrewer
package and the labels were placed with the ggrepel
package.

Chart by Elmera Azadpour
Code to reproduce this figure
library(RColorBrewer)
library(ggrepel)
library(scales)
# Get CA data only
plumbing_data_2022_ca <- plumbing_data_2022 |> 
  filter(grepl(", California$", name)) 
plumbing_data_2023_ca <- plumbing_data_2023 |> 
  filter(grepl(", California$", name)) 
# Combine 2022 and 2023 data and drop geometry
combined_data <- bind_rows(plumbing_data_2022_ca, plumbing_data_2023_ca) |> 
  st_drop_geometry()
# Pivot to long format for slope chart
long_data_ca <- combined_data |> 
  select(name, year, percent_lacking_plumbing) |> 
  pivot_wider(names_from = year, values_from = percent_lacking_plumbing) |> 
  pivot_longer(cols = c(`2022`, `2023`), names_to = "year", values_to = "percent_lacking_plumbing") |> 
  mutate(name = str_remove(name, ",\\s.*"))
# Grab the top 5 counties in 2023
top_5_names <- long_data_ca |> 
  filter(year == 2023) |> 
  arrange(desc(percent_lacking_plumbing)) |> 
  slice(1:5) |> 
  pull(name)
# Add a variable to distinguish top 5 counties
long_data_ca <- long_data_ca |> 
  mutate(
    top_5_flag = ifelse(name %in% top_5_names, name, "Other")
  ) 
# Generate a color palette for the top 5 and others
top_5_colors <- RColorBrewer::brewer.pal(5, "Set1")  
neutral_color <- "gray80"              # gray color for other counties
palette <- c(setNames(top_5_colors, top_5_names), Other = neutral_color)
# Main plot
(main_plot <- ggplot() +
    # Add gray lines and points first for non-top 5 names
    geom_line(
      data = long_data_ca |> filter(!(name %in% top_5_names)),
      aes(x = year, y = percent_lacking_plumbing, group = name),
      linewidth = 1,
      color = "gray80"
    ) +
    geom_point(
      data = long_data_ca |> filter(!(name %in% top_5_names)),
      aes(x = year, y = percent_lacking_plumbing),
      size = 4,
      color = "gray80"
    ) +
    # Add lines and points for top 5 names
    geom_line(
      data = long_data_ca |> filter(name %in% top_5_names),
      aes(x = year, y = percent_lacking_plumbing, group = name, color = top_5_flag),
      linewidth = 1
    ) +
    geom_point(
      data = long_data_ca |> filter(name %in% top_5_names),
      aes(x = year, y = percent_lacking_plumbing, color = top_5_flag),
      size = 4
    ) +
    # Add labels for top 5 names
    ggrepel::geom_text_repel(
      data = long_data_ca |> filter(year == 2023 & name %in% top_5_names),
      aes(x = year, y = percent_lacking_plumbing, label = name, color = top_5_flag),
      size = 5,
      nudge_x = 0.35,
      show.legend = FALSE
    ) +
    scale_color_manual(values = palette) +
    scale_y_continuous(labels = scales::percent_format(scale = 1)) +
    scale_x_discrete(expand = c(0.08, 0)) +
    labs(
      x = NULL,
      y = NULL
    ) +
    theme_minimal() +
    theme(
      legend.position = "none", 
      axis.text.x = element_text(size = 16, family = font_legend), 
      axis.text.y = element_text(size = 16, family = font_legend), 
      plot.margin = margin(t = 20, r = 20, b = 20, l = 20, unit = "pt")
    )
)
ggdraw(ylim = c(0,1), # 0-1 scale makes it easy to place viz items on canvas
       xlim = c(0,1)) +
  # a background  
  draw_grob(canvas,
            x = 0, y = 1,
            height = 9, width = 9,
            hjust = 0, vjust = 1) +
  # the main plot 
  draw_plot(main_plot,
            x = margin - 0.03,
            y = margin + 0.065,
            height = 1 - (4*margin),
            width = 1 - (margin/2)) +
  # explainer text (Edit the author name only)
  draw_label("Elmera Azadpour, USGS\n Data available from tidycensus https://walker-data.com/tidycensus/",
             x = 1 - margin,   
             y = margin,
             size = 12,
             hjust = 1,
             vjust = 0,
             color = font_color,
             fontfamily = font_legend)+
  # Title
  draw_label("Percent Lacking Plumbing in California",
             x = margin,
             y = 1-margin,
             hjust = 0,
             vjust = 1,
             lineheight = 0.75,
             color = font_color,
             size = 28,
             fontfamily = font_legend,
             fontface = "bold") +
  # Add logo
  draw_image(usgs_logo, 
             x = margin,
             y = margin,
             width = 0.15, 
             hjust = 0, vjust = 0, 
             halign = 0, valign = 0)
             
Bubble map example
This bubble map visualizes the change in the percent of households lacking plumbing facilities by county from 2022 to 2023. The dots represent both the direction (color) and magnitude (size) of change in the percent of households lacking plumbing from 2022 to 2023. Labels paired with curved arrows highlight two example counties and provide guidance for how to read the chart.

Chart by Elmera Azadpour
Code to reproduce this figure
# Get lower 48 data for 2023 
plumbing_data_2023_df <- plumbing_data_2023 |> 
  st_drop_geometry() |> 
  filter(!str_detect(name, "Hawaii|Alaska|Puerto Rico"))
# Get lower 48 data for 2022
plumbing_data_2022_df <-  plumbing_data_2022 |>
  st_drop_geometry() |> 
  filter(!str_detect(name, "Hawaii|Alaska|Puerto Rico"))
# Join 2023 and 2022 data
merged_data <- full_join(plumbing_data_2023_df, plumbing_data_2022_df, 
                         by = "geoid", 
                         suffix = c("_2023", "_2022"))
merged_data <- merged_data |> 
  # Calculate whether the difference between 
  # percent_lacking_plumbing_2023 and percent_lacking_plumbing_2022 
  # is positive or negative
  mutate(change_direction = ifelse(percent_lacking_plumbing_2023 - 
    percent_lacking_plumbing_2022 >= 0, "Increase", "Decrease"),
    # Calculate the absolute value of the difference between 
    # percent_lacking_plumbing_2023 and percent_lacking_plumbing_2022
    change_magnitude = abs(percent_lacking_plumbing_2023 -
          percent_lacking_plumbing_2022))
merged_data_with_geometry <- plumbing_data_2023 |> 
  # Retain only the geoid and geometry columns
  select(geoid, geometry) |>  
  # Then join back with merge_data
  right_join(merged_data, by = "geoid") 
# Calculate centroids for each geometry
merged_data_with_geometry <- merged_data_with_geometry |> 
  mutate(centroid = st_centroid(geometry))
# Separate centroids for plotting bubbles
centroids <- merged_data_with_geometry |> 
  select(geoid, name_2023,change_direction,change_magnitude, centroid) |> 
  st_as_sf(sf_column_name = "centroid") |> 
  drop_na()
# Edit centroid breaks and labels for legend
centroids <- centroids |> 
  mutate(
    magnitude_category = cut(
      change_magnitude,
      breaks = c(0, 0.1, 0.3, 0.5, Inf), # Four bins
      labels = c("≤0.1", "0.1–0.3", "0.3–0.5", ">0.5"), # Corresponding labels
      include.lowest = TRUE
    )
  )
# Main plot
(main_plot <- ggplot() +
    geom_sf(
      color = "white", 
      linewidth = 0.05 
    ) +  
    geom_sf(
      data = state_sf,
      fill = NA, 
      color = "black",  
      linewidth = 0.2, 
      linetype = "solid" 
    ) +  
    geom_sf(
      data = counties_sf, 
      fill = NA, 
      color = "grey70", 
      linewidth = 0.1, 
      linetype = "solid" 
    ) +
    geom_sf(
      data = centroids,
      aes(
        geometry = centroid,
        size = magnitude_category, 
        color = change_direction
      ),
      alpha = 0.6
    ) +
    scale_color_manual(
      values = c("Increase" = "#B24425", "Decrease" = "#176966"),
      name = "Direction"
    )  +
    scale_size_manual(
      name = "Magnitude",
      values = c(2, 4, 6, 8), # adjust bubble sizes to match categories
      labels = c("≤0.1", "0.1–0.3", "0.3–0.5", ">0.5") 
    ) +
    theme_void() 
)
# Custom arrows for callouts
plot_tx <- ggplot() +
  theme_void() +
  geom_curve(data =  data.frame(x = 13, y = 5),
             aes(x = 13, y = 5,
                 xend = 12, yend = 8),
             arrow = grid::arrow(length = unit(0.5, 'lines')), 
             curvature = -0.2,  
             angle = 90, 
             ncp = 10,   
             color = 'black')
plot_nm <- ggplot() +
  theme_void() +
  geom_curve(data = data.frame(x = 13, y = 5),
             aes(x = 13, y = 5,
                 xend = 14, yend = 8), 
             arrow = grid::arrow(length = unit(0.5, 'lines')), 
             curvature = 0.2, 
             angle = 90, 
             ncp = 10, 
             color = 'black')
# Custom colored horizontal lines to highlight increase/decrease in labels
increase_line <- ggplot() +
  theme_void() +
  geom_segment(aes(x = 0, y = 0, xend = 2, yend = 0), 
               color = "#B24425", 
               size = 1,
               alpha = 0.6) 
decrease_line <- ggplot() +
  theme_void() +
  geom_segment(aes(x = 0, y = 0, xend = 2, yend = 0), 
               color = "#176966", 
               size = 1,
               alpha = 0.6) 
# Customize legend a bit more
legends <- cowplot::get_plot_component(
  main_plot +
    theme(legend.direction = "horizontal") +
    guides(color = guide_legend(override.aes = list(size = 8),
                                order = 1),
           size = guide_legend(override.aes = list(fill = "gray60",
                               color = "black",
                               alpha = 0.5)),
           order = 2) +
    theme(legend.title = element_text(family = font_legend, face = "bold", size = 16),
          legend.text = element_text(family = font_legend, size = 14)),
  "guide-box", return_all = TRUE)
ggdraw(ylim = c(0,1), # 0-1 scale makes it easy to place viz items on canvas
       xlim = c(0,1)) +
  # a background  
  draw_grob(canvas,
            x = 0, y = 1,
            height = 9, width = 9,
            hjust = 0, vjust = 1) +
  # the main plot 
  draw_plot(main_plot +  theme(legend.position = "none"),
            x = margin - 0.06,
            y = margin + 0.02,
            height = 1 - (5*margin),
            width = 1.05) +
  # Add legends
  draw_plot(legends[[1]], .442, .72, .115, .15)  +
  # explainer text (Edit the author name only)
  draw_label("Elmera Azadpour, USGS\n Data available from tidycensus https://walker-data.com/tidycensus/",           x = 1 - margin,   
             y = margin,
             size = 12,
             hjust = 1,
             vjust = 0,
             color = font_color,
             fontfamily = font_legend)+
  # Title
  draw_label("Change in Percent of Households Lacking\nPlumbing Facilities by County (2023–2022)",
             x = margin,
             y = 1-margin,
             hjust = 0,
             vjust = 1,
             lineheight = 0.75,
             color = font_color,
             size = 28,
             fontfamily = font_legend,
             fontface = "bold") +
  # Add logo 
  draw_image(usgs_logo, 
             x = margin,
             y = margin,
             width = 0.15, 
             hjust = 0, vjust = 0, 
             halign = 0, valign = 0)+
  # Add arrow for Orange County, Texas (increase) 
  draw_plot(plot_tx,
            x = margin + 0.52,
            y = margin + 0.14,
            height = margin + 0.03,
            width = margin + 0.02) +
  draw_label("Orange County, TX\nexperienced a 2.3%\n increase in households\n lacking plumbing facilities",
             fontfamily = font_legend,
             x =  margin + 0.66,   
             y = margin + 0.11,
             size = 12,
             color = "black",
             lineheight = 1.1) + 
  # "Underline" increase text
  draw_plot(increase_line,
            x = margin + 0.57,
            y = margin + 0.051,
            height = margin + 0.03,
            width = margin + 0.032) +
  # Add arrow for McKinley County, New Mexico (decrease)
  draw_plot(plot_nm,
            x = margin + 0.18,
            y = margin + 0.235,
            height = margin + 0.08,
            width = margin + 0.03) +
  draw_label("McKinley County, NM\nexperienced a 0.93%\n decrease in households\n lacking plumbing facilities",
             fontfamily = font_legend,
             x =  margin + 0.09,   
             y = margin + 0.2,
             size = 12,
             color = "black",
             lineheight = 1.1) + 
  # "Underline" decrease text
  draw_plot(decrease_line,
            x = margin - 0.001,
            y = margin + 0.141,
            height = margin + 0.03,
            width = margin + 0.034)
Cartogram example
Here, a cartogram distorts the scale of the states so that those with a large number of households lacking plumbing are larger than they otherwise would based on area alone, creating a warped look to familiar features. The color ramp also represents the households lacking plumbing in 2023, with darker colors in states with higher levels. There is an inset map with geographically-accurate states to provide a comparison against the warped shapes of the cartogram. The package cartogram
was used to create this map.

Code to reproduce this figure
library(cartogram)
# select only the conterminuous (lower 48) states and add USPS code
conus <- state_plumbing_data_2023 |> 
  mutate(stusps = state.abb[match(name, state.name)]) |>
  filter(! stusps %in% c("AK", "AS", "HI", "PR", "VI", "MP", "GU"),
         ! name %in% "Puerto Rico")
# Use the cartogram package to create a continuous cartogram, 
#   weighted by households lacking plumbing
carto_state <- cartogram::cartogram_cont(conus,
                                         weight = "plumbing")
# Plot the cartogram, with legend
(cartogram_plot <- ggplot(carto_state) + 
    geom_sf(aes(fill = plumbing)) +
    scico::scale_fill_scico(palette = "oslo", direction = -1) +
    labs(fill = "Households lacking plumbing") +
    theme_void() +
    theme(legend.position = "bottom",
          legend.key.width=unit(2, "cm")))
# Plot an inset map that is not distorted
(conus_plot <- ggplot(conus) + 
    geom_sf(aes(fill = plumbing)) +
    scico::scale_fill_scico(palette = "oslo", direction = -1) +
    theme_void() +
    theme(legend.position = "none"))
ggdraw(ylim = c(0,1), # 0-1 scale makes it easy to place viz items on canvas
       xlim = c(0,1)) +
  # a background  
  draw_grob(canvas,
            x = 0, y = 1,
            height = 9, width = 9,
            hjust = 0, vjust = 1) +
  # the main plot 
  draw_plot(cartogram_plot,
            x = 0.025, 
            y = 0.15,
            width = 0.95,
            height = 0.7) +  
  # the inset plot 
  draw_plot(conus_plot,
            x = 0.06, 
            y = 0.18,
            width = 0.2,
            height = 0.25) +  
  # explainer text (Edit the author name only)
  draw_label("Althea Archer, USGS\n Data available from tidycensus https://walker-data.com/tidycensus/",
             x = 1 - margin,   
             y = margin,
             size = 12,
             hjust = 1,
             vjust = 0,
             color = font_color,
             fontfamily = font_legend) +
  # Title
  draw_label("The lack of access to plumbing in 2023",
             x = margin,
             y = 1-margin,
             hjust = 0,
             vjust = 1,
             lineheight = 0.75,
             color = font_color,
             size = 28,
             fontfamily = font_legend,
             fontface = "bold") +
  draw_label("Over 200,000 households lack access to plumbing in the lower 48 United States.",
             x = margin,   
             y = 1 - 0.1,
             size = 12,
             hjust = 0,
             vjust = 1,
             fontfamily = font_legend,
             color = font_color) +
  # Add logo 
  draw_image(usgs_logo, 
             x = margin,
             y = margin,
             width = 0.15, 
             hjust = 0, vjust = 0, 
             halign = 0, valign = 0)
Geofaceted area plot example
Geofacets can be used to create a series of mini charts that are aligned to be consistent with geographic features. This geofacet has a mini-area plot for each state that represents the number of households with plumbing in 2022 compared to 2023. The blue areas are households from all the counties that had a decrease in the number of households from 2022 to 2023, and the red areas are the households from counties with an increase. The package we used to arrange mini charts like this was the geofacet
package.

Code to reproduce this figure
library(geofacet)
# Join 2022 and 2023 data together
joined <- plumbing_data_2022 |>
  sf::st_drop_geometry() |>
  select(-year) |>
  rename_at(vars(-name, -geoid), function(x) paste0(x, "_22")) |>
  inner_join(plumbing_data_2023 |> 
              sf::st_drop_geometry() |> 
              select(-name, -year) |>
              rename_at(vars(-geoid), function(x) paste0(x, "_23")),
            by = "geoid") |>
  # calculate trends from 2022 to 2023
  mutate(n_change = plumbing_23 - plumbing_22,
         pct_change = percent_lacking_plumbing_23 - percent_lacking_plumbing_22,
         n_change_cat = case_when(n_change > 0 ~ "Increasing",
                                  n_change == 0 ~ "No Change",
                                  n_change < 0 ~ "Decreasing")) |>
  mutate(state_id = substr(geoid, 1, 2))
# get grid for geofacet from the Flow Tiles open software repository
#  https://github.com/DOI-USGS/flow-tiles/blob/3f871a7ca393bd26f2a8e15a3fbd5621eb02e8f1/src/plot_cartogram.R#L2
usa_grid <- readr::read_csv("usa_grid.csv")|> 
  filter(! code %in% c("PR", "HI", "AK"))
# Join data with the full counties dataset
joined_all <- joined |>
  left_join(counties_sf |> sf::st_drop_geometry() |> select(stusps, state_name, geoid), 
            by = "geoid") 
# back to long for plotting
long_join <- joined_all |>
  select(-name, -state_id) |>
  select(n_change_cat, stusps, state_name, geoid, plumbing_22, plumbing_23) |> 
  pivot_longer(cols = c(plumbing_22, plumbing_23),
               names_to = "year",
               values_to = "households")
# Summarise by state for plotting
state_summary <- long_join |> 
  filter(! is.na(stusps), stusps != "DC") |>
  group_by(stusps, n_change_cat, year) |>
  summarise(households = sum(households, na.rm = TRUE)) |>
  mutate(year_num = case_when(year == "plumbing_22" ~ 2022,
                              year == "plumbing_23" ~ 2023)) |>
  mutate(cat_factor = case_when(n_change_cat == "Increasing" ~ "C",
                                n_change_cat == "Decreasing" ~ "A",
                                TRUE ~ "B"))
(main_plot <- ggplot(data = state_summary,
                     aes(y = households, x = year_num, fill = cat_factor)) +
    geom_area() + 
    scale_fill_manual(values = c("#538CC6", "#538CC6", "#872341")) +
    scale_x_continuous(breaks = c(2022, 2023), labels = c("2022", "2023"),
                       name = "Comparison from 2022 (left) to 2023 (right)") +
    scale_y_continuous(breaks = c(0, 10000, 20000), labels = c("0k", "10k", "20k"),
                       name = "Houses without plumbing") + 
    geofacet::facet_geo( ~ stusps, grid = usa_grid, move_axes = TRUE) +
    theme_minimal() +
    theme(legend.position = "none",
          panel.background = element_blank(),
          panel.grid = element_blank(),
          axis.text.y = element_text(size = 7),
          axis.text.x = element_blank(),
          axis.title = element_text(size = 10)))
ggdraw(ylim = c(0,1), # 0-1 scale makes it easy to place viz items on canvas
       xlim = c(0,1)) +
  # a background  
  draw_grob(canvas,
            x = 0, y = 1,
            height = 9, width = 9,
            hjust = 0, vjust = 1) +
  # the main plot 
  draw_plot(main_plot,
            x = 0.025, 
            y = 0.15,
            width = 0.95,
            height = 0.7) +  
  # explainer text (Edit the author name only)
  draw_label("Althea Archer, USGS\n Data available from tidycensus https://walker-data.com/tidycensus/",
             x = 1 - margin,   
             y = margin,
             size = 12,
             hjust = 1,
             vjust = 0,
             color = font_color,
             fontfamily = font_legend) +
  # Title
  draw_label("Change in households with plumbing 2022-2023",
             x = margin,
             y = 1-margin,
             hjust = 0,
             vjust = 1,
             lineheight = 0.75,
             color = font_color,
             size = 28,
             fontfamily = font_legend,
             fontface = "bold") +
  draw_label("Counties in each state are colored based on if they had an",
             x = margin,   
             y = 1 - 0.1,
             size = 12,
             hjust = 0,
             vjust = 1,
             fontfamily = font_legend,
             color = font_color) +
  draw_label("increase or decrease in the number of households lacking plumbing.",
             x = margin,   
             y = 1 - 0.12,
             size = 12,
             hjust = 0,
             vjust = 1,
             fontfamily = font_legend,
             color = font_color) +
  # "Underline" text
  draw_line(x = c(0.04, 0.10),
            y = c(1 - 0.14, 1 - 0.14),
            color = "#872341",
            linewidth = 1.4) +
  draw_line(x = c(0.14, 0.20),
            y = c(1 - 0.14, 1 - 0.14),
            color = "#538CC6",
            linewidth = 1.4) +
  # Add logo 
  draw_image(usgs_logo, 
             x = margin,
             y = margin,
             width = 0.15, 
             hjust = 0, vjust = 0, 
             halign = 0, valign = 0)
Rainfall plot example
This plot uses a paired half-rainfall and half-density plot to show the variation in households with plumbing by region. The plot of variation is paired with an inset map to show where those regions are located. Variation can be easily and beautifully plotted with the use of the ggdist
package, like we did here.

Code to reproduce this figure
library(ggdist)
# Join county data with water data 
county_df <- counties_sf |>
  sf::st_drop_geometry() |>
  left_join(plumbing_data_2023 |> select(geoid, year, total_housing, plumbing, 
                                       percent_lacking_plumbing),
            by = "geoid") |>
  # add regions
  mutate("CASC" = case_when(state_name %in% c("Minnesota", "Iowa", "Missouri", "Wisconsin", "Illinois", "Indiana", "Michigan", "Ohio") ~ "Midwest", 
                            state_name %in% c("Montana", "Wyoming", "Colorado", "North Dakota", "South Dakota", "Nebraska", "Kansas") ~ "North Central",                           
                            state_name %in% c("Maine", "New Hampshire", "Vermont", "Massachusetts", "Connecticut", "Rhode Island",  "New York", "New Jersey", "Pennsylvania", "Delaware", "Maryland", "West Virginia","Virginia", "Kentucky", "District of Columbia") ~ "Northeast", 
                            state_name %in% c("Washington", "Oregon", "Idaho") ~ "Northwest",                           
                            state_name %in% c("New Mexico", "Texas", "Oklahoma", "Louisiana") ~ "South Central", 
                            state_name %in% c("North Carolina", "South Carolina", "Georgia", "Alabama", "Mississippi", "Florida", "Tennessee",                                     "Arkansas") ~ "Southeast",  
                            state_name %in% c("Arizona", "California", "Utah", "Nevada") ~ "Southwest", TRUE ~ "not sorted")) |>
  # remove missing data
  filter(! is.na(year))
# join regions to state data
state_region_sf <- state_sf |> 
  mutate("CASC" = case_when(name %in% c("Minnesota", "Iowa", "Missouri", "Wisconsin", "Illinois", "Indiana", "Michigan", "Ohio") ~ "Midwest", 
                            name %in% c("Montana", "Wyoming", "Colorado", "North Dakota", "South Dakota", "Nebraska", "Kansas") ~ "North Central",                           
                            name %in% c("Maine", "New Hampshire", "Vermont", "Massachusetts", "Connecticut", "Rhode Island",  "New York", "New Jersey", "Pennsylvania", "Delaware", "Maryland", "West Virginia","Virginia", "Kentucky", "District of Columbia") ~ "Northeast", 
                            name %in% c("Washington", "Oregon", "Idaho") ~ "Northwest",                           
                            name %in% c("New Mexico", "Texas", "Oklahoma", "Louisiana") ~ "South Central", 
                            name %in% c("North Carolina", "South Carolina", "Georgia", "Alabama", "Mississippi", "Florida", "Tennessee",                                     "Arkansas") ~ "Southeast",  
                            name %in% c("Arizona", "California", "Utah", "Nevada") ~ "Southwest", TRUE ~ "not sorted"))
# State map for inset
(map <- ggplot(state_region_sf) +   
    geom_sf(aes(fill = CASC), color = "white", alpha = 0.85) +   
    scale_fill_manual(values = pal_hue()(7))+
    theme_void()+
    theme(legend.position = 'none')
)
(main_plot <- ggplot(county_df,
                     aes(y = CASC, 
                         x = percent_lacking_plumbing,
                         show.legend = FALSE)) +
    ggdist::stat_dotsinterval(fill = 'black',
                              side = "bottom", quantiles = 30)+
    ggdist::stat_halfeye(mapping = aes(fill = CASC))+
    ylab('Climate Adaptation Science Center (CASC) Region')+
    xlab('Percent of households per county lacking plumbing')+
    theme_void()+
    theme(legend.position = 'none', 
          axis.title.x = element_text(size = 14),
          axis.text = element_text(size = 14, hjust = 1)) 
)
ggdraw(ylim = c(0,1), # 0-1 scale makes it easy to place viz items on canvas
       xlim = c(0,1)) +
  # a background  
  draw_grob(canvas,
            x = 0, y = 1,
            height = 9, width = 9,
            hjust = 0, vjust = 1) +
  # the main plot 
  draw_plot(main_plot,
            x = margin,
            y = 0.170,
            height = 0.775,
            width = 1) +
  ## inset map
  draw_plot(map,
            x = 0.15,
            y = 0.25,
            height = 0.28) +
  # explainer text (Edit the author name only)
  draw_label("Sharmin Siddiqui, USGS\n Data available from tidycensus https://walker-data.com/tidycensus/",
             x = 1 - margin,   
             y = margin,
             size = 12,
             hjust = 1,
             vjust = 0,
             color = font_color,
             fontfamily = font_legend) +
  # Title
  draw_label("Regional differences in access to plumbing, 2023",
             x = margin,
             y = 1-margin,
             hjust = 0,
             vjust = 1,
             lineheight = 0.75,
             color = font_color,
             size = 28,
             fontfamily = font_legend,
             fontface = "bold") +
  # Add logo 
  draw_image(usgs_logo, 
             x = margin,
             y = margin,
             width = 0.15, 
             hjust = 0, vjust = 0, 
             halign = 0, valign = 0)
Grid chart example
This chart uses an overall grid-layout to compare income versus plumbing access. The chart is divided into counties that saw an increase or decrease in median household income (left versus right) and into counties that saw an increase or decrease in the percent of households with plumbing (top versus bottom). Each county is then shown with a square, scaled to the total population in that county in 2023. The map author was inspired by this Datawrapper visualization.

Code to reproduce this figure
plumbing_df <- plumbing_data_2022 |>
  inner_join(sf::st_drop_geometry(plumbing_data_2023), 
             join_by("geoid", "name"), 
             suffix = c("_2022", "_2023")) |>
  mutate(
    change_in_having_per = 
      percent_having_plumbing_2023 - percent_having_plumbing_2022,
    change_in_income = median_household_income_2023 - median_household_income_2022,
    category = case_when(
      (change_in_income > 0 & change_in_having_per >= 0) ~ "a",
      (change_in_income > 0 & change_in_having_per < 0) ~ "b",
      (change_in_income < 0 & change_in_having_per >= 0) ~ "c",
      TRUE ~ "d"
    )) |>
  filter(!is.na(change_in_having_per))
# get geoids for SW U.S.
focal_geoids <- counties_sf |>
  filter(state_name %in% c("Arizona", "California", "Utah", "Nevada")) |>
  pull(geoid)
# subset data to SW U.S.
data_subset <- plumbing_df |>
  filter(geoid %in% focal_geoids) |>
  arrange(desc(total_pop_2023))
# get data subsets for annotations
# 49011 Davis County, Utah - lower right
# 04003 Cochise County, Arizona - lower left
# 04005 Coconino County, Arizona - upper right
lower_right <- filter(data_subset, geoid == "49011")
lower_left <- filter(data_subset, geoid == "04003")
upper_right <- filter(data_subset, geoid == "04005")
label_size <- 4
main_plot <- ggplot(data_subset) +
  geom_point(
    aes(x = change_in_income, 
        y = change_in_having_per,
        fill = category,
        alpha = 0.2,
        size = total_pop_2023),
    color = "white",
    pch = 22) +
  geom_point(
    data = lower_right,
    aes(x = change_in_income, 
        y = change_in_having_per,
        fill = category,
        size = total_pop_2023),
    color = "black",
    pch = 22
  ) +
  geom_point(
    data = lower_left,
    aes(x = change_in_income, 
        y = change_in_having_per,
        fill = category,
        size = total_pop_2023),
    color = "black",
    pch = 22
  ) +
  geom_point(
    data = upper_right,
    aes(x = change_in_income, 
        y = change_in_having_per,
        fill = category,
        size = total_pop_2023),
    color = "black",
    pch = 22
  ) +
  scale_fill_manual(values = c("steelblue2", "tomato3", "mediumaquamarine", "orange"),
                    aesthetics = c("fill", "color")) +
  scale_x_continuous(expand = c(0.02, 0.02)) +
  scale_y_continuous(expand = c(0.01, 0.01)) +
  scale_size_continuous(range = c(1.5, 15)) +
  theme_void() +
  geom_hline(yintercept = 0, linetype = "dotted", color = "grey80", 
             linewidth = 0.5) +
  geom_vline(xintercept = 0, linetype = "dotted", color = "grey80", 
             linewidth = 0.5) +
  geom_text(data = head(data_subset, 1), 
            aes(x = max(data_subset$change_in_income, na.rm = TRUE) * 0.05, 
                y = min(data_subset$change_in_having_per, na.rm = TRUE) * 1.1, 
                label = "MORE INCOME →"),
            hjust = 0,
            color = "#6E6E6E") +
  geom_text(data = head(data_subset, 1), 
            aes(x = max(data_subset$change_in_income, na.rm = TRUE) * -0.05, 
                y = min(data_subset$change_in_having_per, na.rm = TRUE) * 1.1, 
                label = "← LESS INCOME"),
            hjust = 1,
            color = "#6E6E6E") +
  geom_text(data = head(data_subset, 1), 
            aes(x = min(data_subset$change_in_income, na.rm = TRUE) * 1.1, 
                y = min(data_subset$change_in_having_per, na.rm = TRUE) * 0.16, 
                label = "LESS PLUMBING"),
            hjust = 0,
            vjust = 1,
            color = "#6E6E6E") +
  geom_text(data = head(data_subset, 1), 
            aes(x = min(data_subset$change_in_income, na.rm = TRUE) * 1.1, 
                y = max(data_subset$change_in_having_per, na.rm = TRUE) * 0.1, 
                label = "MORE PLUMBING"),
            hjust = 0,
            vjust = 0,
            color = "#6E6E6E") +
  geom_text(data = head(data_subset, 1), 
            aes(x = min(data_subset$change_in_income, na.rm = TRUE) * 1.09, 
                y = max(data_subset$change_in_having_per, na.rm = TRUE) * 0.18, 
                label = "↑"),
            hjust = 0,
            vjust = 0,
            color = "#6E6E6E") +
  geom_text(data = head(data_subset, 1), 
            aes(x = min(data_subset$change_in_income, na.rm = TRUE) * 1.09, 
                y = min(data_subset$change_in_having_per, na.rm = TRUE) * 0.25, 
                label = "↓"),
            hjust = 0,
            vjust = 1,
            color = "#6E6E6E") +
  geom_text(data = lower_right,
            aes(
              x = change_in_income,
              y = change_in_having_per * 1.06,
              label = paste(name, paste0("pop. ", formatC(total_housing_2023, format = "d", big.mark=",")), sep = "\n")),
            size = label_size,
            vjust = 1) +
  geom_text(data = lower_left,
            aes(
              x = change_in_income,
              y = change_in_having_per * 1.18,
              label = paste(name, paste0("pop. ", formatC(total_housing_2023, format = "d", big.mark=",")), sep = "\n")),
            size = label_size,
            vjust = 1) +
  geom_text(data = upper_right,
            aes(
              x = change_in_income,
              y = change_in_having_per * 0.965,
              label = paste(name, paste0("pop. ", formatC(total_housing_2023, format = "d", big.mark=",")), sep = "\n")),
            size = label_size,
            vjust = 1) +
  theme(
    legend.position = "none"
  )
map <- ggplot() +
  geom_sf(data = state_sf, color = "white", fill = "grey90") +
  geom_sf(data = filter(state_sf, 
                        name %in% c("Arizona", "California", "Utah", "Nevada")), 
          color = "white", fill = "grey30") +
  theme_void()
ggdraw(ylim = c(0,1), # 0-1 scale makes it easy to place viz items on canvas
       xlim = c(0,1)) +
  # a background  
  draw_grob(canvas,
            x = 0, y = 1,
            height = 9, width = 9,
            hjust = 0, vjust = 1) +
  # the main plot 
  draw_plot(main_plot,
            x = margin,
            y = margin * 3,
            height = 1 - (8 * margin),
            width = 1 - (2 * margin)) +
  draw_plot(map,
            x = margin,
            y = margin * 6,
            width = 0.2) +
  # explainer text (Edit the author name only)
  draw_label("Hayley Corson-Dosch, USGS\n Data available from tidycensus https://walker-data.com/tidycensus/",
             x = 1 - margin,   
             y = margin,
             size = 12,
             hjust = 1,
             vjust = 0,
             color = font_color,
             fontfamily = font_legend) +
  # Title
  draw_label("Household income versus plumbing access\nfrom 2022 to 2023",
             x = margin,
             y = 1-margin,
             hjust = 0,
             vjust = 1,
             lineheight = 0.75,
             color = font_color,
             size = 28,
             fontfamily = font_legend,
             fontface = "bold") +
  # annotations
  draw_label("Change in median income and access to plumbing from 2022 to 2023 for counties in the Southwest U.S.",
             x = margin,   
             y = 1 - 0.12,
             size = 12,
             hjust = 0,
             vjust = 1,
             fontfamily = font_legend,
             color = font_color) +
  draw_label("The size of each square is scaled to the population of each county.",
             x = margin,   
             y = 1 - 0.14,
             size = 12,
             hjust = 0,
             vjust = 1,
             fontfamily = font_legend,
             color = font_color) +
  # Add logo 
  draw_image(usgs_logo, 
             x = margin,
             y = margin,
             width = 0.15, 
             hjust = 0, vjust = 0, 
             halign = 0, valign = 0)
             
Additional resources
View tidycensus developers’, Kyle Walker and Matt Herman, website
to learn more about additional package functionality and documentation. Check out the book “Analyzing US Census Data: Methods, Maps, and Models in R
”, by Kyle Walker, for additional information on wrangling, modeling, and analyzing U.S. Census data. View our site
to learn more about related research.
References
- U.S. Census Bureau, 2022, “Tenure by Plumbing Facilities,” American Community Survey, 1-Year Estimates Detailed Tables, Table B25049, accessed on Jan 26, 2025, https://data.census.gov/table?q=B25049&y=2022 . 
- U.S. Census Bureau, 2022, “Total Population,” American Community Survey, 1-Year Estimates Detailed Tables, Table B01003, accessed on Jan 26, 2025, https://data.census.gov/table?q=B01003&y=2022 . 
- U.S. Census Bureau, 2022, “Median household income,” American Community Survey, 1-Year Estimates Detailed Tables, Table B19013, accessed on Jan 26, 2025, https://data.census.gov/table?q=B19013&y=2022 . 
- U.S. Census Bureau, 2023, “Tenure by Plumbing Facilities,” American Community Survey, 1-Year Estimates Detailed Tables, Table B25049, accessed on Jan 26, 2025, https://data.census.gov/table?q=B25049&y=2023 . 
- U.S. Census Bureau, 2023, “Total Population,” American Community Survey, 1-Year Estimates Detailed Tables, Table B01003, accessed on Jan 26, 2025, https://data.census.gov/table?q=B01003&y=2023 . 
- U.S. Census Bureau, 2023, “Median household income,” American Community Survey, 1-Year Estimates Detailed Tables, Table B19013, accessed on Jan 26, 2025, https://data.census.gov/table?q=B19013&y=2023 . 
- Walker K and Herman M. 2024. tidycensus: Load US Census Boundary and Attribute Data as “tidyverse” and “sf”-Ready Data Frames. R package version 1.6.5, https://walker-data.com/tidycensus/ . 
Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Related Posts
- Mapping water insecurity in R with tidycensus- December 9, 2024  - Water insecurity can be influenced by number of social vulnerability indicators—from demographic characteristics to living conditions and socioeconomic status —that vary spatially across the U.S. This blog shows how the - tidycensuspackage for R can be used to access U.S. Census Bureau data, including the American Community Surveys, as featured in the “Unequal Access to Water ” data visualization from the USGS Vizlab. It offers reproducible code examples demonstrating use of- tidycensusfor easy exploration and visualization of social vulnerability indicators in the Western U.S.
- Reproducible Data Science in R: Flexible functions using tidy evaluation- December 17, 2024  - Overview- This blog post is part of a series that works up from functional programming foundations through the use of the targets R package to create efficient, reproducible data workflows. 
- Reproducible Data Science in R: Writing functions that work for you- May 14, 2024  - Overview- This blog post is part of a series that works up from functional programming foundations through the use of the targets R package to create efficient, reproducible data workflows. 
- Origin and development of a Snowflake Map- January 11, 2023 - The result- It’s been a snowy winter, so let’s make a snow cover map! This blog outlines the process of how I made a snowflake hex map of the contiguous U.S. The final product shows whiter snowflakes where snow cover was higher, which was generally in the northern states and along the main mountain ranges. I used a few interesting tricks and packages (e.g., ggimage , magick ) to make this map and overlay the snowflake shape. Follow the steps below to see how I made it. 
- The Hydro Network-Linked Data Index- November 2, 2020 - Introduction- updated 11-2-2020 after updates described here . - updated 9-20-2024 when the NLDI moved from labs.waterdata.usgs.gov to api.water.usgs.gov/nldi/ - The Hydro Network-Linked Data Index (NLDI) is a system that can index data to NHDPlus V2 catchments and offers a search service to discover indexed information. Data linked to the NLDI includes active NWIS stream gages , water quality portal sites , and outlets of HUC12 watersheds . The NLDI is a core product of the Internet of Water and is being developed as an open source project. . 
