Overview

I recently investigated a subset of Social Vulnerability Index (SVI) data from CDC/ATSDR for Wake County, NC. I’m working on building my mapping skills in R, so this project felt like a great opportunity to play around with the tigris and sf packages.

Load Libraries

Load necessary packages to use functions throughout the document.

Code

library(tidyverse) # data wrangling
library(janitor) # clean up
library(labelled) # variable labels
library(scales) # scaling functions for visualization
library(openxlsx) # xslx reading
library(expss) # labels
library(sf) # mapping
library(tidycensus) # for using Census data
library(tigris) # download TIGER/Line shapefiles from Census
library(ggmap) # spatial viz
library(ggthemes) # ggplot themes
library(GGally) # ggplot2 extension
library(grateful) # package citations
library(knitr)

Import Data

After importing the file, I filtered the data to Wake County, NC for the purposes of this analysis. Using the tigris package, I created a shapefile tibble with Wake County’s census tracts. I also incorporated supplemental data by importing a file with the health service zones in Wake County, which allow me to further spatially aggregate the data.

Code

## SVI file
wake_tbl <- read.xlsx('Data/svi-data.xlsx') %>% 
  as_tibble() %>% # Convert to tibble
  clean_names() %>% # Clean variable names
  filter(county == "Wake County") # Filter to Wake County only

## Wake census tracts
wake_tracts <- tracts("NC", "Wake")

## Health service zones
wake_service_zones <- read.xlsx('Data/health_service_zones.xlsx') %>% 
  as_tibble() %>% # Convert to tibble
  clean_names() %>% # Clean variable names
  mutate(across(geoid:tract, ~as.character(.x))) # Convert variables to character

Clean Data

Steps to clean the data included: (1) converting the indicators provided to numeric, (2) creating overall variables for Wake County’s total population and housing units, (3) cleaning the location variable to only include the census tract, (4) creating a population density variable, (5) converting indicators that are counts to percents for more consistent comparison across tracts, (6) calculating a count of overcrowded housing units since it was the only indicator provided as a percent, and (7) multiplying percentages by 100 and round to 1 decimal for easier interpretation in figures to be developed.

Code

wake_tbl_clean <- wake_tbl %>%
  mutate(
    # Convert variables to numeric
    across(area_in_sq_mi:people_25_w_o_high_school_diploma, ~as.numeric(.x)),
    
    # Create variables for total county population and total housing units
    total_pop = sum(population),
    total_housing_units = sum(housing_units),
    
    # Clean location variable
    location = str_extract(location, "^[^;]+"),
    
    # Create variable for population density
    pop_density = population/area_in_sq_mi,
    
    # Update indicators to percents for comparison
    ## Denominator = Population
    perc_below_150_percent_poverty = ppl_below_150_percent_poverty/population,
    perc_bipoc_residents = bipoc_residents/population,
    perc_w_o_high_school_diploma = people_25_w_o_high_school_diploma/population,
    ## Denominator = Households
    perc_hh_no_vehicle = households_with_no_vehicle/households,
    
    # Calculate count of overcrowded housing units
    overcrowded_housing_units = percent_of_overcrowded_housing_units*housing_units,
    
    # Multiply percentages by 100 and round to 1 decimal
    across(starts_with("perc_"), ~round((.x*100), digits = 1))
  )

Merge Clean Data and Shapefile

I merged the cleaned dataset and the wake_tracts shapefile, joining by the “GEOID” field in the shapefile and the “fips_code” field in the cleaned data. Then, I merged the resulting file with the wake_service_zones shapefile to incorporate service zone data.

Code

wake_sf <- left_join(
  wake_tracts, 
  wake_tbl_clean, 
  by = c("GEOID" = "fips_code")) %>%
  drop_na(starts_with("perc"))

wake_service_zones_sf <- left_join(
  wake_sf,
  wake_service_zones, 
  by = c("GEOID" = "geoid")) %>%
  drop_na(starts_with("perc")) %>%
  mutate(zone_name = ifelse(zone_name=="East Central", "East\nCentral", zone_name))

Analyze Social Vulnerability Among Tracts

I wanted to identify the “most” and “least” socially vulnerable communities to visualize geographic disparities in social vulnerability factors across Wake County, so I calculated the 75th and 25th percentiles for each of the following indicators: rates of no high school diploma (age 25+), individuals below 150% FPL, overcrowded housing, households without a vehicle, and BIPOC residents.

I defined the most socially vulnerable areas as tracts in the top 25% across all 5 of the indicators: rates of no high school diploma among individuals 25 and over (>= 5.6%), individuals below 150% FPL (>= 18.6%), overcrowded housing units (>= 3%), households without a vehicle (>= 6%), and BIPOC residents (>= 52.6%). Similarly, I defined the least socially vulnerable areas as tracts in the bottom 25% across all 5 indicators: rates of no high school diploma among individuals 25 and over (<= 1.2%), individuals below 150% FPL (>= 6.5%), overcrowded housing units (0%), households without a vehicle (<= 0.9%), and BIPOC residents (<= 23.9%).

Code

# Create map caption
caption <- "Source: CDC/ATSDR Social Vulnerability Index"

# Define the columns of interest
columns_of_interest <- c(
  "perc_below_150_percent_poverty",
  "perc_hh_no_vehicle",
  "percent_of_overcrowded_housing_units",
  "perc_w_o_high_school_diploma",
  "perc_bipoc_residents"
)

# Most Socially Vulnerable Tracts
## Calculate the 75th percentile for each indicator
upper_percentiles <- wake_tbl_clean %>%
  summarise(across(all_of(columns_of_interest), ~ quantile(.x, 0.75, na.rm = TRUE)))

## Identify tracts where all values are in the 75th percentile or greater
highest_quartile_tracts <- wake_tbl_clean %>%
  filter(
    perc_below_150_percent_poverty >= upper_percentiles$perc_below_150_percent_poverty &
    perc_hh_no_vehicle >= upper_percentiles$perc_hh_no_vehicle &
    percent_of_overcrowded_housing_units >= upper_percentiles$percent_of_overcrowded_housing_units &
    perc_w_o_high_school_diploma >= upper_percentiles$perc_w_o_high_school_diploma &
    perc_bipoc_residents >= upper_percentiles$perc_bipoc_residents
  )

## Merge the highest quartile tracts with the shapefile
highest_quartile_tracts_geo <- wake_tracts %>%
  inner_join(highest_quartile_tracts, by = c("GEOID" = "fips_code"))

# Least Socially Vulnerable Tracts
## Calculate the 25th percentile for each indicator
bottom_percentiles <- wake_tbl_clean %>%
  summarise(across(all_of(columns_of_interest), ~ quantile(.x, 0.25, na.rm = TRUE)))

## Identify tracts where all values are in the 25th percentile or lower
lowest_quartile_tracts <- wake_tbl_clean %>%
  filter(
    perc_below_150_percent_poverty <= bottom_percentiles$perc_below_150_percent_poverty &
    perc_hh_no_vehicle <= bottom_percentiles$perc_hh_no_vehicle &
    percent_of_overcrowded_housing_units <= bottom_percentiles$percent_of_overcrowded_housing_units &
    perc_w_o_high_school_diploma <= bottom_percentiles$perc_w_o_high_school_diploma &
    perc_bipoc_residents <= bottom_percentiles$perc_bipoc_residents
  )

## Merge the lowest quartile tracts with the shapefile
lowest_quartile_tracts_geo <- wake_tracts %>%
  inner_join(lowest_quartile_tracts, by = c("GEOID" = "fips_code"))

Visualize the Most and Least Socially Vulnerable Communities in Wake County

The most socially vulnerable areas are census tracts in the top 25% for rates of no high school diploma (age 25+), individuals below 150% FPL, overcrowded housing, households without a vehicle, and BIPOC residents. The least socially vulnerable areas are in the bottom 25% for these indicators.

Code

# Create layered map with service zones, tracts, and most/least socially vulnerable areas
## Create base map
base_map <- ggplot(data = wake_service_zones_sf) +
  geom_sf(aes(geometry = geometry), fill = "white", color = "gray", size = 0.2) +
  theme_map()

## Add the most and least vulnerable tracts layer
map_top_tracts <- base_map +
  geom_sf(data = highest_quartile_tracts_geo, aes(fill = "Most Vulnerable"), color = "gray", alpha = .75) +
  geom_sf(data = lowest_quartile_tracts_geo, aes(fill = "Least Vulnerable"), color = "gray", alpha = .75) +
  scale_fill_manual(
    name = "Tract-Level Social Vulnerability",
    values = c("Least Vulnerable" = "#2c7fb8", "Most Vulnerable" = "#edf8b1"),
    labels = c("Least Vulnerable Areas", "Most Vulnerable Areas")
  )

## Add the service zones layer
zones <- wake_service_zones_sf %>%
  group_by(zone_name) %>%
  summarize(geometry = st_union(geometry))

map_with_zones <- map_top_tracts +
  geom_sf(data = zones, fill = NA, color = "black", size = 20)

## Add the service zone label layer
zone_centroids <- zones %>%
  st_centroid()

map_with_labels <- map_with_zones +
  geom_sf_text(data = zone_centroids, aes(label = zone_name), size = 3, color = "black", fontface = "bold") +
  labs(
    title = "Spatial Disparities in Social Vulnerability Factors in Wake County",
    caption = caption) +
  theme(
    legend.position = "right",
    legend.direction = "vertical",
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 10),
    plot.title = element_text(size = 12, face = "bold"),
    plot.caption = element_text(size = 10, face = "italic", hjust=0, vjust=1))

map_with_labels

Key Insight

Wake County is divided into 8 service zones. This map shows that the most socially vulnerable communities are clustered together, concentrated in the East Central service zone and nearby tracts on the borders of the North Central, South Central, and West Central service zones. On the other hand, the least socially vulnerable communities are more distinctly spread across the county in the Northern, North Central, West, and South Central service zones. We can use this information to help inform which tracts and service zones county-level investments should go to.

Investigate Alignment Between Opportunity Zones and Socially Vulnerable Communities

Established by Congress in 2017 through the Tax Cuts & Jobs Act, Opportunity Zones are a nationwide incentive program to encourage long-term investment and economic growth in communities that have experienced historical disinvestment. The Opportunity Zones Program currently defines eligible census tracts as areas where the poverty rate is 20% or higher and/or family income is less than 80% of the area’s median income. In Raleigh, there are 10 census tracts designated as Opportunity Zones.

The shaded areas in this map are the most socially vulnerable areas, or census tracts in the top 25% for rates of no high school diploma (age 25+), individuals below 150% FPL, overcrowded housing, households without a vehicle, and BIPOC residents. Among these tracts, green areas are designated Opportunity Zones while gray areas are not.

Code

highest_quartile_tracts_geo_oz <- highest_quartile_tracts_geo %>%
  mutate(oz = ifelse(
    NAME %in% c(
      '540.18','527.04','540.08','527.06','524.07','509', '545', '530.09', '528.03', '528.07'
      ), "1", "0"))

# Create layered map with service zones, tracts, and most/least socially vulnerable areas
## Create base map
base_map <- wake_service_zones_sf %>%
  filter(zone_name %in% c('East\nCentral', 'North Central', 'West Central', 'South Central')) %>%
  ggplot() +
  geom_sf(aes(geometry = geometry), fill = "white", color = "gray", size = 0.2) +
  theme_map()

## Add the most vulnerable tracts layer with Opportunity Zones highlighted
map_oz <- base_map +
  geom_sf(data = highest_quartile_tracts_geo_oz, aes(fill = oz), color = "black", alpha = .75) +
  scale_fill_manual(
    name = "Opportunity Zone Status",
    values = c("1" = "#2ca25f", "0" = "gray"),
    labels = c("Not Opportunity Zone", "Opportunity Zone")
  )

## Add the service zones layer
zones <- wake_service_zones_sf %>%
  filter(zone_name %in% c('East\nCentral', 'North Central', 'West Central', 'South Central', 'East Central')) %>%
  group_by(zone_name) %>%
  summarize(geometry = st_union(geometry))

map_with_zones <- map_oz +
  geom_sf(data = zones, fill = NA, color = "black", size = 20)

## Add the service zone label layer
zone_centroids <- zones %>%
  st_centroid()

map_with_labels <- map_with_zones +
  geom_sf_text(data = zone_centroids, aes(label = zone_name), size = 3, color = "black", fontface = "bold") +
  labs(
    title = "Opportunity Zones Among Top Socially Vulnerable Areas in Wake County",
    caption = "CDC/ATSDR Social Vulnerability Index, City of Raleigh Opportunity Zones") +
  theme(
    legend.position = "right",
    legend.direction = "vertical",
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 10),
    plot.title = element_text(size = 12, face = "bold"),
    plot.caption = element_text(size = 10, face = "italic", hjust=0, vjust=1))

map_with_labels

Key Insight

Of the 10 tracts identified as the most socially vulnerable areas, 4 are designated Opportunity Zones by the city of Raleigh. This overlap suggests the initiative is targeting areas with high social vulnerability, but could do better. Expanding Opportunity Zone tract eligibility beyond poverty and income measures could lead to improved social vulnerability among communities that have been historically marginalized. Policymakers should consider additional investment incentive programs for communities with lower rates of educational attainment and higher rates of severe housing problems, for example.

Citations

Code

# Add packages used to bibliography
cite_packages(output = "table", out.dir = ".") %>% kable()

Package	Version	Citation
base	4.3.0	R Core Team (2023)
expss	0.11.6	Demin (2023)
GGally	2.2.1	Schloerke et al. (2024)
ggmap	4.0.1	Kahle and Wickham (2013)
ggthemes	5.1.0	Arnold (2024)
janitor	2.2.0	Firke (2023)
knitr	1.42	Xie (2014); Xie (2015); Xie (2023)
labelled	2.11.0	Larmarange (2023)
openxlsx	4.2.8	Schauberger and Walker (2025)
rmarkdown	2.21	Xie, Allaire, and Grolemund (2018); Xie, Dervieux, and Riederer (2020); Allaire et al. (2023)
scales	1.4.0	Wickham, Pedersen, and Seidel (2025)
sf	1.0.20	Pebesma (2018); Pebesma and Bivand (2023)
tidycensus	1.7.1	Walker and Herman (2025)
tidyverse	2.0.0	Wickham et al. (2019)
tigris	2.2.1	Walker (2025)

Code

#knitr::kable(pkgs)

References

Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2023. rmarkdown: Dynamic Documents for r. https://github.com/rstudio/rmarkdown.

Arnold, Jeffrey B. 2024. ggthemes: Extra Themes, Scales and Geoms for “ggplot2”. https://CRAN.R-project.org/package=ggthemes.

Demin, Gregory. 2023. expss: Tables, Labels and Some Useful Functions from Spreadsheets and “SPSS” Statistics. https://CRAN.R-project.org/package=expss.

Firke, Sam. 2023. janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.

Kahle, David, and Hadley Wickham. 2013. “ggmap: Spatial Visualization with Ggplot2.” The R Journal 5 (1): 144–61. https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf.

Larmarange, Joseph. 2023. labelled: Manipulating Labelled Data. https://CRAN.R-project.org/package=labelled.

Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.

Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With applications in R. Chapman and Hall/CRC. https://doi.org/10.1201/9780429459016.

R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Schauberger, Philipp, and Alexander Walker. 2025. openxlsx: Read, Write and Edit Xlsx Files. https://CRAN.R-project.org/package=openxlsx.

Schloerke, Barret, Di Cook, Joseph Larmarange, Francois Briatte, Moritz Marbach, Edwin Thoen, Amos Elberg, and Jason Crowley. 2024. GGally: Extension to “ggplot2”. https://CRAN.R-project.org/package=GGally.

Walker, Kyle. 2025. tigris: Load Census TIGER/Line Shapefiles. https://CRAN.R-project.org/package=tigris.

Walker, Kyle, and Matt Herman. 2025. tidycensus: Load US Census Boundary and Attribute Data as “tidyverse” and “sf”-Ready Data Frames. https://CRAN.R-project.org/package=tidycensus.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Thomas Lin Pedersen, and Dana Seidel. 2025. scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.

Xie, Yihui. 2014. “knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC.

———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.org/knitr/.

———. 2023. knitr: A General-Purpose Package for Dynamic Report Generation in r. https://yihui.org/knitr/.

Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.

Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.