Abstract

This study is a reproduction of Charles W. Sterling III et al.’s study, “Connections Between Present-Day Water Access and Historical Redlining.”

Sterling III, Charles W., et al. “Connections between present-day water access and historical redlining.” Environmental Justice (2023). DOI:[10.1089/env.2022.0115](https://doi.org/10.1089/env.2022.0115)

This study uses ACS Census data and historical HOLC records of neighborhoods to examine the correlation between historical redlining and current-day access to water in cities. The study uses a binary logistic regression to identify relationships between different demographic and HOLC variables and access to water and sewage. The study finds that historically worse HOLC grades were correlated with less access to water in cities across all regions of the United States.

Original study spatio-temporal metadata

  • Spatial Coverage: Specific inner cities across the United States that are included in the MID HOLC grade dataset. Data is limited to post-WW2 inner-core census tracts due to the boundaries of HOLC zones.
  • Spatial Resolution: resolution of original study
  • Spatial Reference System: spatial reference system of original study
  • Temporal Coverage: temporal extent of original study
  • Temporal Resolution: temporal resolution of original study

Study design

This study is a reproduction of Charles W. Sterling III et al.’s study, “Connections Between Present-Day Water Access and Historical Redlining”, which set out to explore the relationship between the prevalence of the ACS complete plumbing variable in census block groups and historical neighborhood HOLC grades. In the study, Sterling hypothesizes that there will be a link between HOLC grade and complete plumbing. This hypothesis is based on the HOLC’s well-documented history of assigning neighborhoods of color the lowest grades and on previous findings of a nationwide link between race and complete plumbing prevalence in Deitz and Meehan 2019. Sterling’s work uses a method similar to that study’s, applying a logistic regression to test the link between neighborhood grades and plumbing.

The original study also reports using areal interpolation to assign ACS variables to the HOLC-graded neighborhoods. This is a somewhat strange choice. Areal interpolation is often used to fill in missing or anomalous values in gridded data, and it makes little sense for assigning data to irregularly shaped polygons. Furthermore, areal interpolation assigns a value that is an average of the values of the areal units around it, which assumes that the phenomenon in the filled-in area continues the patterns present near it. This is almost the opposite of the premise of HOLC grades, which were assigned to neighborhoods based on the characteristics that set them apart.

There is a fairly simple explanation for why Sterling chose this approach. The study that Sterling cites as the published precedent for re-aggregating HOLC data (Fricker and Allen 2022) used area-weighted reaggregation (AWR) to reassign casualties from tornado paths to the neighborhoods they passed through, but incorrectly used the term ‘areal interpolation’ to describe the method. Sterling therefore likely also used AWR. This assumption is supported by an examination of the Python package used for the reaggregation, which considers AWR “the simplest form of areal interpolation”, a somewhat misleading designation given that the two terms refer to different techniques.

Materials and procedure

Computational environment

This reproduction uses the following R packages:

# record all the packages you are using here
# this includes any calls to library(), require(),
# and double colons such as here::i_am()
packages <- c("tidyverse", "here", "tidycensus", "margins", "sf", "tigris", "rstatix", "MASS", "knitr", "kableExtra", "ggcorrplot")

Data and variables

This study used only two data sources: American Community Survey (ACS) demographic data and maps of neighborhoods’ historical HOLC mortgage grades, published and maintained in the University of Richmond’s Mapping Inequality Database.

American Community Survey (ACS)

  • Title: American Community Survey (ACS)
  • Abstract: The ACS is a nationwide survey that collects and produces information on social, economic, housing, and demographic characteristics about our nation’s population every year.
  • Spatial Coverage: United States of America
  • Spatial Resolution: ACS data is aggregated by census block group.
  • Spatial Representation Type: vector, MULTIPOLYGON
  • Spatial Reference System: NAD83
  • Temporal Coverage: ACS data from 2016-2020.
  • Temporal Resolution: Data is collected annually.
  • Lineage: Downloaded and used as-is.
  • Distribution: Data is publicly available from the US Census: [https://www.census.gov/programs-surveys/acs/data.html](https://www.census.gov/programs-surveys/acs/data.html)
  • Constraints: Publicly available data.
  • Data Quality: N/A

See acs_values.csv at /data/raw/public/acs_values.csv

## # A tibble: 30 × 3
##    name       label                                concept                      
##    <chr>      <chr>                                <chr>                        
##  1 B01003_001 Estimate!!Total                      TOTAL POPULATION             
##  2 B02008_001 Estimate!!Total:                     WHITE ALONE OR IN COMBINATIO…
##  3 B02009_001 Estimate!!Total:                     BLACK OR AFRICAN AMERICAN AL…
##  4 B02010_001 Estimate!!Total:                     AMERICAN INDIAN AND ALASKA N…
##  5 B02011_001 Estimate!!Total:                     ASIAN ALONE OR IN COMBINATIO…
##  6 B03003_001 Estimate!!Total:                     HISPANIC OR LATINO ORIGIN    
##  7 B03003_003 Estimate!!Total:!!Hispanic or Latino HISPANIC OR LATINO ORIGIN    
##  8 B05012_001 Estimate!!Total:                     NATIVITY IN THE UNITED STATES
##  9 B05012_003 Estimate!!Total:!!Foreign-Born       NATIVITY IN THE UNITED STATES
## 10 B25003_001 Estimate!!Total:                     TENURE                       
## # ℹ 20 more rows

Unplanned Deviation: The study design calls for block group-level data on the foreign-born population. However, the ACS datasets have no information about place of birth at this level, so the foreign-born population percentage is instead gathered at the tract level.

HOLC Grades

  • Title: HOLC grades from the Mapping Inequality Database (MID)
  • Abstract: HOLC grades are historical neighborhood delineations created by the Home Owners’ Loan Corporation in the 1930s to assess mortgage lending risk. These maps were used to guide investment and disinvestment in urban areas and are a foundational dataset for understanding redlining and its long-term impacts.
  • Spatial Coverage: Select major U.S. cities
  • Spatial Resolution: Neighborhood-scale boundaries (city block to district level, variable by city)
  • Spatial Representation Type: vector, MULTIPOLYGON
  • Spatial Reference System: NAD83
  • Temporal Coverage: 1935–1940 (approximate dates of HOLC map creation)
  • Temporal Resolution: Single mapping effort.
  • Lineage: Downloaded and used as-is.
  • Distribution: Data is available on the Mapping Inequality website: [https://dsl.richmond.edu/panorama/redlining/data](https://dsl.richmond.edu/panorama/redlining/data)
  • Constraints: Publicly available data under a CC-by-NC 2.5 license
  • Data Quality: Accurate to the degree that the original digitization of the 1930s HOLC maps was accurate.

| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
|---|---|---|---|---|---|---|---|
| city | City Name | City in which the HOLC designated area resides | character | N/A | Names of US cities | none | none |
| state | State Abbreviation | USPS state designator | character | N/A | The 50 states | N/A | none |
| category | Desirability | Assessment of investment desirability based on HOLC designation | character | subjective/historical | a range of designations | none | none |
| grade | Grade | Historical HOLC grade | character | subjective/historical | A - D | none | none |
| label | Shape Label | HOLC grade and arbitrary number (A1, C3) to divide same-grade areas from the same city | character | N/A | A-D, number of same-grade zones in the same city | none | none |
| residential | Is it residential? | ” ” | binary (TRUE/FALSE) | N/A | TRUE/FALSE | none | none |
| commercial | Is it commercial? | ” ” | binary (TRUE/FALSE) | N/A | TRUE/FALSE | none | none |
| industrial | Is it industrial? | ” ” | binary (TRUE/FALSE) | N/A | TRUE/FALSE | none | none |

Unplanned Deviation: Not all HOLC polygons have valid geometry, so invalid geometries were fixed by splitting multipart polygons into single parts before performing the intersection.

## Warning in st_cast.sf(holc_polygons_bugged, "POLYGON"): repeating attributes
## for all sub-geometries for which they may not be constant

To reduce the amount of data carried forward, the tract and block group data are first intersected with these HOLC polygons.

## `summarise()` has grouped output by 'area_id', 'city', 'state', 'city_survey',
## 'category', 'grade', 'label', 'residential', 'commercial', 'industrial'. You
## can override using the `.groups` argument.
## Reading layer `acs_censusblock' from data source 
##   `/Users/lucas/Documents/GitHub.nosync/RPr-Sterling-2023/data/raw/public/acs_censusblock.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 49449 features and 72 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -122.8334 ymin: 25.69572 xmax: -69.56878 ymax: 48.28347
## Geodetic CRS:  NAD83
## Reading layer `acs_tract' from data source 
##   `/Users/lucas/Documents/GitHub.nosync/RPr-Sterling-2023/data/raw/public/acs_tract.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 18473 features and 17 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -122.8676 ymin: 25.69572 xmax: -69.49354 ymax: 48.28347
## Geodetic CRS:  NAD83

Prior observations

There are no prior observations related to this data.

Bias and threats to validity

Three main sources of bias in this study come from the HOLC data itself. First, the spatial extent of the HOLC data covers only the centers of major cities across the United States, limiting the analysis to center cities. Second, ‘areal interpolation’ was used to determine census data values for each HOLC neighborhood, a method that does not seem logical or advisable given the study context, although some evidence suggests that the method actually used was the more logical area-weighted reaggregation. Third, choosing individual HOLC polygons as the study’s resolution increases the threat to validity through the Modifiable Areal Unit Problem. Many of the polygons represent outdated neighborhood lines and are limited to areas that were populated at that time. This affects the data used to represent each neighborhood today, especially considering that area-weighted reaggregation is being used.

Data transformations

First, remap ACS values from block groups to HOLC polygons through what the original study calls areal interpolation, which is most likely area-weighted reaggregation (AWR).

## Reading layer `acs_censusblock_intersected' from data source 
##   `/Users/lucas/Documents/GitHub.nosync/RPr-Sterling-2023/data/raw/public/acs_censusblock_intersected.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 102515 features and 87 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -122.7675 ymin: 25.70537 xmax: -69.60044 ymax: 48.2473
## Geodetic CRS:  NAD83

After performing AWR on the HOLC polygons, we are left with noticeably fewer geometries than we started with (10154 originally vs. 9614 after reaggregation). This becomes relevant later in the analysis.
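The reaggregation step above can be sketched in base R. This is only an illustration under stated assumptions: the `pieces` table, its column names, and all of its values are hypothetical stand-ins for the output of a spatial intersection, and the sketch is not the code used in this reproduction or in the original study.

```r
# Hypothetical intersection pieces: each row is the overlap between one
# block group and one HOLC polygon (all values are made up).
pieces <- data.frame(
  holc_id       = c("A1", "A1", "D3", "D3"),
  bg_id         = c("bg1", "bg2", "bg2", "bg3"),
  bg_area       = c(100, 200, 200, 50),   # full block-group area
  overlap_area  = c(50, 50, 150, 50),     # area inside the HOLC polygon
  housing_units = c(400, 600, 600, 100),  # block-group count estimate
  ic_plumb      = c(4, 12, 12, 5)         # units with incomplete plumbing
)

# Weight = share of the block group that falls inside the HOLC polygon.
pieces$weight <- pieces$overlap_area / pieces$bg_area

# Counts are extensive variables, so scale them by the weight and sum.
pieces$housing_units_w <- pieces$housing_units * pieces$weight
pieces$ic_plumb_w <- pieces$ic_plumb * pieces$weight
holc <- aggregate(cbind(housing_units_w, ic_plumb_w) ~ holc_id,
                  data = pieces, FUN = sum)

# Percentages are derived after aggregation rather than averaged directly.
holc$pct_incomplete_plumb <- 100 * holc$ic_plumb_w / holc$housing_units_w
```

In a table built this way, a HOLC polygon with no overlapping pieces never appears in the aggregated output, which is one possible way for polygons to drop out during reaggregation.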

Using census statistics from the ACS dataset, resulting calculated assigned variables are: Black Population %, White Population %, Indigenous American %, Asian Population %, Hispanic or Latino Population %, % Below Poverty Line, % Housing Units Owned, % Housing Units Rented, % Mobile Homes, % Homes before 1980, % Homes after 1980, % With Complete Plumbing, % With Incomplete Plumbing, and % Foreign Born. This data is separated by region as well as collected nationally. Using regional and national averages for plumbing rates, a binary variable is calculated on whether plumbing is below or above the average.

# Assign a binary plumbing-access variable: 1 if incomplete plumbing is higher
# than the national average (2.61668), 0 if it is lower.
# Note: the threshold is applied to ic_plumbE as aggregated; the percentage
# pct_incomplete_plumb is only derived in the next step.
HOLC_polygons <- HOLC_polygons |>
  mutate(
    water_access = case_when(
      ic_plumbE > 2.61668 ~ 1,
      TRUE ~ 0
    )
  )


data_clean <- HOLC_polygons |>
  mutate(
    holc_id = area_id,
    holc_grade = grade,
    black_pct = (black_popE / total_pop_raceE) * 100,
    white_pct = (white_popE / total_pop_raceE) * 100,
    indig_pct = (ind_popE / total_pop_raceE) * 100,
    asian_pct = (asian_popE / total_pop_raceE) * 100,
    hispan_Latino_pct = (hisp_popE / total_pop_hispE) * 100,
    pct_below_poverty = (bp_popE / total_pop_plE) * 100,
    pct_home_ownership = (total_ownE / total_hu_tenE) * 100,
    pct_renters = (total_rentE / total_hu_tenE) * 100,
    pct_mobile_homes = (total_mhE / total_hu_mhE) * 100,
    pct_pre_1980 = ((age_1939E + age_1940E + age_1950E + age_1960E + age_1970E) / total_hu_ageE) * 100,
    pct_post_1980 = ((age_1980E + age_1990E + age_2000E + age_2010E + age_2014E) / total_hu_ageE) * 100,
    pct_complete_plumb = (c_plumbE / total_hu_plumbE) * 100,
    pct_incomplete_plumb = (ic_plumbE / total_hu_plumbE) * 100,
    water_access = water_access
  )

Data will now be verified using the supplementary tables by subtracting the national summary statistics table from the calculated table, which should result in zeros across the board.

This is somewhat of a test of whether the authors in fact used area-weighted reaggregation rather than areal interpolation; if they used AWR, our results should more or less match, aside from the differing numbers of HOLC polygons remaining in the analysis. Additionally, because the original reaggregation was done in a Python package, this will test the similarities and differences of our implementation.

The total number of observations in the summary statistics tables is 8878, 736 fewer than in our AWR results. While we lost some polygons during reaggregation, the original authors, using a Python package, lost many more than we did. This is likely the cause of much of the potential difference between the datasets, so it will be interesting to see how much the results differ.

| black_pct_diff | white_pct_diff | indig_pct_diff | asian_pct_diff | hispan_Latino_pct_diff | pct_below_poverty_diff | pct_home_ownership_diff | pct_renters_diff | pct_mobile_homes_diff | pct_pre_1980_diff | pct_post_1980_diff | pct_complete_plumb_diff | pct_incomplete_plumb_diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8.860255 | 10.38488 | 1.137381 | 3.257141 | 5.387422 | 6.488319 | 13.09047 | 13.18406 | 0.6540667 | 7.271628 | 7.204174 | 2.144397 | 1.780241 |
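The comparison against the supplementary tables can be sketched as follows. The vectors and their values here are hypothetical placeholders, not figures from either study, and the one-point tolerance is an arbitrary assumption chosen only for illustration.

```r
# Hypothetical national means: reproduction vs. the original supplementary table.
reproduced <- c(black_pct = 30.1, white_pct = 55.4, pct_incomplete_plumb = 2.9)
original   <- c(black_pct = 28.0, white_pct = 57.0, pct_incomplete_plumb = 2.6)

# Match by name so a reordered column cannot silently misalign the comparison.
diffs <- reproduced[names(original)] - original
abs_diffs <- abs(diffs)

# Flag variables whose gap exceeds a tolerance (1 point is an arbitrary choice).
tolerance <- 1
flagged <- names(abs_diffs)[abs_diffs > tolerance]
```

A perfect reproduction would leave `flagged` empty; any variable listed there points to a step worth re-checking.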

Then, use the Dunn test to determine whether the incomplete plumbing % in Grade A is significantly different from that in the other grades, and compare the results with supplementary table 1.

| Z | P.unadj | P.adj |
|---|---|---|
| 8.341575 | -0.0003491 | -0.0020289 |
| 25.456331 | 0.0000000 | 0.0000000 |
| 22.150445 | 0.0000000 | 0.0000000 |
| 38.116993 | 0.0000000 | 0.0000000 |
| 37.871252 | 0.0000000 | 0.0000000 |
| 19.708821 | 0.0000000 | 0.0000000 |

The Dunn test results differ noticeably from the original: the Z statistic is consistently negative in the original Dunn results and consistently positive in ours, with almost the same magnitude. All of the pairwise differences are still statistically significant, which matches the findings of the original authors.
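A sign flip of this kind is what one would see if the two analyses subtracted the groups in opposite order, although this report has not confirmed that this is the cause. The sketch below hand-computes Dunn's z statistic in base R for one pair of groups, without a tie correction. The `dunn_z` helper and the data are hypothetical and are not taken from either study.

```r
# Dunn z statistic for one pair of groups (no tie correction, for brevity).
dunn_z <- function(values, groups, first, second) {
  ranks <- rank(values)                     # ranks over the pooled sample
  n_total <- length(values)
  mean_rank <- tapply(ranks, groups, mean)  # mean rank per group
  n_group <- table(groups)
  se <- sqrt((n_total * (n_total + 1) / 12) *
               (1 / n_group[[first]] + 1 / n_group[[second]]))
  (mean_rank[[first]] - mean_rank[[second]]) / se
}

# Made-up incomplete-plumbing percentages for two HOLC grades.
values <- c(0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5)
groups <- rep(c("A", "D"), each = 4)

z_a_minus_d <- dunn_z(values, groups, "A", "D")  # negative: A ranks lower
z_d_minus_a <- dunn_z(values, groups, "D", "A")  # same magnitude, positive
```

Because the statistic is a difference of mean ranks, reversing the comparison order changes only its sign, so the two-sided p-values and the significance conclusions are unaffected.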

Analysis

Sterling III’s original study used a binary logistic regression model to relate categorical HOLC grades to a census block group’s binary designation of whether plumbing rates were above or below average.

Using the stepAIC function from the MASS package, with HOLC Grade A as the reference group, we can reduce multicollinearity for the final regression analysis. Here, variables from highly correlated pairs are removed to match the original methodology; a visual of the correlation table is also available.

| | black_pct | white_pct | indig_pct | asian_pct | hispan_Latino_pct | pct_below_poverty | pct_renters | pct_mobile_homes | pct_post_1980 | pct_incomplete_plumb | holc_gradeA | holc_gradeB | holc_gradeC | holc_gradeD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| black_pct | 1.0000000 | -0.8698584 | -0.0426595 | -0.2700047 | -0.1852961 | 0.4280641 | 0.2735503 | -0.0134052 | -0.0123755 | 0.4514905 | -0.1314971 | -0.1147203 | 0.0490533 | 0.1554326 |
| white_pct | -0.8698584 | 1.0000000 | 0.0007625 | -0.0871363 | -0.0961478 | -0.3741649 | -0.3558496 | 0.0284169 | -0.0263567 | -0.3674112 | 0.1672536 | 0.1275965 | -0.0853035 | -0.1754799 |
| indig_pct | -0.0426595 | 0.0007625 | 1.0000000 | -0.0379227 | 0.0958943 | 0.1150724 | 0.1062556 | 0.0744332 | 0.0173740 | 0.0312798 | -0.0452279 | -0.0627824 | -0.0112927 | 0.0283662 |
| asian_pct | -0.2700047 | -0.0871363 | -0.0379227 | 1.0000000 | 0.0312383 | -0.1902106 | 0.0381656 | -0.0718153 | 0.1166294 | -0.1904061 | 0.0108093 | 0.0431247 | 0.0233518 | -0.0307498 |
| hispan_Latino_pct | -0.1852961 | -0.0961478 | 0.0958943 | 0.0312383 | 1.0000000 | 0.0488876 | 0.2275354 | 0.0287248 | -0.0055973 | -0.0861892 | -0.1181908 | -0.0641764 | 0.0760293 | 0.1038673 |
| pct_below_poverty | 0.4280641 | -0.3741649 | 0.1150724 | -0.1902106 | 0.0488876 | 1.0000000 | 0.5889550 | 0.0598075 | 0.0820857 | 0.3691761 | -0.1990502 | -0.1826611 | 0.0271861 | 0.2220992 |
| pct_renters | 0.2735503 | -0.3558496 | 0.1062556 | 0.0381656 | 0.2275354 | 0.5889550 | 1.0000000 | -0.0250379 | 0.2541901 | 0.1306942 | -0.2698780 | -0.1901436 | 0.0738958 | 0.2218243 |
| pct_mobile_homes | -0.0134052 | 0.0284169 | 0.0744332 | -0.0718153 | 0.0287248 | 0.0598075 | -0.0250379 | 1.0000000 | 0.1079025 | 0.0659484 | -0.0509174 | -0.0633809 | -0.0000382 | 0.0587096 |
| pct_post_1980 | -0.0123755 | -0.0263567 | 0.0173740 | 0.1166294 | -0.0055973 | 0.0820857 | 0.2541901 | 0.1079025 | 1.0000000 | -0.1139085 | -0.0851896 | -0.1320052 | -0.0555425 | 0.1750454 |
| pct_incomplete_plumb | 0.4514905 | -0.3674112 | 0.0312798 | -0.1904061 | -0.0861892 | 0.3691761 | 0.1306942 | 0.0659484 | -0.1139085 | 1.0000000 | -0.0979310 | -0.1041727 | 0.0330310 | 0.1355664 |
| holc_gradeA | -0.1314971 | 0.1672536 | -0.0452279 | 0.0108093 | -0.1181908 | -0.1990502 | -0.2698780 | -0.0509174 | -0.0851896 | -0.0979310 | 1.0000000 | -0.1971853 | -0.2590029 | -0.1783255 |
| holc_gradeB | -0.1147203 | 0.1275965 | -0.0627824 | 0.0431247 | -0.0641764 | -0.1826611 | -0.1901436 | -0.0633809 | -0.1320052 | -0.1041727 | -0.1971853 | 1.0000000 | -0.4296020 | -0.2957843 |
| holc_gradeC | 0.0490533 | -0.0853035 | -0.0112927 | 0.0233518 | 0.0760293 | 0.0271861 | 0.0738958 | -0.0000382 | -0.0555425 | 0.0330310 | -0.2590029 | -0.4296020 | 1.0000000 | -0.3885127 |
| holc_gradeD | 0.1554326 | -0.1754799 | 0.0283662 | -0.0307498 | 0.1038673 | 0.2220992 | 0.2218243 | 0.0587096 | 0.1750454 | 0.1355664 | -0.1783255 | -0.2957843 | -0.3885127 | 1.0000000 |

## Start:  AIC=8733.5
## pct_incomplete_plumb ~ holc_grade + (holc_grade + black_pct + 
##     white_pct + indig_pct + asian_pct + hispan_Latino_pct + pct_below_poverty + 
##     pct_renters + pct_mobile_homes + pct_post_1980)
## 
##                     Df Deviance    AIC
## - pct_renters        1   8695.5 8731.5
## - indig_pct          1   8696.1 8732.1
## <none>                   8695.5 8733.5
## - white_pct          1   8704.9 8740.9
## - hispan_Latino_pct  1   8732.0 8768.0
## - pct_mobile_homes   1   8732.7 8768.7
## - holc_grade         9   8766.5 8786.5
## - black_pct          1   8781.6 8817.6
## - pct_post_1980      1   8781.9 8817.9
## - asian_pct          1   8857.7 8893.7
## - pct_below_poverty  1   9098.7 9134.7
## 
## Step:  AIC=8731.52
## pct_incomplete_plumb ~ holc_grade + black_pct + white_pct + indig_pct + 
##     asian_pct + hispan_Latino_pct + pct_below_poverty + pct_mobile_homes + 
##     pct_post_1980
## 
##                     Df Deviance    AIC
## - indig_pct          1   8696.1 8730.1
## <none>                   8695.5 8731.5
## + pct_renters        1   8695.5 8733.5
## - white_pct          1   8704.9 8738.9
## - hispan_Latino_pct  1   8732.7 8766.7
## - pct_mobile_homes   1   8732.9 8766.9
## - holc_grade         9   8767.3 8785.3
## - black_pct          1   8781.8 8815.8
## - pct_post_1980      1   8783.7 8817.7
## - asian_pct          1   8860.2 8894.2
## - pct_below_poverty  1   9149.9 9183.9
## 
## Step:  AIC=8730.08
## pct_incomplete_plumb ~ holc_grade + black_pct + white_pct + asian_pct + 
##     hispan_Latino_pct + pct_below_poverty + pct_mobile_homes + 
##     pct_post_1980
## 
##                     Df Deviance    AIC
## <none>                   8696.1 8730.1
## + indig_pct          1   8695.5 8731.5
## + pct_renters        1   8696.1 8732.1
## - white_pct          1   8705.4 8737.4
## - hispan_Latino_pct  1   8732.7 8764.7
## - pct_mobile_homes   1   8734.1 8766.1
## - holc_grade         9   8768.4 8784.4
## - black_pct          1   8782.2 8814.2
## - pct_post_1980      1   8784.6 8816.6
## - asian_pct          1   8860.2 8892.2
## - pct_below_poverty  1   9155.8 9187.8
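
For readers unfamiliar with the selection step logged above, the following is a minimal sketch of a binary logistic regression with grade A as the reference level, followed by backward selection with `MASS::stepAIC`. The data are simulated, `noise_pct` is a hypothetical predictor, and the model formula is an assumption for illustration rather than the specification used in this report.

```r
library(MASS)  # provides stepAIC()

set.seed(42)
n <- 400

# Simulated stand-in data; none of these values come from the study.
sim <- data.frame(
  holc_grade = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE)),
  pct_below_poverty = runif(n, 0, 60),
  noise_pct = runif(n, 0, 100)  # unrelated predictor that selection may drop
)

# Make grade A the reference level so B, C, and D are estimated relative to it.
sim$holc_grade <- relevel(sim$holc_grade, ref = "A")

# Binary outcome: 1 = incomplete plumbing above average (simulated).
linear_pred <- -2 + 0.04 * sim$pct_below_poverty + 0.5 * (sim$holc_grade == "D")
sim$water_access <- rbinom(n, 1, plogis(linear_pred))

full_model <- glm(water_access ~ holc_grade + pct_below_poverty + noise_pct,
                  data = sim, family = binomial)

# Backward selection by AIC; trace = FALSE suppresses the step-by-step log.
selected <- stepAIC(full_model, direction = "backward", trace = FALSE)
```

Each step drops the term whose removal lowers AIC the most and stops when no removal helps, which mirrors the `Step: AIC=` blocks in the output above.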

Sterling calculates Average Marginal Effects (AME) using the margins package to estimate how each census-derived variable changes the predicted probability of the binary outcome. Here, the results of the reproduction’s model are compared with those of the original model.

Unplanned Deviation: The average marginal effects results include some factor levels that were not meant to be used, with HOLC grades of A, E, and F and other levels that seem to end with a space. Additionally, reducing multicollinearity removed the indigenous population variable rather than the white population variable.
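Stray factor levels like these usually come from untrimmed strings. The Df of 9 reported for `holc_grade` in the stepAIC output is consistent with ten distinct levels rather than four. A minimal sketch of one possible cleanup follows; the `grade_raw` vector is hypothetical, and restricting the analysis to grades A through D is an assumption rather than a step documented in this report.

```r
# Hypothetical grade column with trailing spaces and grades outside A-D.
grade_raw <- c("A", "B ", "C", "D ", "E", "F", "C ")

grade_trimmed <- trimws(grade_raw)  # "C " and "C" collapse into one level
keep <- grade_trimmed %in% c("A", "B", "C", "D")

# Explicit levels keep A first, so it stays the regression reference group.
holc_grade <- factor(grade_trimmed[keep], levels = c("A", "B", "C", "D"))
```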

| factor | AME | original_AME | difference |
|---|---|---|---|
| asian_pct | -0.1392510 | NA | NA |
| black_pct | 0.1191453 | NA | NA |
| hispan_Latino_pct | -0.0563915 | NA | NA |
| holc_gradeB | 0.0022993 | NA | NA |
| holc_gradeC | 0.0507086 | NA | NA |
| holc_gradeD | 0.1022529 | NA | NA |
| pct_below_poverty | 0.1887591 | NA | NA |
| pct_mobile_homes | 0.0629468 | NA | NA |
| pct_post_1980 | -0.0854183 | NA | NA |

Here the difference is very noticeable among race/ethnicity-related data and poverty line data, but among all other factors, the difference remains very small. Notably, the AME values are very similar across HOLC values.
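To make the AME values easier to interpret, the sketch below computes average marginal effects by hand for a logistic model in base R. It is a conceptual illustration on simulated data, not the margins package's implementation, and the `predict_as` helper is hypothetical.

```r
set.seed(1)
n <- 500

# Simulated data; variable names mirror the report but values are invented.
sim <- data.frame(
  pct_below_poverty = runif(n, 0, 60),
  holc_grade = factor(sample(c("A", "D"), n, replace = TRUE),
                      levels = c("A", "D"))
)
sim$water_access <- rbinom(
  n, 1, plogis(-2 + 0.03 * sim$pct_below_poverty + 0.6 * (sim$holc_grade == "D"))
)

fit <- glm(water_access ~ pct_below_poverty + holc_grade,
           data = sim, family = binomial)
p_hat <- fitted(fit)

# Continuous predictor: average of the derivative beta * p * (1 - p).
ame_poverty <- mean(coef(fit)[["pct_below_poverty"]] * p_hat * (1 - p_hat))

# Factor level: average change in predicted probability when every
# observation is switched from the reference grade (A) to grade D.
predict_as <- function(g) {
  newdata <- sim
  newdata$holc_grade <- factor(rep(g, nrow(sim)), levels = levels(sim$holc_grade))
  predict(fit, newdata, type = "response")
}
ame_grade_d <- mean(predict_as("D") - predict_as("A"))
```

Read this way, an AME such as 0.10 for `holc_gradeD` would mean that, averaged over all observations, a grade D designation raises the predicted probability of the outcome by about ten percentage points relative to grade A.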

Results

Data is presented as 3 figures, 2 tables, and 7 supplementary tables. We’ll reproduce each figure and table and note differences between our reproduction and the figures from the original manuscript.

Figures:

  • Map of HOLC grades for different cities across regions. Sterling presents maps of St. Louis, Buffalo, Oakland, and Birmingham. We chose to recreate only one of the four, St. Louis, trusting that if it matches, the others likely do too.

This map differs somewhat from Sterling’s map, revealing one of the potential reasons behind the differing values in our AME and other comparisons. A different set of HOLC polygons is mapped in this version than in Sterling’s results: despite ending up with more polygons in our analysis in total, we seem to be missing several of the HOLC zones in this map.

We will map another city to test and see if this is a widespread phenomenon.

The map of Buffalo is much more intact, with only one or two sections across the city missing. The source of this loss is unclear: it could have to do with our multipart-to-single-part transformation, the removal of broken geometries (of which we had more in R than in ArcGIS), or something in our AWR step. Given that this affected multiple cities, it is likely a widespread issue.

  • Correlation plot for potential predictive variables

This finds the same results as the original study, with the only two extremely correlated variables being the white and black population percentages.

  • Map of percent of houses with incomplete plumbing across regions

Aside from the missing HOLC polygons, the incomplete plumbing percentages appear similar to those in the figures of the original manuscript.

Tables:

  • Average Marginal Effects of Variables as Determined by Binary Regression Model

| factor | AME |
|---|---|
| asian_pct | -0.1392510 |
| black_pct | 0.1191453 |
| hispan_Latino_pct | -0.0563915 |
| holc_gradeB | 0.0022993 |
| holc_gradeC | 0.0507086 |
| holc_gradeD | 0.1022529 |
| pct_below_poverty | 0.1887591 |
| pct_mobile_homes | 0.0629468 |
| pct_post_1980 | -0.0854183 |

  • The Number of Each Grade Polygon in Each Data Set

| HOLC Grade | Number of Polygons | Original | Difference |
|---|---|---|---|
| A | 1020 | 1040 | -20 |
| B | 2367 | 2332 | 35 |
| C | 3465 | 3381 | 84 |
| D | 2037 | 2118 | -81 |

The number of polygons per grade differs between the two datasets, with absolute differences ranging from 20 to 84 polygons.

Discussion

Sterling III hypothesizes that areas of present-day incomplete plumbing within U.S. cities (i.e., communities with a proportion of homes lacking complete plumbing above the national average) are significantly associated with HOLC neighborhood designations. The study finds evidence for this hypothesis, with lower HOLC grades correlated with higher levels of incomplete household plumbing. That being said, despite following the original study’s listed methods, we found numerically different results. The most likely cause is the differing initial HOLC geometries between our study and the original, as seen in the first figure of our results section. We went to great lengths to repair the HOLC geometries in R, but ended up with an unexpected subset of the polygons remaining. Geometry issues compound, as census data aggregated into incorrect polygons will result in incorrect summary statistics. This seems to be an R/sf issue; when loading the geometries into either ArcGIS or QGIS, only 2 geometries need repairing. While we are not sure exactly what caused this issue, it is worth keeping in mind for future reproductions.

Integrity Statement

The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research.

Acknowledgements

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)

References

Sterling III, Charles W., et al. “Connections between present-day water access and historical redlining.” Environmental Justice (2023). DOI:[10.1089/env.2022.0115](https://doi.org/10.1089/env.2022.0115)

Shiloh Deitz and Katie Meehan. “Plumbing Poverty: Mapping Hot Spots of Racial and Geographic Inequality in U.S. Household Water Insecurity.” Annals of the American Association of Geographers 109 (2019): 1092–1109.

Tyler Fricker and Douglas L. Allen. “A Place-Based Analysis of Tornado Activity and Casualties in Shreveport, Louisiana.” Natural Hazards 113 (2022): 1853–1874.

Kassambara, Alboukadel. 2023a. Ggcorrplot: Visualization of a Correlation Matrix Using Ggplot2. http://www.sthda.com/english/wiki/ggcorrplot-visualization-of-a-correlation-matrix-using-ggplot2.
———. 2023b. Rstatix: Pipe-Friendly Framework for Basic Statistical Tests. https://rpkgs.datanovia.com/rstatix/.
Leeper, Thomas J. 2024. Margins: Marginal Effects for Model Objects. https://github.com/bbolker/margins.
Müller, Kirill. 2020. Here: A Simpler Way to Find Your Files. https://here.r-lib.org/.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.
———. 2025. Sf: Simple Features for R. https://r-spatial.github.io/sf/.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With applications in R. Chapman and Hall/CRC. https://doi.org/10.1201/9780429459016.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Ripley, Brian, and Bill Venables. 2025. MASS: Support Functions and Datasets for Venables and Ripley’s MASS. http://www.stats.ox.ac.uk/pub/MASS4/.
Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth. New York: Springer. https://www.stats.ox.ac.uk/pub/MASS4/.
Walker, Kyle. 2024. Tigris: Load Census TIGER/Line Shapefiles. https://github.com/walkerke/tigris.
Walker, Kyle, and Matt Herman. 2025. Tidycensus: Load US Census Boundary and Attribute Data as Tidyverse and Sf-Ready Data Frames. https://walker-data.com/tidycensus/.
Wickham, Hadley. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://tidyverse.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.org/knitr/.
———. 2025. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Zhu, Hao. 2024. kableExtra: Construct Complex Table with Kable and Pipe Syntax. http://haozhu233.github.io/kableExtra/.