Reproducing a Study
I have recently finished my final in my Open GIScience class involving trying to reproduce the work by Charles W. Sterling III studying Connections Between Present-Day Water Access and Historical Redlining 10.1089/env.2022.0115. The study used redlining polygons from the Mapping Inequality Database to determine if there was correlation between historic redlining and modern-day access to water, gathered through census plumbing data. The study found positive relationships between lower Home Owners’ Loan Corporation (HOLC) scores and higher rates of incomplete plumbing. While this is certainly the case across the United States, the study has many problems due to the modifiable areal unit problem (MAUP). These HOLC polygons only covered 1920 cities, which largely correspond to only downtown areas today, additionally, the demographics of each of these polygons have changed over time. Using a HOLC zone as the spatial resolution meant that tract data can be matched incorrectly to the actual demographics of each neighborhood.
Our reproduction had many issues along the way, including working with the large datasets necessary to complete the project, as well as the original study making some specific procedures hard to follow. Instead of switching between multiple programs (R, Python, ArcGIS), our reproduction decided to make an analysis plan all in R, which led to our major problems. The foremost of these problems had to do with the geometry of the HOLC polygons, which contained many errors in the sf package. The result of these errors was HOLC zones assigned with drastically different demographic statistics in some areas. Ultimately, our reproduction yielded the same conclusions relating lower HOLC scores to higher rates of incomplete plumbing, but other statistics remained different.
If you want to see an html summary of this project, click this link.