seeing-through-the-map-data
Data processing, crosswalks, and supporting infrastructure for Seeing Through the Map: A Static Test of Classification, Measurement, and Proxy Logic.
Live app: MappingFood
Contents
ACS data pipeline
clean_acs_data.py / cleanacsdata(1).py — American Community Survey (ACS) cleaning scripts
cleaned_acs_data.csv — processed ACS output
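The cleaning scripts are not reproduced in this README. As a minimal sketch of steps that ACS cleaning commonly involves, assuming county-level tables downloaded from data.census.gov, the snippet below extracts the 5-digit county FIPS from the `GEO_ID` column, restores leading zeros lost when FIPS codes are read as numbers, and maps non-numeric annotations and the Census API's negative sentinel codes to missing values. The column name `B01003_001E`, all helper names, the output schema, and the exact sentinel list are illustrative assumptions, not taken from `clean_acs_data.py`.

```python
from typing import Dict, List, Optional

# Negative codes the Census API can return in place of an estimate.
# Assumed subset for illustration; verify against the ACS API documentation.
ACS_SENTINELS = {-999999999, -888888888, -666666666, -555555555, -333333333, -222222222}


def county_fips_from_geo_id(geo_id: str) -> str:
    """Extract the 5-digit county FIPS from a GEO_ID such as '0500000US01001'."""
    return geo_id.split("US", 1)[1][:5]


def pad_county_fips(value) -> str:
    """Restore leading zeros lost when a FIPS code was read as a number (1001 -> '01001')."""
    return str(int(value)).zfill(5)


def parse_estimate(raw: str) -> Optional[float]:
    """Convert an ACS estimate to float; blanks, annotations, and sentinels become None."""
    try:
        value = float(raw)
    except ValueError:  # e.g. '', '(X)', '*****'
        return None
    return None if value in ACS_SENTINELS else value


def clean_row(row: Dict[str, str], estimate_cols: List[str]) -> Dict[str, object]:
    """Return one cleaned record keyed by county FIPS (hypothetical output schema)."""
    cleaned: Dict[str, object] = {"county_fips": county_fips_from_geo_id(row["GEO_ID"])}
    for col in estimate_cols:
        cleaned[col] = parse_estimate(row[col])
    return cleaned


print(clean_row({"GEO_ID": "0500000US01001", "B01003_001E": "-666666666"}, ["B01003_001E"]))
# {'county_fips': '01001', 'B01003_001E': None}
```

Keeping FIPS codes as zero-padded strings throughout the pipeline avoids silent join failures against the crosswalk and RUCC files, where a county such as 01001 would otherwise appear as 1001.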
Geographic crosswalks
HUDcrosswalkZIp_COUNTY.csv — HUD ZIP-to-county crosswalk
excluded_sample_below_90_totratio.csv — ZIP-county allocations excluded because their total ratio fell below 90%
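The name excluded_sample_below_90_totratio.csv suggests that ZIP-county pairs were split on HUD's `TOT_RATIO` column (the share of all addresses in a ZIP that fall in a given county) at 90%. A minimal sketch of that split, assuming the crosswalk keeps HUD's `ZIP`, `COUNTY`, and `TOT_RATIO` column names and that pairs at or above the threshold were kept; the function names and the inclusive boundary are assumptions.

```python
import csv
from typing import Dict, List, Tuple

Row = Dict[str, str]
THRESHOLD = 0.90  # assumed from the file name "below_90_totratio"


def load_crosswalk(path: str) -> List[Row]:
    """Read the HUD ZIP-county crosswalk CSV into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def split_by_tot_ratio(rows: List[Row], threshold: float = THRESHOLD) -> Tuple[List[Row], List[Row]]:
    """Separate ZIP-county pairs into kept (TOT_RATIO >= threshold) and excluded."""
    kept: List[Row] = []
    excluded: List[Row] = []
    for row in rows:
        if float(row["TOT_RATIO"]) >= threshold:
            kept.append(row)
        else:
            excluded.append(row)
    return kept, excluded
```

Because a ZIP's `TOT_RATIO` values across counties sum to 1 (up to rounding), any threshold above 0.5 leaves each retained ZIP assigned to at most one county, which makes later ZIP-to-county joins unambiguous.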
NaNDA (National Neighborhood Data Archive)
NaNDA_UnFiltered.csv — raw NaNDA data
nanda_with_fips_filtered.csv — filtered, with FIPS codes
nanda_county_aggregated.csv — county-level aggregation
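This README does not document how nanda_with_fips_filtered.csv was rolled up into nanda_county_aggregated.csv. One plausible route, sketched here under stated assumptions, attaches a county FIPS to each ZIP-level NaNDA row through the filtered HUD crosswalk and sums a count measure by county. The column names `zcta` and `store_count` and both function names are hypothetical, and the sketch treats ZCTAs and USPS ZIP codes as interchangeable, which is only an approximation.

```python
from collections import defaultdict
from typing import Dict, Iterable

Row = Dict[str, str]


def zip_to_county(crosswalk_rows: Iterable[Row]) -> Dict[str, str]:
    """Map ZIP -> county FIPS from crosswalk rows already filtered to one county per ZIP."""
    return {row["ZIP"]: row["COUNTY"] for row in crosswalk_rows}


def aggregate_to_county(
    nanda_rows: Iterable[Row], zip_county: Dict[str, str], measure: str = "store_count"
) -> Dict[str, float]:
    """Sum a count measure (hypothetical column name) over all ZIPs in each county."""
    totals: Dict[str, float] = defaultdict(float)
    for row in nanda_rows:
        county = zip_county.get(row["zcta"])
        if county is None:  # ZIP was excluded by the crosswalk filter or is unmatched
            continue
        totals[county] += float(row[measure])
    return dict(totals)
```

Summing suits count measures; rate or density measures would instead need a population- or area-weighted mean, since adding percentages across ZIPs is not meaningful.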
MUA / RUCC
MUA_DET.csv — HRSA Medically Underserved Area (MUA) designations
Ruralurbancontinuumcodes2023.xlsx / RUCC_Tertile.csv — USDA Rural-Urban Continuum Codes (2023), tertile-stratified
CopyofRUCC_StrataReady.csv — RUCC prepared for stratified sampling
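RUCC 2023 assigns each county a code from 1 (counties in metro areas of 1 million or more) to 9 (the most rural nonmetro counties). The README does not say how the tertiles in RUCC_Tertile.csv were cut, so the sketch below rests on one assumption: compute empirical tertile cut points over the county codes with `statistics.quantiles` and label each county 1, 2, or 3. Because the codes are discrete, ties can leave the three strata unequal in size. The function name and the input shape (a dict of county FIPS to RUCC code) are hypothetical.

```python
import statistics
from typing import Dict


def rucc_tertiles(rucc_by_county: Dict[str, int]) -> Dict[str, int]:
    """Assign each county (keyed by FIPS) to tertile 1, 2, or 3 of its RUCC code."""
    # Two cut points split the code distribution into thirds (needs >= 2 counties).
    low_cut, high_cut = statistics.quantiles(rucc_by_county.values(), n=3)
    strata: Dict[str, int] = {}
    for fips, code in rucc_by_county.items():
        if code <= low_cut:
            strata[fips] = 1
        elif code <= high_cut:
            strata[fips] = 2
        else:
            strata[fips] = 3
    return strata
```

With nine counties coded 1 through 9, the default exclusive method yields cut points of 10/3 and 20/3, so codes 1 to 3, 4 to 6, and 7 to 9 fall into strata 1, 2, and 3. On the real national distribution, which is not uniform across codes, the cut points would shift.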
County sampling
random_county_sample.csv / .xls / _edit.csv / _copy.csv — iterations of the county sampling frame.
The project originally intended a multi-county random sample. Given the scope of the scoring framework and the build-out it required under academic timelines, the final analysis was narrowed to a single randomly selected county (Decatur). These files preserve the sampling process that preceded that decision.
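The RUCC strata can feed a seeded, stratified draw of counties. A minimal sketch, assuming a mapping from county FIPS to stratum and a fixed number of counties per stratum; the function name, `per_stratum`, and the seed are hypothetical and not taken from the sampling files. Sorting counties within each stratum before sampling keeps the draw reproducible for a given seed regardless of input row order.

```python
import random
from collections import defaultdict
from typing import Dict, List


def stratified_county_sample(strata: Dict[str, int], per_stratum: int, seed: int) -> List[str]:
    """Draw up to `per_stratum` counties at random from each RUCC stratum."""
    rng = random.Random(seed)  # local generator so the draw is reproducible
    by_stratum: Dict[int, List[str]] = defaultdict(list)
    for fips, stratum in strata.items():
        by_stratum[stratum].append(fips)
    sample: List[str] = []
    for stratum in sorted(by_stratum):
        counties = sorted(by_stratum[stratum])  # fixed order before sampling
        sample.extend(rng.sample(counties, min(per_stratum, len(counties))))
    return sample
```

Narrowing to a single county, as the project eventually did, could then be one further seeded `random.Random(seed).choice(...)` over the drawn sample or the full frame; recording the seed alongside the sampling files would let the selection be re-run exactly.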
Other
data_1.docx — working notes
exportToHTML — HTML export artifacts
requirements.txt — Python dependencies
Notes
This repository reflects the data pipeline as it developed, iteratively and in working order, rather than as a cleaned final artifact. The sampling frame files show how the county selection process evolved. The RUCC tertile stratification was used to ensure rural-urban representativeness in county selection.
Data sources: U.S. Census Bureau ACS, HUD USPS ZIP-County Crosswalk, NANDA, HRSA MUA designations, USDA RUCC 2023.