Mapping Housing Instability Across the United States

The Housing Precarity Risk Model (HPRM) 2.0

Introduction

The Housing Precarity Risk Model (HPRM) 2.0 offers a comprehensive view of housing instability across the United States at the census tract level.

Developed by the Urban Displacement Project, this tool combines two critical dimensions of housing risk:

  1. Estimated Displacement Risk (EDR): Measures the risk of displacement pressures related to the out-migration of lower-income households, often referred to as “soft displacement.”
  2. Estimated Eviction Risk (EER): Measures the risk associated with formal eviction filings, often termed “hard displacement.”

By integrating these two distinct models, HPRM 2.0 provides a nuanced understanding of neighborhood-level housing precarity. The goal of this tool is to equip policymakers, planners, community organizations, and residents with data to identify areas facing compounded housing challenges and to inform targeted interventions and policies.

This version of the model presents data based on the 2019 5-year American Community Survey (ACS) and includes an updated analysis using the 2022 5-year ACS, allowing for comparisons over time. More details about the earlier version of the HPRM and the underlying United States Estimated Displacement Risk (USEDR) model are available on the Urban Displacement Project website.

Understanding the Components

HPRM 2.0 combines two distinct models to capture different facets of housing precarity:

Estimated Displacement Risk (EDR): The “Soft” Displacement

  • Concept: EDR focuses on the displacement pressures that arise from the net out-migration of lower-income households from a census tract over a five-year period. It seeks to capture the less visible forms of displacement where residents may leave due to rising costs or changing neighborhood character, even without a formal eviction. It specifically models migration patterns for households below 80% of the Area Median Income (AMI), broken down into three groups (Low: 50-80% AMI, Very Low: 30-50% AMI, Extremely Low: <30% AMI), using data from Data Axle.
  • Methodology: EDR is calculated based on a modeled net migration rate (dis_value) derived from BART (Bayesian Additive Regression Trees) models. Because migration patterns vary significantly across the US, separate models were developed for distinct geographic regions. Negative net migration values (more lower-income households leaving than arriving) indicate displacement pressure.
  • Risk Categories: Tracts are categorized into four risk levels based on the calculated dis_value:
    • Extreme Risk: Net migration less than -300 households. (dis_value < -300)
    • High Risk: Net migration between -300 and -100 households. (-300 <= dis_value < -100)
    • Elevated Risk: Net migration between -100 and 0 households. (-100 <= dis_value < 0)
    • At Risk/Early Stage: Net migration of 0 or more households. (dis_value >= 0)

Estimated Eviction Risk (EER): The “Hard” Displacement

  • Concept: EER focuses on the risk of formal eviction proceedings within a census tract. It aims to capture the more direct and immediate forms of displacement resulting from legal eviction filings.
  • Methodology: EER is based on a modeled eviction filing rate ratio (final_ev) derived from BART models, incorporating data from the Legal Services Corporation and other sources. Unlike the regional EDR models, the EER model covers a specific set of states where reliable eviction data was available.
  • Risk Categories: Tracts are categorized into four risk levels based on the calculated eviction filing rate ratio (final_ev):
    • Extreme Risk: Filing rate ratio of 2.0 or higher. (final_ev >= 2.0)
    • High Risk: Filing rate ratio between 1.5 and 2.0. (1.5 <= final_ev < 2.0)
    • Elevated Risk: Filing rate ratio between 1.0 and 1.5. (1.0 <= final_ev < 1.5)
    • At Risk: Filing rate ratio between 0.8 and 1.0. (0.8 <= final_ev < 1.0) (Note: Tracts with a ratio below 0.8 are considered to have lower or minimal eviction risk relative to the baseline and are not assigned a risk category in the final HPRM score calculation).
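
These two category schemes can be expressed as simple threshold functions. The following Python sketch is illustrative only (the HPRM codebase itself is in R, and the function names here are ours, not from the project):

```python
def edr_category(dis_value):
    """Map a modeled net migration value (dis_value) to an EDR category."""
    if dis_value < -300:
        return "Extreme Risk"
    elif dis_value < -100:
        return "High Risk"
    elif dis_value < 0:
        return "Elevated Risk"
    return "At Risk/Early Stage"


def eer_category(final_ev):
    """Map a modeled filing rate ratio (final_ev) to an EER category.

    Ratios below 0.8 fall under the minimum threshold and receive
    no category (returned as None).
    """
    if final_ev >= 2.0:
        return "Extreme Risk"
    elif final_ev >= 1.5:
        return "High Risk"
    elif final_ev >= 1.0:
        return "Elevated Risk"
    elif final_ev >= 0.8:
        return "At Risk"
    return None
```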

The Combined HPRM Score

The HPRM score provides a unified measure of housing precarity by combining the risks identified by the EDR and EER models.

  • Calculation: Each census tract receives a score from 1 to 4 for both EDR and EER based on its risk category (1 for At Risk/Early Stage, 2 for Elevated, 3 for High, 4 for Extreme). Tracts falling below the “At Risk” threshold for EER, or those excluded from EDR due to data quality issues (e.g., high student/retired populations, military bases), do not receive a score for that component. The final HPRM score (hprm_value) is the sum of the individual EDR and EER scores.
  • Interpretation: The HPRM score ranges from 0 to 8.
    • A score of 0 indicates areas with lower precarity (below thresholds) or insufficient data.
    • A score of 1 indicates that only one component (either EDR or EER) registered as “At Risk.”
    • A score of 8 indicates that the tract falls into the “Extreme Risk” category for both EDR and EER, signifying the highest level of combined housing precarity.
    • Higher scores generally indicate greater combined risk from both displacement pressures and eviction risk. Analyzing the individual EDR and EER scores alongside the combined HPRM score provides a more complete picture of the specific types of housing instability prevalent in a tract. For example, a tract might have a moderate HPRM score derived from high EDR but low EER, or vice versa.
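
As a minimal sketch of the scoring rule described above (again in Python rather than the project's R; the names are ours):

```python
# Component scores: 1 = At Risk, 2 = Elevated, 3 = High, 4 = Extreme
CATEGORY_SCORES = {
    "At Risk": 1, "At Risk/Early Stage": 1,
    "Elevated Risk": 2,
    "High Risk": 3,
    "Extreme Risk": 4,
}

def hprm_score(edr_cat, eer_cat):
    """Sum the EDR and EER component scores into the 0-8 HPRM score.

    A component with no category (None) contributes 0, as when a tract
    falls below the EER threshold or is excluded from EDR.
    """
    return CATEGORY_SCORES.get(edr_cat, 0) + CATEGORY_SCORES.get(eer_cat, 0)
```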

Using the Map

The HPRM 2.0 is implemented as two separate interactive maps, one each for the 2019 and 2022 data:

  • Map Technology: Both maps use Mapbox GL JS (version 3.3.0) for rendering and interactivity.
  • Layers: Both maps allow you to switch between viewing:
    • HPRM Score (combined risk - default view)
    • Estimated Displacement Risk (EDR) component
    • Estimated Eviction Risk (EER) component
  • Year Selection: The 2019 and 2022 maps are presented as separate pages, rather than as a toggle within a single map.
  • Navigation: Use your mouse or trackpad to pan and zoom the map to explore different regions. The maps also offer location search, full-screen mode, and zoom controls.
  • Tract Information: Hovering over a census tract displays a popup with the risk level. The sidebar legend area also updates with detailed information for the tract, including:
    • Its GEOID and location (county, state)
    • The calculated HPRM score
    • Individual EDR and EER risk categories
    • Demographic data including renter count, rent burden percentage, median rent and income
    • Racial/ethnic composition of the neighborhood
  • Search: A search bar allows you to quickly locate specific addresses, cities, or counties.
  • Legend: A map legend explains the color coding used for the risk categories:
    • At Risk (yellow)
    • Elevated (orange)
    • High (red)
    • Extreme (purple)
  • 2022 Map Features: The 2022 map includes additional features not present in the 2019 version:
    • Welcome modal with introduction and navigation options
    • Information tooltips explaining each layer
    • Enhanced location finding capabilities

Note: The HPRM score in these maps ranges from 0 (lower precarity) to 8 (extreme precarity); a score of 0 indicates an area below the minimum risk thresholds or with insufficient data.

Data Download

The HPRM 2.0 data is available for download to support further analysis, policy development, and community planning. The download functionality is still in development; once implemented, it will provide the following data:

Census Tract Level Data

  • HPRM Scores: Combined housing precarity risk scores (0-8)
  • EDR Metrics: Displacement risk category and underlying dis_value metrics
  • EER Metrics: Eviction risk category and underlying final_ev metrics
  • Demographic Data: Key demographic indicators including renter count, rent burden percentages, median rent and income
  • Geographic Identifiers: GEOID, county, and state for each tract

File Formats

  • CSV format for easy use in spreadsheet applications and data analysis tools
  • GeoJSON format with tract boundaries for use in GIS software

These data files are generated from the 2019 and 2022 American Community Survey (ACS) 5-year datasets, using the methodology described in this document. The downloadable data will enable researchers, policymakers, and community organizations to conduct their own analyses of housing precarity risks in their communities.

Methodology Deep Dive

The Housing Precarity Risk Model (HPRM) 2.0 employs advanced machine learning techniques and multiple data sources to predict housing instability at the census tract level. This section provides technical details about the methodology for researchers and practitioners interested in understanding the model’s construction.

Data Sources

The model integrates multiple administrative and survey datasets to capture the multifaceted nature of housing precarity:

  1. American Community Survey (ACS): 5-year estimates (2015-2019 for the 2019 model, 2018-2022 for the 2022 model) provide demographic, economic, and housing characteristics at the census tract level.
  2. Data Axle (formerly InfoGroup): Proprietary household-level migration data tracking residential moves over time, enabling calculation of net migration rates by income group.
  3. Legal Services Corporation: Court records data providing eviction filing rates for available states and counties.
  4. EPA Smart Location Database: Accessibility and urban form metrics including job access, walkability, and transit availability.
  5. Housing Choice Voucher (HCV) Data: HUD administrative data on subsidized housing participation rates.
  6. County-Level Voting Data: 2020 presidential election results aggregated to county level.
  7. Fair Market Rent (FMR) Data: HUD’s rent benchmarks by metropolitan area.

Geographic Framework

Due to significant regional variations in housing markets and migration patterns, the EDR model was developed separately for eight distinct regions:

  1. Midwest Great Lakes: Illinois, Ohio, Michigan, Indiana, Wisconsin, Minnesota
  2. Northeast New England: Massachusetts, Connecticut, Rhode Island, New Hampshire, Maine, Vermont, New York
  3. Northeast Penn-Washington: Pennsylvania, New Jersey, Maryland, Delaware, District of Columbia, Virginia
  4. South Atlantic: Florida, Georgia, North Carolina, South Carolina
  5. South Central: Texas, Oklahoma
  6. South Deep: Tennessee, Alabama, Louisiana, Kentucky, Mississippi, West Virginia, Arkansas, Missouri
  7. West Mountain Midwest: Iowa, Kansas, Nebraska, South Dakota, North Dakota, Colorado, Utah, Idaho, Montana, Wyoming, Arizona, Nevada, New Mexico
  8. West Pacific: California, Oregon, Washington

These regions were defined based on Combined Statistical Areas (CSAs) and similar housing market characteristics, ensuring that models capture region-specific dynamics.

Income Group Definitions

Both EDR and EER models focus on households below 80% of Area Median Income (AMI), stratified into three groups:

  • Extremely Low Income (EL): < 30% AMI
  • Very Low Income (VL): 30-50% AMI
  • Low Income (L): 50-80% AMI

These thresholds are calculated at the county level using ACS median household income data, then applied to tract-level income distributions to estimate the number of households in each category.
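
A hedged Python sketch of this assignment (the codebase is in R; `county_ami` here stands in for the county-level median income benchmark):

```python
def income_group(household_income, county_ami):
    """Assign a household to an income group relative to county AMI.

    Households at or above 80% AMI fall outside the model's scope.
    """
    ratio = household_income / county_ami
    if ratio < 0.30:
        return "EL"   # Extremely Low Income
    elif ratio < 0.50:
        return "VL"   # Very Low Income
    elif ratio < 0.80:
        return "L"    # Low Income
    return None
```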

Analytical Workflow

The modeling process follows a systematic workflow across all regions:

Data Preparation (d-scripts)

  1. State Data Prep (d1): Process Data Axle migration data by state and calculate county-level median household incomes
  2. Small Area Estimation (d2): Apply smoothing techniques to stabilize migration estimates in low-population tracts
  3. Slope Exploration (d3): Analyze net migration rate trends across income groups
  4. Sample Size Validation (d4): Conduct power calculations to identify tracts with sufficient data quality
  5. ACS Processing (d5): Pull and process demographic and housing variables from ACS
  6. Data Merging (d6): Combine all data sources into unified modeling datasets
  7. Model Prep (d7): Final data curation and creation of outcome variables

Modeling Process (m-scripts)

  1. Variable Selection (m1): Initial selection of theoretically relevant predictors (100+ variables)
  2. BART Modeling (m2): Train Bayesian Additive Regression Trees models with cross-validation
  3. Variable Importance (m3): Evaluate predictor importance using permutation methods
  4. Model Iteration: Cycle through m1-m3, progressively refining variable sets to prevent overfitting
  5. Final Predictions (m4): Generate tract-level predictions using optimized models

BART Modeling Approach

Bayesian Additive Regression Trees (BART) was selected for its ability to handle complex data relationships while providing robust predictions. Specifically, BART can:

  • Handle complex non-linear relationships
  • Automatically detect interactions between variables
  • Provide uncertainty quantification through posterior distributions
  • Avoid overfitting through built-in regularization

Let’s examine how each of these capabilities works in practice:

1. Handling Complex Non-linear Relationships

BART builds an ensemble of regression trees, where each tree captures different aspects of non-linearity in the data. Unlike single decision trees or linear models, BART can model complex curved relationships without manual feature engineering.

How it works in the code:

# From code/m2_bart_model_edr_v4.R
bart <- bartMachine(
    Xy = m_df,
    num_trees = nt,  # typically 200 trees
    k = k,           # shrinkage prior on leaf values
    nu = nu,         # degrees of freedom for the error-variance prior
    q = q            # quantile calibrating the error-variance prior
)

The num_trees parameter (typically 200 in the codebase) allows the model to capture multiple non-linear patterns. Each tree focuses on different regions of the feature space, and their sum creates highly flexible predictive surfaces.

Without this capability: A linear model would require manual creation of polynomial terms, interaction terms, and other transformations. Even then, it might miss complex regional patterns that BART automatically discovers. For example, the relationship between rent burden and displacement risk might be linear in expensive urban cores but highly non-linear in transitioning neighborhoods.

2. Automatically Detecting Interactions Between Variables

BART’s tree structure naturally captures interactions without explicitly specifying them. When a tree splits on one variable and then splits on another variable in a subsequent node, it’s modeling an interaction effect.

Evidence from the code:

# From code/archive/xnortheastatlantic/m2_bart_batch_v0_el.r.Rout
# The model achieves strong performance with 480 total features
# without manually specifying interactions:
"bartMachine after preprocess... 480 total features..."

The variable importance analysis in the code reveals which interactions matter most:

# From code/m2_bart_model.R
imp_permute <- var_selection_by_permute(bart, 
    bottom_margin = 10, 
    num_permute_samples = 10)
imp_permute$important_vars_local_names  # shows locally important interactions

Without this capability: Traditional regression would require manually specifying interaction terms (e.g., income × race, rent × employment), leading to an exponential explosion of parameters. With 100+ variables in these models, manually testing all possible interactions would be computationally infeasible.

3. Providing Uncertainty Quantification Through Posterior Distributions

BART is a Bayesian method that provides not just point predictions but full posterior distributions. This enables calculation of prediction intervals and uncertainty estimates.

Implementation in the code:

# From code/m5_eer_fits_v13_2022.r
# Calculate prediction intervals on national data
cpi <- calc_prediction_intervals(bm_model, rhs %>% select(names(bm_model$X)))

# Calculate margins of error
margins_of_error <- (cpi$interval[, "pi_upper_bd"] - cpi$interval[, "pi_lower_bd"]) / 2

# The results include uncertainty estimates
results <- tibble(
    fit = fit,
    log_moe = margins_of_error
) %>%
mutate(
    exp_fit = case_when(exp(fit)-1 <= 0 ~ 0, TRUE ~ exp(fit)-1),
    exp_moe = exp(log_moe)-1,
    lower_bound = exp_fit-exp_moe,
    upper_bound = exp_fit+exp_moe,
    p_diff = exp_moe/exp_fit  # proportion of uncertainty
)

The code also accesses posterior samples directly:

# From code/m3_var_importance_eer_v13.r
predictions[q, , ] <- bart_machine_get_posterior(bart_machine, test_data)$y_hat_posterior_samples

Without this capability: Point predictions alone can be misleading in policy contexts. A tract might have a predicted net out-migration of 100 households, but without uncertainty quantification, we wouldn’t know whether the true value could reasonably range from 50 to 150 households. This uncertainty is crucial for resource allocation decisions.

4. Avoiding Overfitting Through Built-in Regularization

BART uses several regularization mechanisms controlled by hyperparameters that act as priors on the tree structure.

Key regularization parameters in the code:

# From cross-validation results in code/archive/xnortheastatlantic/m2_bart_batch_v0_el.r.Rout
# Optimal parameters found through CV:
"bartMachine CV win: k: 5 nu, q: 10, 0.75 m: 200"

These parameters work as follows:

  • k (typically 2-5): Controls the normal prior on each tree’s leaf values. With the response rescaled to [-0.5, 0.5], the prior’s spread shrinks as k grows, so larger k pulls each tree’s contribution toward zero, acting like a complexity penalty.
  • nu (typically 3-10): Degrees of freedom for the inverse chi-squared prior on the error variance. Larger values make this prior more informative, providing tighter control on the variance.
  • q (typically 0.75-0.99): Calibrates the error-variance prior so that a fraction q of its probability mass lies below a naive data-based variance estimate. Higher q encodes the prior belief that BART will explain most of the variation, leaving little residual noise.

Evidence of regularization effectiveness:

# From code/m4_bart_diagnostics.r
# Multiple comments about overfitting checks:
"# Check for overfitting with in and out of sample data"
"# If a model performs well on the training set but poorly on unseen data, it may be overfitting."

The cross-validation results show that different combinations of these parameters lead to different levels of regularization:

# Models with too little regularization (k=2) perform worse:
"[16,] 2  3 0.90        50  117.5082         5.60494571"
"[17,] 2  3 0.99       200  118.6784         6.65657258"

Without this regularization:

  1. Without k: Leaf values would be unshrunk, letting individual trees memorize the training data
  2. Without nu: The error-variance prior would be too diffuse to restrain the model from fitting noise
  3. Without q: The variance prior would not be anchored to the scale of the data, allowing extreme predictions
  4. Result: The model would achieve near-zero training error but fail catastrophically on new data

The regularization is why BART can handle datasets with 400+ features (as seen in the code) without the manual feature selection required by simpler methods.

Model Specifications in Practice

The codebase reveals how these specifications are implemented:

# From code/m2_bart_model.R
bartcv <- bartMachineCV(
    Xy = m_df,
    seed = seed,
    serialize = TRUE,
    use_missing_data = FALSE
)
# Extract optimal parameters
k = bartcv$k
nu = bartcv$nu
q = bartcv$q
nt = bartcv$num_trees

The cross-validation automatically balances all four capabilities:

  • More trees (better non-linearity) vs. computational cost
  • Deeper trees (more interactions) vs. overfitting risk
  • Tighter priors (more regularization) vs. model flexibility
  • Number of posterior samples vs. inference time

Key Predictors

Through iterative variable selection, the models identify important predictors including:

  • Housing cost metrics (rent burden, home values, rent changes)
  • Demographic composition (race, education, age structure)
  • Economic indicators (unemployment, income distribution, poverty)
  • Housing stock characteristics (age, size, tenure)
  • Neighborhood change indicators (gentrification, hot markets)
  • Accessibility measures (job access, transit availability)

Combined with the automated hyperparameter tuning described above, this iterative variable selection is why BART performs well across diverse regional datasets without manual adjustment for each region’s unique characteristics.

Outcome Variable Construction

EDR: Net Migration Rate

The displacement risk metric (dis_value) represents the net migration of low-income households:

  • Calculated as: (in-migrants - out-migrants) over a five-year period
  • Negative values indicate net out-migration (displacement pressure)
  • Separate models for each income group (EL, VL, L), then combined
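
In pseudocode terms, this construction might look like the following Python sketch (illustrative only; in particular, the assumption that the combined value is the simple sum of per-group predictions is ours, not stated in the codebase):

```python
def net_migration(in_migrants, out_migrants):
    """Net migration over the five-year window; negative = out-migration."""
    return in_migrants - out_migrants

def combined_dis_value(predictions_by_group):
    """Combine per-income-group net migration into one dis_value.

    Assumes (our assumption) that the combined value is the sum
    across the EL, VL, and L groups.
    """
    return sum(predictions_by_group[g] for g in ("EL", "VL", "L"))
```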

EER: Eviction Filing Rate Ratio

The eviction risk metric (final_ev) represents relative eviction risk:

  • Calculated as: tract eviction filing rate / state average rate
  • Log-transformed for modeling: log(rate ratio + 1)
  • Values > 1 indicate above-average eviction risk
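
A small Python sketch of the transform and its inverse (mirroring the `exp(fit) - 1` back-transform seen in the R post-processing earlier; variable names are ours):

```python
import math

def to_model_scale(rate_ratio):
    """log(rate ratio + 1), the transform used for modeling."""
    return math.log(rate_ratio + 1)

def from_model_scale(fit):
    """Invert the transform, flooring at zero as in the R code."""
    return max(math.exp(fit) - 1, 0.0)
```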

Model Validation and Diagnostics

Quality assurance includes:

  1. Convergence diagnostics: Monitoring BART chain convergence
  2. Predictive performance: Evaluating R² and RMSE on held-out data
  3. Geographic validation: Ensuring predictions align with known displacement patterns
  4. Temporal stability: Comparing 2019 and 2022 results for consistency

Final Risk Categorization

Risk categories were determined through analysis of prediction distributions:

EDR Categories (based on net migration):

  • At Risk/Early Stage: ≥ 0 households
  • Elevated Risk: -100 to 0 households
  • High Risk: -300 to -100 households
  • Extreme Risk: < -300 households

EER Categories (based on filing rate ratio):

  • At Risk: 0.8 to 1.0
  • Elevated Risk: 1.0 to 1.5
  • High Risk: 1.5 to 2.0
  • Extreme Risk: ≥ 2.0

Data Quality Considerations

Tracts are excluded from risk assignment when:

  • Population below minimum thresholds (varies by region)
  • High student populations (>30% enrolled)
  • High retired populations (>40% receiving retirement income)
  • Military installations present
  • Insufficient Data Axle coverage

These exclusions ensure that risk assessments reflect genuine displacement pressures rather than demographic artifacts.
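
A Python sketch of these exclusion rules (the student and retired thresholds come from the list above; the minimum-population cutoff varies by region, so the default of 500 here is a placeholder of ours):

```python
def exclude_tract(population, pct_students, pct_retired,
                  has_military_base, data_axle_covered,
                  min_population=500):
    """Return True if a tract should be excluded from risk assignment."""
    return (population < min_population      # below regional threshold
            or pct_students > 0.30           # high student population
            or pct_retired > 0.40            # high retired population
            or has_military_base             # military installation
            or not data_axle_covered)        # insufficient Data Axle coverage
```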

Limitations and Future Directions

While HPRM 2.0 represents a significant advance in measuring housing precarity, several limitations should be noted:

  • Data Axle coverage varies by region and may underrepresent certain populations
  • Eviction data availability limits EER to specific states
  • Models capture associations, not causal relationships
  • Five-year ACS estimates may lag current conditions

Future iterations will incorporate additional data sources, expand geographic coverage, and develop real-time updating capabilities.

About / Acknowledgements

Urban Displacement Project (UDP): HPRM 2.0 and this explainer were developed by the Urban Displacement Project (UDP). Learn more at https://www.urbandisplacement.org/

Eviction Research Network (ERN): The Estimated Eviction Risk (EER) component draws on eviction data and research produced by the Eviction Research Network (ERN). Learn more at https://evictionresearch.net/

When sharing or reusing figures, maps, or data from this page, please attribute both the Urban Displacement Project (UDP) and the Eviction Research Network (ERN).