Housing Forecast Methodologies
Comparison Communities Methodology
Overview
The "How does this place compare to similar communities?" card surfaces a focus municipality alongside ten peer communities. The peer set is precomputed by an offline pipeline that combines substantive feature similarity (demographics, housing, employment) with geographic proximity, so the chosen peers are not just statistical lookalikes — they tend to share regional context as well.

The peer-set table (app.municipality_peer_sets) is keyed by (experiment_id, geoid). The card reads the active experiment, takes the first ten entries from default_peer_geoids, and renders them next to the focus geo.
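The lookup described above can be sketched in a few lines. This is a minimal illustration, not the card's actual data module: it assumes the rows have already been fetched from app.municipality_peer_sets into dictionaries, and the function name `default_peers`, the experiment id `exp-demo`, and the GEOIDs are all hypothetical.

```python
from typing import Iterable, List, Mapping


def default_peers(
    rows: Iterable[Mapping[str, object]],
    experiment_id: str,
    geoid: str,
    limit: int = 10,
) -> List[str]:
    """Return the first `limit` peers for one (experiment_id, geoid) key."""
    for row in rows:
        if row["experiment_id"] == experiment_id and row["geoid"] == geoid:
            # default_peer_geoids is already ordered, so slicing keeps rank order.
            return list(row["default_peer_geoids"])[:limit]
    return []  # no row: the focus geo has no peer set in this experiment


# Hypothetical sample row; real rows come from app.municipality_peer_sets.
sample_rows = [
    {
        "experiment_id": "exp-demo",
        "geoid": "0000001",
        "default_peer_geoids": [f"{n:07d}" for n in range(2, 14)],
    }
]
peers = default_peers(sample_rows, "exp-demo", "0000001")
```

Returning an empty list for a missing key mirrors the card's behavior of having nothing to render for geographies that are not in the table.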
What goes into the comparison
Each municipality is described by a vector of features pulled from public data:
| Domain | Source | Examples |
|---|---|---|
| Population & age structure | ACS B01001, B09021 | Total population, share 65+, share under 18 |
| Income | ACS B19013, B19001 | Median household income, income-bracket shares |
| Education | ACS B15003 | Share with bachelor's degree or higher |
| Household composition | ACS B11005, B11012 | Household size, share with children, single-person households |
| Housing stock | ACS B25001, B25024, B25032, B25041 | Total units, units in structure, bedrooms, unit age |
| Tenure & vacancy | ACS B25002, B25003 | Owner share, vacancy rate |
| Cost burden | ACS B25074, B25090, B25095 | Renter and owner cost-to-income shares |
| Home values | Zillow (via app.housing_affordability) | Typical home value, value-to-income ratio, 5-yr and 10-yr price change |
| Jobs | LODES WAC (via app.job_growth) | Total jobs, 5-yr and 10-yr job growth |
For each feature we also compute trajectories: the latest value, the five-year change, the ten-year change, and a slope. This means two communities that look identical today but have moved very differently over the last decade will be treated as less similar than their snapshot suggests.
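As a rough illustration of the trajectory features, the sketch below summarizes one yearly series. It assumes absolute (not percentage) changes and an ordinary least-squares slope; the pipeline's exact definitions are not stated here, and the function name `trajectory` and the sample values are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional


def trajectory(series: Dict[int, float]) -> Dict[str, Optional[float]]:
    """Summarize a yearly series as latest value, 5/10-year change, and slope."""
    years = sorted(series)
    latest_year = years[-1]
    latest = series[latest_year]

    def change(lag: int) -> Optional[float]:
        # Absolute change versus the value `lag` years earlier, if observed.
        past = series.get(latest_year - lag)
        return None if past is None else latest - past

    # Least-squares slope of value on year (units per year).
    x_bar = mean(years)
    y_bar = mean(series[y] for y in years)
    sxx = sum((y - x_bar) ** 2 for y in years)
    sxy = sum((y - x_bar) * (series[y] - y_bar) for y in years)
    slope = sxy / sxx if sxx else None

    return {
        "latest": latest,
        "change_5y": change(5),
        "change_10y": change(10),
        "slope": slope,
    }


# Two hypothetical towns with the same latest value but different histories.
growing = trajectory({2012: 20000.0, 2017: 25000.0, 2022: 30000.0})
flat = trajectory({2012: 30000.0, 2017: 30000.0, 2022: 30000.0})
```

Both towns share the same `latest` value, but their change and slope entries differ, which is what separates them in the feature space.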
Features that are missing for more than half of municipalities nationwide are dropped. Remaining features are standardized (z-scored) so no single high-magnitude metric — typically median home value — dominates the distance.
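The filtering and standardization step might look like the sketch below. It assumes a population standard deviation and leaves missing values as None after scaling; neither choice is confirmed by the pipeline, and the function name `standardize` and the sample rows are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def standardize(rows: List[Row], max_missing: float = 0.5) -> List[Row]:
    """Drop features missing in more than `max_missing` of rows, then z-score."""
    features = sorted({name for row in rows for name in row})
    kept: Dict[str, Tuple[float, float]] = {}
    for name in features:
        observed = [row[name] for row in rows if row.get(name) is not None]
        missing_share = 1 - len(observed) / len(rows)
        if missing_share > max_missing:
            continue  # missing for more than half of municipalities
        sd = pstdev(observed)
        # Guard against constant features so we never divide by zero.
        kept[name] = (mean(observed), sd if sd > 0 else 1.0)
    return [
        {
            name: None if row.get(name) is None else (row[name] - mu) / sd
            for name, (mu, sd) in kept.items()
        }
        for row in rows
    ]


# Hypothetical rows: `sparse` is missing for two of three municipalities.
raw = [
    {"median_home_value": 200000.0, "owner_share": 0.6, "sparse": None},
    {"median_home_value": 400000.0, "owner_share": 0.8, "sparse": None},
    {"median_home_value": 600000.0, "owner_share": 0.7, "sparse": 1.0},
]
z = standardize(raw)
```

After scaling, home value and owner share span the same range even though their raw magnitudes differ by six orders, so neither dominates an L2 distance.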
How peers are chosen
The selection runs in four stages:
1. Geographic graph
A graph is built over every municipality in the country. Each municipality is connected to:
- its 10 nearest municipalities by straight-line distance (capped at 200 mi),
- the 10 nearest municipalities in the same county (capped at 150 mi),
- the 15 nearest municipalities in the same CBSA (capped at 250 mi), and
- a 5-neighbor fallback within the same state (capped at 300 mi).
This graph encodes "what counts as nearby" in a way that respects county, metro, and state lines instead of relying on raw straight-line distance alone.
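The four neighbor rules can be sketched with a brute-force search over great-circle distances. This is an illustration under stated assumptions: the `Muni` record, the `RULES` table, and the function names are hypothetical, the state rule is applied unconditionally rather than as a true fallback, and a national run would need a spatial index instead of the O(n²) loop shown here.

```python
import math
from typing import Dict, List, NamedTuple, Optional, Set


class Muni(NamedTuple):
    geoid: str
    lat: float
    lon: float
    county: str            # county FIPS
    cbsa: Optional[str]    # None when outside any CBSA
    state: str


EARTH_RADIUS_MI = 3958.8


def haversine_mi(a: Muni, b: Muni) -> float:
    """Great-circle distance between two municipalities, in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(h))


# (attribute that must match, or None; neighbors to keep; cap in miles)
RULES = [(None, 10, 200.0), ("county", 10, 150.0),
         ("cbsa", 15, 250.0), ("state", 5, 300.0)]


def build_edges(munis: List[Muni]) -> Dict[str, Set[str]]:
    """Connect each municipality to its k nearest neighbors under every rule."""
    edges: Dict[str, Set[str]] = {m.geoid: set() for m in munis}
    for m in munis:
        for attr, k, cap in RULES:
            pool = [
                (haversine_mi(m, o), o.geoid)
                for o in munis
                if o.geoid != m.geoid
                and (attr is None
                     or (getattr(m, attr) is not None
                         and getattr(o, attr) == getattr(m, attr)))
            ]
            for dist, geoid in sorted(pool)[:k]:
                if dist <= cap:  # drop neighbors beyond the rule's cap
                    edges[m.geoid].add(geoid)
    return edges


# Hypothetical towns: A and B are close together, C is in another state.
towns = [
    Muni("A", 40.0, -75.0, "42001", "C1", "PA"),
    Muni("B", 40.1, -75.0, "42001", "C1", "PA"),
    Muni("C", 30.0, -95.0, "48001", None, "TX"),
]
edges = build_edges(towns)
```

In this toy input A and B link to each other, while C ends up isolated because every candidate lies beyond the distance caps.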
2. Geographic embedding
The graph is embedded into a 32-dimensional vector space using Node2Vec (with random walks of length 40, ten walks per node, a window of 10, and a fixed random seed for reproducibility). The resulting vectors capture multi-hop neighborhood structure — two communities that share many regional neighbors end up close in embedding space even if they are not direct graph neighbors.
A spectral graph embedding is used as a fallback if Node2Vec is unavailable.
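To make the walk parameters concrete, the sketch below generates the random walks that a Node2Vec-style model would train on. It is a simplification: it uses uniform walks (equivalent to Node2Vec with return and in-out parameters both set to 1, which the pipeline may not use), omits the skip-gram training that turns walks into 32-dimensional vectors, and the seed value 42 and the function name `generate_walks` are assumptions.

```python
import random
from typing import Dict, List


def generate_walks(
    graph: Dict[str, List[str]],
    walk_length: int = 40,
    walks_per_node: int = 10,
    seed: int = 42,  # assumed value; the pipeline only states the seed is fixed
) -> List[List[str]]:
    """Generate uniform random walks starting from every node."""
    rng = random.Random(seed)  # local generator keeps runs reproducible
    walks: List[List[str]] = []
    for _ in range(walks_per_node):
        nodes = sorted(graph)
        rng.shuffle(nodes)  # visit start nodes in a different order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break  # isolated node: the walk ends early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical four-node graph; D is reachable only through C.
toy_graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
walks = generate_walks(toy_graph)
```

Nodes that co-occur often within a window of these walks are pulled together by the embedding, which is how communities with many shared regional neighbors end up close even without a direct edge.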
3. Candidate ranking
For each focus municipality we:
- Pull the top 100 substantive candidates by L2 distance in the standardized feature space.
- Pull the top ~30 geographic neighbors from the embedding.
- Combine the two pools and rerank by

  $$\text{distance} = \operatorname{minmax}(d_{\text{substantive}}) - w_{\text{locality}} \cdot \log(\text{proximity})$$

  where proximity is similarity in the geographic embedding and $w_{\text{locality}}$ is an adaptive locality weight between 0.05 and 0.45. The weight decreases as the focus municipality's population grows: smaller communities lean more on regional peers, since their substantive feature vectors are noisier; large cities lean more on substantive similarity, since their nearest geographic neighbors are often unlike them in scale.
- Sort by a low-confidence flag (peers we are less sure of get pushed to the back) and then by the combined distance.
- Keep the top 20 candidates per municipality. The first 10 form the default peer set shown on the dashboard.
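The reranking can be sketched as follows. This is an illustration only: it assumes proximity lies in (0, 1] and clamps it away from zero before taking the log, it takes the locality weight as an input because the population-to-weight mapping is not specified here, and the `Candidate` record and `rerank` function are hypothetical names.

```python
import math
from typing import List, NamedTuple


class Candidate(NamedTuple):
    geoid: str
    substantive_distance: float  # L2 distance in standardized feature space
    proximity: float             # embedding similarity, assumed in (0, 1]
    low_confidence: bool


def rerank(candidates: List[Candidate], w_locality: float, keep: int = 20) -> List[str]:
    """Order candidates by (low-confidence flag, combined distance)."""
    dists = [c.substantive_distance for c in candidates]
    lo, hi = min(dists), max(dists)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all distances match

    def combined(c: Candidate) -> float:
        scaled = (c.substantive_distance - lo) / span  # min-max to [0, 1]
        # log(proximity) <= 0, so low proximity adds a penalty.
        return scaled - w_locality * math.log(max(c.proximity, 1e-9))

    ranked = sorted(candidates, key=lambda c: (c.low_confidence, combined(c)))
    return [c.geoid for c in ranked[:keep]]


# Hypothetical pool mixing a distant lookalike with closer, less similar peers.
pool = [
    Candidate("far_twin", 1.0, 0.05, False),
    Candidate("near_similar", 2.0, 0.9, False),
    Candidate("near_low_conf", 1.0, 0.9, True),
    Candidate("distant", 5.0, 0.5, False),
]
small_town_order = rerank(pool, w_locality=0.45)
large_city_order = rerank(pool, w_locality=0.05)
```

With the high weight the nearby peer ranks first, and with the low weight the distant lookalike does; the low-confidence candidate sorts last in both cases regardless of its distance.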
4. Regional groupings
In addition to the default set, the pipeline records four overlapping segments for each focus municipality:
- Default — top 10 by combined distance (the set shown on the card).
- Nearby — peers in the same county or CBSA, or within 250 mi in the same state.
- Outside CBSA — peers from a different metro area.
- Outside state — peers from a different state.
These segments are not surfaced on the public dashboard today, but they let internal analyses ask "what would this list look like if we forced peers to be regional / non-regional?"
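One way to read the segment definitions is sketched below. The parsing of the "Nearby" rule (same county, or same CBSA, or same state within 250 mi) and the treatment of municipalities without a CBSA are assumptions, and the `Peer` record and `segment` function are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Optional


class Peer(NamedTuple):
    geoid: str
    county: str            # county FIPS
    cbsa: Optional[str]    # None when outside any CBSA
    state: str
    miles_from_focus: float


def segment(focus: Peer, ranked: List[Peer]) -> Dict[str, List[str]]:
    """Split an ordered candidate list into the four overlapping segments."""

    def nearby(p: Peer) -> bool:
        same_county = p.county == focus.county
        same_cbsa = focus.cbsa is not None and p.cbsa == focus.cbsa
        close_in_state = p.state == focus.state and p.miles_from_focus <= 250
        return same_county or same_cbsa or close_in_state

    return {
        "default": [p.geoid for p in ranked[:10]],
        "nearby": [p.geoid for p in ranked if nearby(p)],
        # Assumption: any differing CBSA value counts as a different metro.
        "outside_cbsa": [p.geoid for p in ranked if p.cbsa != focus.cbsa],
        "outside_state": [p.geoid for p in ranked if p.state != focus.state],
    }


# Hypothetical focus and candidates, already ordered by combined distance.
focus = Peer("F", "42001", "C1", "PA", 0.0)
ranked = [
    Peer("P1", "42001", "C1", "PA", 5.0),
    Peer("P2", "42050", "C2", "PA", 120.0),
    Peer("P3", "34001", "C1", "NJ", 30.0),
    Peer("P4", "06001", "C9", "CA", 2400.0),
]
segments = segment(focus, ranked)
```

Because the segments overlap, a single peer such as P3 can appear in both the nearby list (shared CBSA) and the outside-state list.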
Output table
The card reads from app.municipality_peer_sets, a SQL view backed by the underlying municipality_peer_candidates table. The view has one row per (experiment_id, geoid).
| Column | Type | Notes |
|---|---|---|
| experiment_id | text | Hash-based id encoding the feature set, parameters, and seed used to build the table. Changes whenever the pipeline reruns with different settings. |
| geoid | text | Focus municipality GEOID. |
| default_peer_geoids | text[] | Top 10 peers, ordered. |
| default_peer_names | text[] | Place names matching default_peer_geoids. |
| candidate_peer_geoids | text[] | Full top 20, ordered. |
| nearby_peer_geoids | text[] | Subset within the same metro / nearby. |
| outside_cbsa_peer_geoids | text[] | Subset from a different metro. |
| outside_state_peer_geoids | text[] | Subset from a different state. |
| *_count | int | Cardinality of each list. |
| created_at | timestamp | Run audit. |
What the card itself shows
Once the peer set is loaded, the card pulls a small set of ACS series for the focus + ten peers and renders two parallel line charts plus an expandable table:
- Average household size (ACS B25010_001), 2010 through the latest ACS release.
- Median household income, inflation-adjusted (ACS B19013_001 × bls.inflation_data.ratio) in the latest year's dollars.
- The expandable section adds current household count, 10-year household growth, share of 65+ households (B11007), median income, and vacancy rate (B25002).
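The inflation adjustment for the income chart is a per-year multiplication. The sketch below assumes the ratio for a given year is the latest year's price level divided by that year's price level, so the latest year has a ratio of 1.0; the function name and all figures are hypothetical.

```python
from typing import Dict


def to_latest_dollars(
    nominal_by_year: Dict[int, float],
    ratio_by_year: Dict[int, float],
) -> Dict[int, float]:
    """Convert nominal median incomes to the latest year's dollars."""
    return {
        year: value * ratio_by_year[year]
        for year, value in nominal_by_year.items()
    }


# Hypothetical nominal incomes and inflation ratios.
nominal = {2012: 50000.0, 2022: 70000.0}
ratios = {2012: 1.25, 2022: 1.0}
real = to_latest_dollars(nominal, ratios)
```

In this example the nominal gain of 20,000 shrinks to 7,500 in real terms, which is the comparison the chart is meant to support.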
The card is municipality-only — app.municipality_peer_sets does not contain entries for counties or other layers — so it is hidden on county and state dashboards.
Versioning
The experiment_id constant is encoded in the card's data module (cards/charts/comparison_communities/_peer_set.py). When the peer-set pipeline is rerun with new parameters or features, the id changes and the card switches to the new set at the next deploy. Old experiment_id rows are not deleted from the table, so historical peer sets remain queryable for backfills and analysis.
Limitations
- Coverage is municipality-only. Census Designated Places, county subdivisions in non-strong-MCD states, and unincorporated areas all map to "municipality" in CommunityScale's terminology, but each ACS table covers them slightly differently. A peer in a CDP-heavy state and a peer in a strong-MCD state are not perfectly comparable.
- Feature noise grows as population shrinks. Municipalities with fewer than 5,000 residents have noisy ACS estimates. The adaptive locality weight partially offsets this, since small communities lean on regional peers, but peers for very small municipalities should be read as suggestive rather than authoritative.
- No causal claim. Peers are similar on the dimensions we measure. Two peer communities can still have very different policy environments, school systems, infrastructure histories, and so on. The card is a starting point for comparison, not a substitute for local knowledge.