Comparison Communities Methodology

Overview

The "How does this place compare to similar communities?" card surfaces a focus municipality alongside ten peer communities. The peer set is precomputed by an offline pipeline that combines substantive feature similarity (demographics, housing, employment) with geographic proximity, so the chosen peers are not just statistical lookalikes — they tend to share regional context as well.

Figure: Comparison-communities peer graph (top 2,000 U.S. municipalities by population). Force-directed layout; edges connect cities that name each other as default peers, and nodes are sized by population and colored by census region. The U.S. silhouette emerges because the layout is seeded by each municipality's centroid.

The peer-set view (app.municipality_peer_sets) is keyed by (experiment_id, geoid). The card looks up the row for the active experiment and the focus municipality, takes the first ten entries from default_peer_geoids, and renders them next to the focus geo.
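A minimal sketch of that read path, assuming the view's rows have already been fetched into a Python dict keyed by (experiment_id, geoid). The dict stand-in and the function name geos_for_card are hypothetical; in the card itself the rows would come from a query against app.municipality_peer_sets.

```python
from typing import List, Mapping, Sequence, Tuple

# Hypothetical stand-in for app.municipality_peer_sets:
# (experiment_id, geoid) -> default_peer_geoids, already in rank order.
PeerRows = Mapping[Tuple[str, str], Sequence[str]]


def geos_for_card(rows: PeerRows, experiment_id: str, focus_geoid: str,
                  n_peers: int = 10) -> List[str]:
    """Return the focus GEOID followed by its first n_peers default peers."""
    peers = rows.get((experiment_id, focus_geoid))
    if peers is None:
        # No row for this experiment and geo, so there is nothing to render.
        return []
    return [focus_geoid, *list(peers)[:n_peers]]
```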

What goes into the comparison

Each municipality is described by a vector of features pulled from public data:

Domain	Source	Examples
Population & age structure	ACS B01001, B09021	Total population, share 65+, share under 18
Income	ACS B19013, B19001	Median household income, income-bracket shares
Education	ACS B15003	Share with bachelor's degree or higher
Household composition	ACS B11005, B11012	Household size, share with children, single-person households
Housing stock	ACS B25001, B25024, B25032, B25041	Total units, units in structure, bedrooms, unit age
Tenure & vacancy	ACS B25002, B25003	Owner share, vacancy rate
Cost burden	ACS B25074, B25090, B25095	Renter and owner cost-to-income shares
Home values	Zillow (via `app.housing_affordability`)	Typical home value, value-to-income ratio, 5-yr and 10-yr price change
Jobs	LODES WAC (via `app.job_growth`)	Total jobs, 5-yr and 10-yr job growth

For each feature we also compute trajectories: the latest value, the five-year change, the ten-year change, and a slope. This means two communities that look identical today but have moved very differently over the last decade will be treated as less similar than their snapshot suggests.
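For a single feature's yearly series, the trajectory features can be sketched as follows. This assumes the series maps year to value, that "change" means an absolute difference (the pipeline may use percent change instead), and that the slope is an ordinary least-squares fit over every available year; the function names are illustrative.

```python
from __future__ import annotations

from typing import Dict, List, Optional


def _ols_slope(xs: List[float], ys: List[float]) -> Optional[float]:
    """Least-squares slope of ys on xs; None when xs has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def trajectory_features(series: Dict[int, float]) -> Dict[str, Optional[float]]:
    """Latest value, 5- and 10-year absolute change, and per-year slope."""
    years = sorted(series)
    latest_year = years[-1]
    latest = series[latest_year]

    def change(k: int) -> Optional[float]:
        past = series.get(latest_year - k)
        return None if past is None else latest - past  # None if that year is missing

    return {
        "latest": latest,
        "change_5yr": change(5),
        "change_10yr": change(10),
        "slope": _ols_slope(years, [series[y] for y in years]),
    }
```

Two places with the same "latest" value but opposite slopes end up several standardized units apart on the trajectory columns, which is how the snapshot-versus-trajectory distinction enters the distance.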

Features that are missing for more than half of municipalities nationwide are dropped. Remaining features are standardized (z-scored) so no single high-magnitude metric — typically median home value — dominates the distance.
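A sketch of the missing-data filter and standardization with pandas, assuming one row per municipality and one column per feature. The text does not say how the gaps that survive the filter are handled; this sketch assumes they are filled with 0, which is the feature mean after z-scoring.

```python
import numpy as np
import pandas as pd


def prepare_features(df: pd.DataFrame, max_missing_share: float = 0.5) -> pd.DataFrame:
    """Drop sparse features, then z-score the remaining columns."""
    # Keep a feature only if it is missing for at most half of municipalities.
    kept = df.loc[:, df.isna().mean() <= max_missing_share]
    mean = kept.mean()                        # NaNs are skipped by default
    std = kept.std(ddof=0).replace(0, np.nan)  # constant columns -> NaN, filled below
    z = (kept - mean) / std
    # Assumption: leftover gaps (and constant columns) become 0, the post-z mean.
    return z.fillna(0.0)
```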

How peers are chosen

The selection runs in four stages:

1. Geographic graph

A graph is built over every municipality in the country. Each municipality is connected to:

its 10 nearest municipalities by straight-line distance (capped at 200 mi),
the 10 nearest municipalities in the same county (capped at 150 mi),
the 15 nearest municipalities in the same CBSA (capped at 250 mi), and
a 5-neighbor fallback within the same state (capped at 300 mi).

This graph encodes "what counts as nearby" in a way that respects county, metro, and state lines rather than relying on straight-line distance alone.
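A brute-force sketch of the four neighbor rules, assuming each place carries a centroid, a county FIPS code, a state code, and an optional CBSA code. Two details are assumptions rather than documented behavior: edges are made symmetric, and the same-state rule is applied to every place instead of only when the other rules come up short. Place, miles_between, RULES, and build_geo_graph are hypothetical names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Place:
    geoid: str
    lat: float
    lon: float
    county: str               # assumed 5-digit county FIPS, unique nationwide
    state: str
    cbsa: str | None = None   # None for places outside any CBSA


def miles_between(a: Place, b: Place) -> float:
    """Great-circle (haversine) distance between centroids, in miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp, dl = p2 - p1, math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(min(1.0, h)))


def _same_cbsa(a: Place, b: Place) -> bool:
    return a.cbsa is not None and a.cbsa == b.cbsa


# (k nearest, cap in miles, which places are eligible)
RULES: List[Tuple[int, float, Callable[[Place, Place], bool]]] = [
    (10, 200.0, lambda a, b: True),                  # nearest overall
    (10, 150.0, lambda a, b: a.county == b.county),  # same county
    (15, 250.0, _same_cbsa),                         # same CBSA
    (5, 300.0, lambda a, b: a.state == b.state),     # same-state fallback
]


def build_geo_graph(places: List[Place]) -> Dict[str, Set[str]]:
    """Undirected adjacency: the union of all four k-nearest-neighbor rules.

    O(n^2) brute force for clarity; a national run would need a spatial index.
    """
    adj: Dict[str, Set[str]] = {p.geoid: set() for p in places}
    for a in places:
        others = sorted(
            ((miles_between(a, b), b) for b in places if b.geoid != a.geoid),
            key=lambda pair: pair[0],
        )
        for k, cap, eligible in RULES:
            picked = [b for d, b in others if d <= cap and eligible(a, b)][:k]
            for b in picked:
                adj[a.geoid].add(b.geoid)
                adj[b.geoid].add(a.geoid)  # assumption: edges are symmetric
    return adj
```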

2. Geographic embedding

The graph is embedded into a 32-dimensional vector space using Node2Vec (with random walks of length 40, ten walks per node, a window of 10, and a fixed random seed for reproducibility). The resulting vectors capture multi-hop neighborhood structure — two communities that share many regional neighbors end up close in embedding space even if they are not direct graph neighbors.
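The walk-sampling half of Node2Vec can be sketched in plain Python. The walk length (40), walks per node (10), and seeded randomness come from the text; the return parameter p and in-out parameter q are not documented, so this sketch defaults both to 1.0, which reduces Node2Vec to uniform random walks. The walks would then be fed to a skip-gram model, for example gensim's Word2Vec with vector_size=32 and window=10; that step is omitted to keep the sketch dependency-free.

```python
import random
from typing import Dict, List, Set


def node2vec_walk(adj: Dict[str, Set[str]], start: str, length: int,
                  p: float, q: float, rng: random.Random) -> List[str]:
    """One second-order biased walk on an unweighted graph."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # sorted so a fixed seed gives fixed walks
        if not nbrs:
            break  # isolated node: the walk stops here
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        # Unnormalized weights: 1/p to step back, 1 to stay near prev, 1/q to move away.
        weights = [1.0 / p if x == prev else 1.0 if x in adj[prev] else 1.0 / q
                   for x in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def generate_walks(adj: Dict[str, Set[str]], num_walks: int = 10, walk_length: int = 40,
                   p: float = 1.0, q: float = 1.0, seed: int = 0) -> List[List[str]]:
    """num_walks rounds, each starting one walk from every node in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks: List[List[str]] = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        walks.extend(node2vec_walk(adj, n, walk_length, p, q, rng) for n in nodes)
    return walks
```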

A spectral graph embedding is used as a fallback if Node2Vec is unavailable.
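The text does not say which spectral method the fallback uses. One common choice is a Laplacian eigenmap, sketched here with NumPy: build the symmetric normalized Laplacian, keep the eigenvectors for the smallest non-trivial eigenvalues, and rescale them by D^-1/2. The function name and the dense eigendecomposition (fine for a sketch, far too slow for a national graph) are assumptions.

```python
from typing import Dict, Set

import numpy as np


def spectral_embedding(adj: Dict[str, Set[str]], dim: int = 32) -> Dict[str, np.ndarray]:
    """Laplacian-eigenmap embedding of an undirected graph."""
    nodes = sorted(adj)
    index = {g: i for i, g in enumerate(nodes)}
    n = len(nodes)
    a = np.zeros((n, n))
    for g, nbrs in adj.items():
        for h in nbrs:
            a[index[g], index[h]] = 1.0
    a = np.maximum(a, a.T)  # make sure the adjacency is symmetric
    deg = a.sum(axis=1)
    inv_sqrt = np.zeros(n)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])  # isolated nodes stay at 0
    # Symmetric normalized Laplacian: I - D^-1/2 A D^-1/2
    lap = np.eye(n) - inv_sqrt[:, None] * a * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    k = min(dim, n - 1)
    # Drop the first eigenvector (trivial for a connected graph) and map back
    # to random-walk form by scaling rows with D^-1/2.
    emb = vecs[:, 1:k + 1] * inv_sqrt[:, None]
    return {g: emb[index[g]] for g in nodes}
```

On two triangles joined by a single edge, the first coordinate already splits the graph into its two clusters by sign, which is the multi-hop structure the embedding is meant to preserve.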

3. Candidate ranking

For each focus municipality we:

Pull the top 100 substantive candidates by L2 distance in the standardized feature space.
Pull the top ~30 geographic neighbors from the embedding.
Combine the two pools and rerank by
distance = min-max(substantive distance) − w_locality × log(proximity)
where proximity is similarity in the geographic embedding and w_locality is an adaptive locality weight between 0.05 and 0.45. The weight depends on the focus municipality's population: smaller communities lean more on regional peers, since their substantive feature vectors are noisier, while large cities lean more on substantive similarity, since their nearest geographic neighbors are often unlike them in scale.
Sort by a low-confidence flag (peers we are less sure of get pushed to the back) and then by the combined distance.
Keep the top 20 candidates per municipality. The first 10 form the default peer set shown on the dashboard.
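The reranking step can be sketched as follows. The combined score follows the formula above, and the low-confidence flag sorts ahead of distance. Two pieces are assumptions: the population schedule for w_locality (here a log-linear interpolation from 0.45 at 5,000 residents down to 0.05 at 1,000,000, with hypothetical bounds) and the requirement that proximity already lies in (0, 1] so the logarithm is defined. Candidate, locality_weight, and rerank are illustrative names, and the sketch assumes both distances have already been computed for every candidate in the merged pool.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    geoid: str
    substantive_dist: float    # L2 distance in the standardized feature space
    proximity: float           # embedding similarity, assumed to lie in (0, 1]
    low_confidence: bool = False


def locality_weight(population: float, w_min: float = 0.05, w_max: float = 0.45,
                    pop_small: float = 5_000, pop_large: float = 1_000_000) -> float:
    """Hypothetical schedule: w_max for small places, w_min for large ones."""
    pop = min(max(population, pop_small), pop_large)
    t = (math.log(pop) - math.log(pop_small)) / (math.log(pop_large) - math.log(pop_small))
    return w_max - t * (w_max - w_min)


def rerank(candidates: List[Candidate], population: float, keep: int = 20,
           eps: float = 1e-9) -> List[str]:
    """Return up to `keep` GEOIDs ordered by (low_confidence, combined distance)."""
    if not candidates:
        return []
    w = locality_weight(population)
    dists = [c.substantive_dist for c in candidates]
    lo, hi = min(dists), max(dists)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all distances tie
    scored = []
    for c in candidates:
        scaled = (c.substantive_dist - lo) / span  # min-max to [0, 1]
        combined = scaled - w * math.log(max(c.proximity, eps))
        scored.append((c.low_confidence, combined, c.geoid))
    scored.sort()  # False sorts before True, so less certain peers go last
    return [geoid for _, _, geoid in scored[:keep]]
```

The default set is then rerank(pool, population)[:10]. Because log(proximity) is negative for proximity below 1, the second term acts as a penalty that grows as geographic similarity falls, and a larger w_locality makes that penalty weigh more against substantive distance.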

4. Regional groupings

In addition to the default set, the pipeline records four overlapping segments for each focus municipality:

Default — top 10 by combined distance (the set shown on the card).
Nearby — peers in the same county or CBSA, or within 250 mi in the same state.
Outside CBSA — peers from a different metro area.
Outside state — peers from a different state.

These segments are not surfaced on the public dashboard today, but they let internal analyses ask "what would this list look like if we forced peers to be regional / non-regional?"
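A sketch of the segment logic, assuming the segments are drawn from the ordered top-20 candidate list and that a place with no CBSA never shares one with the focus, so it always counts as outside the focus's CBSA. The Place fields and function names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Place:
    geoid: str
    lat: float
    lon: float
    county: str               # assumed 5-digit county FIPS
    state: str
    cbsa: str | None = None   # None outside any CBSA


def miles_between(a: Place, b: Place) -> float:
    """Great-circle (haversine) distance between centroids, in miles."""
    r = 3958.8
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp, dl = p2 - p1, math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(min(1.0, h)))


def segment_peers(focus: Place, candidates: List[Place],
                  nearby_miles: float = 250.0) -> Dict[str, List[str]]:
    """Split the ordered candidate list into the four overlapping segments."""
    def same_cbsa(p: Place) -> bool:
        return focus.cbsa is not None and p.cbsa == focus.cbsa

    segments: Dict[str, List[str]] = {
        "default": [p.geoid for p in candidates[:10]],
        "nearby": [], "outside_cbsa": [], "outside_state": [],
    }
    for p in candidates:  # iterate in rank order so each segment stays ordered
        in_state = p.state == focus.state
        if (p.county == focus.county or same_cbsa(p)
                or (in_state and miles_between(focus, p) <= nearby_miles)):
            segments["nearby"].append(p.geoid)
        if not same_cbsa(p):
            segments["outside_cbsa"].append(p.geoid)
        if not in_state:
            segments["outside_state"].append(p.geoid)
    return segments
```

A peer across a state line inside the same metro lands in both "nearby" and "outside state", which is why the segments overlap.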

Output table

The card reads from app.municipality_peer_sets, a SQL view backed by the underlying municipality_peer_candidates table, with one row per (experiment_id, geoid).

Column	Type	Notes
`experiment_id`	text	Hash-based id encoding the feature set, parameters, and seed used to build the table. Changes whenever the pipeline reruns with different settings.
`geoid`	text	Focus municipality GEOID.
`default_peer_geoids`	text[]	Top 10 peers, ordered.
`default_peer_names`	text[]	Place names matching `default_peer_geoids`.
`candidate_peer_geoids`	text[]	Full top 20, ordered.
`nearby_peer_geoids`	text[]	Subset in the same county or CBSA, or within 250 mi in the same state.
`outside_cbsa_peer_geoids`	text[]	Subset from a different metro.
`outside_state_peer_geoids`	text[]	Subset from a different state.
`*_count`	int	Cardinality of each list.
`created_at`	timestamp	Run audit.
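Several invariants follow from the column descriptions: the default list is the start of the candidate list, names line up with GEOIDs, and each count matches its list. The sketch below checks them on one row represented as a dict. The {prefix}_peer_count column names are hypothetical stand-ins for the *_count columns, and the subset and self-peer checks are assumptions about how segments and candidates are built.

```python
from typing import Any, Dict, List

LIST_PREFIXES = ["default", "candidate", "nearby", "outside_cbsa", "outside_state"]


def check_peer_row(row: Dict[str, Any]) -> List[str]:
    """Return the problems found in one peer-set row; an empty list means OK."""
    problems: List[str] = []
    cand = list(row["candidate_peer_geoids"])
    default = list(row["default_peer_geoids"])
    if default != cand[:len(default)]:
        problems.append("default_peer_geoids is not a prefix of candidate_peer_geoids")
    if len(row["default_peer_names"]) != len(default):
        problems.append("default_peer_names does not line up with default_peer_geoids")
    if row["geoid"] in cand:  # assumption: a place is never its own peer
        problems.append("focus geoid appears among its own peers")
    for prefix in LIST_PREFIXES:
        geoids = list(row[f"{prefix}_peer_geoids"])
        count_col = f"{prefix}_peer_count"  # hypothetical name for a *_count column
        if count_col in row and row[count_col] != len(geoids):
            problems.append(f"{count_col} does not match len({prefix}_peer_geoids)")
        if prefix not in ("default", "candidate") and not set(geoids) <= set(cand):
            # assumption: segments are drawn from the candidate list
            problems.append(f"{prefix}_peer_geoids contains non-candidates")
    return problems
```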

What the card itself shows

Once the peer set is loaded, the card pulls a small set of ACS series for the focus municipality and its ten peers, then renders two parallel line charts plus an expandable table:

Average household size (ACS B25010_001), 2010 through the latest ACS release.
Median household income (ACS B19013_001), inflation-adjusted to the latest year's dollars by multiplying by bls.inflation_data.ratio.
The expandable section adds current household count, 10-year household growth, share of 65+ households (B11007), median income, and vacancy rate (B25002).
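The derived values on the card reduce to a few lines of arithmetic. This sketch assumes bls.inflation_data.ratio for year y equals the latest year's price level divided by year y's (so the latest year's ratio is 1.0), and that vacancy rate is vacant units (B25002_003) over total housing units (B25002_001). Function names are illustrative.

```python
from typing import Dict, Optional


def to_latest_dollars(nominal: Dict[int, float], ratio: Dict[int, float]) -> Dict[int, float]:
    """Express each year's nominal value in latest-year dollars.

    Assumes ratio[y] = latest-year price level / year-y price level.
    """
    return {y: v * ratio[y] for y, v in nominal.items() if y in ratio}


def ten_year_growth(households: Dict[int, float]) -> Optional[float]:
    """Fractional change in households from 10 years before the latest year."""
    latest = max(households)
    base = households.get(latest - 10)
    if not base:  # missing or zero baseline: growth is undefined
        return None
    return households[latest] / base - 1.0


def vacancy_rate(total_units: float, vacant_units: float) -> Optional[float]:
    """Vacant share of housing units, e.g. B25002_003 / B25002_001."""
    return vacant_units / total_units if total_units else None
```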

The card is municipality-only — app.municipality_peer_sets does not contain entries for counties or other layers — so it is hidden on county and state dashboards.

Versioning

The experiment_id constant is defined in the card's data module (cards/charts/comparison_communities/_peer_set.py). When the peer-set pipeline is rerun with new parameters or features, it produces a new id; once the constant is updated to that id, the card switches to the new set at the next deploy. Old experiment_id rows are not deleted from the table, so historical peer sets remain queryable for backfills and analysis.
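The output table describes experiment_id as a hash of the feature set, parameters, and seed. One way to get that property is to hash a canonical JSON serialization of the configuration; the scheme and the 12-character truncation below are hypothetical, not the pipeline's documented method.

```python
import hashlib
import json
from typing import Any, Dict


def experiment_id(config: Dict[str, Any], length: int = 12) -> str:
    """Hypothetical id: truncated SHA-256 of the canonical JSON config."""
    # sort_keys and fixed separators make the string independent of dict key order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
```

Key order in the config does not affect the id, but list order does, so a pipeline using this scheme would want to sort its feature list before hashing.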

Limitations

Coverage is municipality-only. Census Designated Places, county subdivisions in non-strong-MCD states, and unincorporated areas all map to "municipality" in CommunityScale's terminology, but each ACS table covers them slightly differently. A peer in a CDP-heavy state and a peer in a strong-MCD state are not perfectly comparable.
Feature noise grows as population shrinks. Municipalities with fewer than 5,000 residents have noisy ACS estimates with wide margins of error. The adaptive locality weight partially offsets this, since small communities lean on regional peers, but peers for very small municipalities should be read as suggestive rather than authoritative.
No causal claim. Peers are similar on the dimensions we measure. Two peer communities can still have very different policy environments, school systems, infrastructure histories, and so on. The card is a starting point for comparison, not a substitute for local knowledge.