Density, Covid, and the Modifiable Areal Unit Problem

The statistical association (setting causal inference aside for now) between people living near one another (population density) and the rate of COVID-19 cases seems to vary across levels of aggregation (e.g. neighborhood versus city versus state). This is an instance of the Modifiable Areal Unit Problem (MAUP), which arises when point data (here, individual cases of COVID-19) are aggregated into geographic units. Relationships at one level of aggregation do not always hold at other levels, or at the individual level; inferring individual-level relationships from aggregate data is what’s called the ecological fallacy.
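To make the aggregation point concrete, here is a minimal Python sketch with made-up numbers. The cities, densities, and rates are hypothetical, not drawn from any real data. In this toy example density is perfectly negatively correlated with case rates within every city, yet perfectly positively correlated across the city averages. It shows only a simple scale (aggregation) effect, one facet of the MAUP.

```python
# Hypothetical toy data, NOT real COVID-19 figures: it only illustrates
# how a correlation can flip sign when units are aggregated.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# (city, density, case_rate) for made-up neighborhoods. Within each city,
# denser neighborhoods have lower rates; denser cities have higher rates.
neighborhoods = [
    ("A", 1.0, 12.0), ("A", 2.0, 11.0), ("A", 3.0, 10.0),
    ("B", 4.0, 22.0), ("B", 5.0, 21.0), ("B", 6.0, 20.0),
    ("C", 7.0, 32.0), ("C", 8.0, 31.0), ("C", 9.0, 30.0),
]

# Group neighborhoods by city.
groups = {}
for city, density, rate in neighborhoods:
    groups.setdefault(city, []).append((density, rate))

# Neighborhood level: correlation within each city.
within = {
    city: pearson([d for d, _ in rows], [r for _, r in rows])
    for city, rows in groups.items()
}

# City level: correlation across simple (unweighted) city averages.
city_density = [sum(d for d, _ in rows) / len(rows) for rows in groups.values()]
city_rate = [sum(r for _, r in rows) / len(rows) for rows in groups.values()]
between = pearson(city_density, city_rate)

print(within)   # {'A': -1.0, 'B': -1.0, 'C': -1.0}
print(between)  # 1.0
```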

For example, New York City, the densest city in the United States, has suffered tremendously from COVID-19, more so than the vast majority of the (less dense) country. But within New York City, much less dense areas like Staten Island and the Outer Bronx have been hit much harder than the densest part, Manhattan. One important point that density apologists (like me) make in these discussions is the distinction between overcrowding (many individuals, or families, living in a single house or apartment) and density (many people living in the same geographic area). Economic deprivation and its impact on health is another important factor.

But important questions remain about how other factors related to density, like modes of transit, might affect infectious disease spread, perhaps especially in the US: very dense places in Asia with heavy public transit use, like Seoul and Hong Kong, seem to have weathered COVID more successfully, as has Germany.

Across municipalities in Massachusetts, there is a fairly substantial positive relationship between population density and COVID case rates. The following plots are restricted to cities and towns with 5 or more cases (293 of the state’s 351 municipalities), because the Massachusetts Department of Public Health (DPH) censors case counts for places with few cases.
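The rate used throughout (cases per 10,000 residents) and the 5-case filter can be sketched in a few lines of Python. This is a minimal sketch, assuming a simple list of (place, cases, population) records; the place names, counts, and the use of `None` for a censored count are all hypothetical and say nothing about how the DPH files are actually laid out.

```python
# Hypothetical records: (place, reported_cases, population). A censored
# count is represented here as None purely for illustration.
MIN_CASES = 5  # threshold used in the text

def rate_per_10k(cases, population):
    """Reported cases per 10,000 residents."""
    return cases * 10_000 / population

def usable_places(records, min_cases=MIN_CASES):
    """Case rates for places with at least `min_cases` reported cases."""
    return {
        place: rate_per_10k(cases, population)
        for place, cases, population in records
        if cases is not None and cases >= min_cases
    }

records = [
    ("Town X", 250, 50_000),  # kept: 50.0 per 10,000
    ("Town Y", 3, 4_000),     # dropped: fewer than 5 cases
    ("Town Z", None, 1_200),  # dropped: count censored
]
print(usable_places(records))  # {'Town X': 50.0}
```

Because the threshold applies to counts rather than rates, it tends to drop small towns in particular, which is worth keeping in mind when reading the plots.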

[Figure: COVID-19 case rate vs. population density, Massachusetts municipalities, May 8]

Poor, blue-collar, dense, and heavily Latino Chelsea is the top-right point, and majority-Black Brockton is the point with the second-highest case rate (at around 5,000 people per square mile). Somerville and Cambridge, the two densest cities in the state, have only about 15% of the case incidence of Chelsea, the third densest. Both variables are right-skewed, so here is the same plot on a logarithmic scale:

[Figure: the same plot with both variables on log scales]

This relationship holds net of income and with county fixed effects, though of course there are other confounders. County fixed effects, which absorb the peculiarities of the state’s island counties (Nantucket and Martha’s Vineyard) and possibly some other unobserved county-level factors, somewhat weaken the association with density (the coefficient falls from 0.347 to 0.290) and strengthen the protective association with income (from -0.217 to -0.273).
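In equation form, the two models in the table below appear to be log-log OLS regressions of roughly this form. This is a reconstruction from the table’s labels and degrees of freedom, not the author’s stated specification; here i indexes municipalities, rate is cases per 10,000 residents, income is income per capita, and c(i) is the county containing municipality i.

```latex
\begin{equation}
\log(\text{rate}_i) = \alpha + \beta_1 \log(\text{density}_i)
  + \beta_2 \log(\text{income}_i) + \gamma_{c(i)} + \varepsilon_i
\end{equation}
```

In column (1) the county terms \gamma are dropped; in column (2) there is one \gamma per county, with one county as the reference category. The residual degrees of freedom are consistent with this reading: 293 - 3 = 290 for column (1) (intercept plus two slopes), and 293 - 16 = 277 for column (2), which adds 13 dummies for Massachusetts’ 14 counties.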

Municipal association between density/income and COVID case rates (May 8)
Dependent variable: log(COVID rate per 10,000)

                               (1)                 (2)
log(Population density)        0.347***            0.290***
                               (0.025)             (0.033)
log(Income per capita)         -0.217***           -0.273***
                               (0.063)             (0.075)
Observations                   293                 293
County fixed effects           No                  Yes
Adjusted R²                    0.394               0.407
Residual std. error            0.532 (df = 290)    0.526 (df = 277)
Note: Standard errors in parentheses. *p<0.1; **p<0.05; ***p<0.01
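Because both sides of these regressions are logged, the coefficients read as elasticities: multiplying a regressor by a factor k multiplies the predicted case rate by k raised to its coefficient, holding the other regressors fixed. A small Python sketch of that arithmetic, using the coefficients from the table above (the choice of doubling, k = 2, is just an illustrative assumption):

```python
# Coefficients copied from the table above: (variable, column) -> estimate.
COEFS = {
    ("density", 1): 0.347,
    ("density", 2): 0.290,
    ("income", 1): -0.217,
    ("income", 2): -0.273,
}

def pct_change(beta, factor=2.0):
    """Percent change in the predicted outcome of a log-log model when one
    regressor is multiplied by `factor`, other regressors held fixed."""
    return (factor ** beta - 1.0) * 100.0

for (var, col), beta in COEFS.items():
    print(f"column ({col}): doubling {var} -> {pct_change(beta):+.1f}% case rate")
```

This works out to roughly +27% (column 1) and +22% (column 2) for a doubling of density, and roughly -14% and -17% for a doubling of per-capita income. These are associations from the model, not causal effects.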

However, at the neighborhood level around Boston, this relationship doesn’t hold. Instead, density is inversely related to case incidence, which I suspect is mostly due to socioeconomic and racial confounders, and perhaps age. (I also wonder how reliably population density statistics capture places like Allston-Brighton and Fenway, which have large college populations, now that many students have left.)

[Figure: COVID-19 case rate vs. population density, Boston-area neighborhoods, as of May 8]

So the relationship differs by level of analysis: at the municipal level, density is associated with more COVID cases, but at the neighborhood level around Boston it is associated with fewer. To really know what’s going on, it would help to have, among other things, a direct measure of household-level overcrowding. There is also a temporal dimension: big cities with more global connectivity may be further along their pandemic “curve” than more remote places. But my main point is that one shouldn’t assume a relationship found at one level of aggregation holds at others.

Data sources: MA DPH, MA Department of Revenue, Boston Public Health Commission, Brookline DPH, and weighted averages of ZIP code population densities for Boston neighborhoods.
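The data note mentions weighted averages of ZIP code densities without saying what the weights are. Below is a sketch of the two most common choices, assuming each ZIP code’s population and land area are known; the `ZipArea` class and all numbers are hypothetical. Area weights reproduce the plain total-population-over-total-area density, while population weights give the density experienced by the average resident.

```python
from dataclasses import dataclass

@dataclass
class ZipArea:
    """Hypothetical ZIP code record: residents and land area."""
    population: int
    area_sq_mi: float

    @property
    def density(self) -> float:
        return self.population / self.area_sq_mi

def area_weighted_density(zips):
    """Area-weighted mean density, i.e. total population / total area."""
    total_area = sum(z.area_sq_mi for z in zips)
    return sum(z.density * z.area_sq_mi for z in zips) / total_area

def population_weighted_density(zips):
    """Population-weighted mean density (density around the average resident)."""
    total_pop = sum(z.population for z in zips)
    return sum(z.density * z.population for z in zips) / total_pop

# Made-up neighborhood: one dense ZIP and one sparse ZIP.
neighborhood = [ZipArea(30_000, 1.0), ZipArea(10_000, 2.0)]
print(area_weighted_density(neighborhood))        # 40000 / 3, about 13333.3
print(population_weighted_density(neighborhood))  # 23750.0
```

The population-weighted figure can never be lower than the area-weighted one, and the gap widens when a neighborhood mixes dense and sparse ZIP codes, so the choice of weights matters when comparing neighborhoods.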