“I spent some hours correlating the census tract data against the ‘neighborhood’ COVID map”

interactive map also embedded below

Huge thanks to Molly for sharing some perspective on the District’s neighborhood coronavirus map:

“I’m following up to provide more detail and explanation about my DC COVID map that’s weighted by population. In the comments on your daily data posts this week, there had been some discussion about wanting to see a version of the map that controlled for population. I was really curious, so I spent some hours correlating the census tract data against the ‘neighborhood’ COVID map that the city started publishing earlier this week. I would say I’m semi-professional when it comes to things like this: I have professional training, but it’s not what I usually get paid to do these days. So I’d call this a citizen data science effort!

The city has been reporting coronavirus data by Ward throughout the crisis. It recently added a report of positive cases by “neighborhood.” The city’s neighborhoods are groupings of DC census tracts (standardized geographic areas), so they may not align with what we commonly consider neighborhood boundaries. If you want to look at the city’s neighborhood classifications more closely, zoom in on this PDF map.

The city reports the total number of positive cases per neighborhood in its daily reports. However, some neighborhoods have a much larger population than others. If 200 people are sick, it’s important to know whether that’s 200 out of 2,000 or 200 out of 20,000. To provide this perspective, I created a map graphic that shows the rate of cases in each neighborhood, with an interactive map here:

This way isn’t necessarily “better” than the city’s way of showing it, but I personally find it helpful to understand the extent of the outbreak relative to population.

How I did it: The 2018 American Community Survey (ACS) provides census-tract-level data on population and some demographic characteristics, such as age and income. Using data available on Open Data DC, I matched each census tract to one of the city’s COVID neighborhoods to determine the population of each “neighborhood,” which made it possible to calculate the rate of cases per neighborhood. I report this statistic as positive cases per thousand residents.
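The tract-to-neighborhood step boils down to a group-by followed by a division. Below is a minimal Python sketch, assuming each census tract belongs wholly to exactly one COVID neighborhood. The function names, tract IDs, and tract populations are hypothetical. The two neighborhoods reuse the 200-out-of-2,000 versus 200-out-of-20,000 illustration rather than real DC data.

```python
from collections import defaultdict


def neighborhood_populations(tract_pop, tract_to_nbhd):
    """Sum census-tract populations into the city's COVID "neighborhoods".

    tract_pop:     {tract_id: population}, e.g. from ACS tract estimates
    tract_to_nbhd: {tract_id: neighborhood_code}
    Assumes (hypothetically) every tract maps to exactly one neighborhood.
    """
    totals = defaultdict(int)
    for tract, pop in tract_pop.items():
        if tract not in tract_to_nbhd:
            # Fail loudly rather than silently dropping residents.
            raise KeyError(f"tract {tract!r} has no neighborhood assignment")
        totals[tract_to_nbhd[tract]] += pop
    return dict(totals)


def cases_per_thousand(cases, populations):
    """Positive cases per 1,000 residents for each neighborhood."""
    return {n: 1000 * cases[n] / populations[n] for n in cases}


# Hypothetical tracts and populations, for illustration only.
tract_pop = {"T1": 1200, "T2": 800, "T3": 12000, "T4": 8000}
tract_to_nbhd = {"T1": "A", "T2": "A", "T3": "B", "T4": "B"}
cases = {"A": 200, "B": 200}

pops = neighborhood_populations(tract_pop, tract_to_nbhd)
rates = cases_per_thousand(cases, pops)
print(pops)   # {'A': 2000, 'B': 20000}
print(rates)  # {'A': 100.0, 'B': 10.0}
```

The two neighborhoods have the same count but a tenfold difference in rate. A raw-count map cannot show that difference.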

Why it matters: Consider Tenleytown (neighborhood N44) and Shepherd Park (N40), which had a similar number of positives in the May 9 data: 112 and 114 respectively. As a result, they are the same color on the city’s map. But Tenleytown’s population (18,099) is more than twice Shepherd Park’s (8,696). So proportionally, someone in Shepherd Park (13.1 cases per thousand) is about twice as likely to have tested positive as someone in Tenleytown (6.2 cases per thousand). There are a number of “neighborhoods” like this, with similar total numbers of cases but different populations, and therefore different case rates.
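The Tenleytown and Shepherd Park comparison can be worked out explicitly. It uses the May 9 case counts and neighborhood populations quoted above, with rates rounded to one decimal place:

```latex
\begin{aligned}
\text{rate} &= \frac{\text{positive cases}}{\text{population}} \times 1000 \\[4pt]
\text{rate}_{\text{Shepherd Park}} &= \frac{114}{8{,}696} \times 1000 \approx 13.1 \\[4pt]
\text{rate}_{\text{Tenleytown}} &= \frac{112}{18{,}099} \times 1000 \approx 6.2 \\[4pt]
\frac{\text{rate}_{\text{Shepherd Park}}}{\text{rate}_{\text{Tenleytown}}} &\approx \frac{13.1}{6.2} \approx 2.1
\end{aligned}
```

The ratio of rates lands close to the ratio of populations (18,099 / 8,696 ≈ 2.08) because the two case counts are nearly equal.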

One of my first reactions when I crunched the numbers and saw the rate map (and I bet I’m not alone in this) was: “What’s going on in ‘Stadium Armory’?!” The map reveals that the rate of cases in Stadium Armory (65.2 cases per thousand on May 9) is roughly triple that of the next hardest-hit neighborhood and more than seven times the median neighborhood rate (8.9 cases per thousand). On the city’s map, Stadium Armory does not particularly stand out, because its total number of cases on May 9 (173) was not an outlier among neighborhoods. The two versions of the map tell very different stories about what is happening in that neighborhood.
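The “seven times the median” comparison generalizes to a simple screen. First compute the median rate across neighborhoods, then list any neighborhood whose rate exceeds some multiple of it. For the May 9 figures quoted above, 65.2 / 8.9 ≈ 7.3. The sketch below is hypothetical: the function name, the threshold factor of 3, and the toy rates are all made up for illustration. They are not the city’s data or a description of how the rate map was built.

```python
from statistics import median


def flag_high_rates(rates, factor=3.0):
    """Return the median rate and the neighborhoods above factor x median.

    rates: {neighborhood: cases per thousand}
    Flagged neighborhoods come back as (name, multiple_of_median),
    largest multiple first. The default factor is an arbitrary choice.
    """
    med = median(rates.values())
    flagged = [(n, r / med) for n, r in rates.items() if r > factor * med]
    flagged.sort(key=lambda pair: pair[1], reverse=True)
    return med, flagged


# Toy rates per thousand (hypothetical, not DC data).
toy_rates = {"A": 6.0, "B": 8.0, "C": 9.0, "D": 10.0, "E": 45.0}
med, flagged = flag_high_rates(toy_rates)
print(med)      # 9.0
print(flagged)  # [('E', 5.0)]
```

The baseline here is the median rather than the mean. With the mean, one extreme neighborhood would inflate the very baseline it is being compared against. A flag like this says where to look, not why.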

The DC Jail is located within Stadium Armory, which I suspect is driving the high rate of positive cases. But my map doesn’t tell you why the rate of cases is so much higher there than anywhere else. It could (just as in any neighborhood) be a reflection of the level of testing as much as the level of illness, or it could be any number of other factors.

Analyzing the data by population has made this outlier and other patterns visible. Now we can ask “why” questions that we might not otherwise have realized we needed to ask. We also have another layer of information to help assess individual and collective risk. Good data visualization informs decision-making, and I hope my effort here can contribute to the COVID conversation in DC.

I’ve created a spreadsheet that I hope to keep updated regularly with new data and graphics. I’m open to any feedback or collaboration with other dataviz folks out there. Anyone is welcome to download a copy of the spreadsheet and do their own work with it. I’m not a magician and I’m sure others can do more advanced things.

Thank you essential workers! Everyone else stay home! I live in a basement and want to see light again someday.”