Since testing for COVID-19 has been so limited, I’ve been thinking about what kind of analysis is even appropriate for the data. I might have a small part of the solution.
Reporting cumulative cases alone isn’t good enough because it doesn’t account for new cases reported each day. Reporting new cases isn’t good enough because it doesn’t account for how many tests it took to get those cases. The problems continue.
My thinking: Because testing is reserved for people who are at a higher health risk and those who think they’ve been exposed, we should expect testing to look fairly well targeted, i.e., it’s common for the person being tested to actually have COVID-19.
Therefore, it might be useful to track trends in testing and cases together. We can keep a running average of tests administered per positive case, then use it to estimate how many new cases we should expect each day based on how many tests were reported. Finally, we can compare that estimate to how many new cases were actually reported.
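To make the arithmetic concrete, here is a minimal Python sketch of the comparison described above. The function names and the sample numbers are hypothetical and chosen only for illustration; they are not taken from the state's data.

```python
def expected_new_cases(tests_reported, avg_tests_per_positive):
    """Estimate how many new cases a day's test count would yield
    at the running average of tests per positive case."""
    if avg_tests_per_positive <= 0:
        raise ValueError("average tests per positive case must be positive")
    return tests_reported / avg_tests_per_positive


def difference_from_expected(actual_new_cases, tests_reported, avg_tests_per_positive):
    """Positive result: more cases than expected. Negative: fewer."""
    expected = expected_new_cases(tests_reported, avg_tests_per_positive)
    return actual_new_cases - expected


# Hypothetical numbers, for illustration only.
expected = expected_new_cases(tests_reported=1000, avg_tests_per_positive=4.0)
difference = difference_from_expected(
    actual_new_cases=300, tests_reported=1000, avg_tests_per_positive=4.0
)
print(expected, difference)
```

With these made-up inputs, 1,000 tests at an average of 4 tests per positive case gives 250 expected cases, so 300 reported cases would be 50 more than expected.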
I’m not making value judgments about the trends. If the actual number of new cases is much greater than what we would expect, for example, the explanation could be that the virus is spreading more rapidly and/or that we’re getting better at identifying who’s been infected.
Note: “Tests/positive case” is on a seven-day delay to account for the time it takes for a positive test result to come back and show up in the data. “Tests/positive case (average)” is a seven-day rolling average. The state health department began consistently reporting testing numbers March 21, but because the data for that day appears to be an accumulation of prior days, I excluded it when calculating expected case numbers and differences.
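For anyone who wants to reproduce the two columns, here is a rough Python sketch of one way to compute them. It rests on an assumption: that the delayed ratio pairs the tests reported on one day with the new cases reported seven days later. That is one reading of the note above, not a confirmed formula, and the function names are hypothetical.

```python
LAG_DAYS = 7     # assumed delay between a test and its result showing up
WINDOW_DAYS = 7  # length of the rolling average


def lagged_tests_per_positive(daily_tests, daily_new_cases, lag=LAG_DAYS):
    """Row i holds tests from day i - lag divided by new cases on day i.

    Both lists are assumed to be the same length and in date order.
    Rows with no lagged test count, or with zero new cases, are None.
    """
    ratios = []
    for i, cases in enumerate(daily_new_cases):
        if i < lag or cases == 0:
            ratios.append(None)
        else:
            ratios.append(daily_tests[i - lag] / cases)
    return ratios


def rolling_average(values, window=WINDOW_DAYS):
    """Trailing mean over the last `window` rows.

    A row is None until a full window of non-None values is available.
    """
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            averages.append(None)
        else:
            averages.append(sum(chunk) / window)
    return averages
```

A day whose numbers look like a backlog, such as March 21, would be dropped from the input lists before calling these functions.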
Another note: “Tests/positive case” should actually be attributed to the date seven days before the row it appears on, since it’s on a delay. For example, 2.34 tests/positive case is on the row for March 29 but actually applies to March 22. I couldn’t think of a clean way to display the data.