Today, I'd like to focus on the question of how increased testing may be affecting the number of reported positive cases. Many thanks to a friend (you know who you are), who motivated this.
As previously noted, the daily number of cases has leveled off rather than decreasing as the Gompertz curve would predict. One possible explanation is that because testing has increased, the number of positive cases found has increased with it. For example, consider the following situation: Suppose on March 20 you test 100 people and find that 20% of those tests are positive, so you have 20 new cases. Then on April 10 you test 200 people and find another 20 new cases, which is 10% of the tests. If you plot just the number of positives, you see a constant 20. But if you plot the percent of tests that are positive, you get a descending trend from 20% to 10%. The constant number of cases might lead you to believe that the infection rate has remained constant with time. Clearly, the more you test, the more positive results you might expect. That does not mean the number of infections has increased. It just means you're doing a better job of finding them. This sampling bias can be corrected to get a consistent metric. It works in the same way that you adjust the cost of products for inflation over time. More on that in a moment.
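To make the toy example concrete, here is a minimal Python sketch of the percent-positive calculation. The function name is my own for illustration, and the numbers are the made-up ones from the example above, not real data.

```python
def percent_positive(positives, tests):
    """Return the share of tests that came back positive, in percent."""
    return 100.0 * positives / tests


# Hypothetical numbers from the example: same 20 positives, twice the tests.
march_20 = percent_positive(20, 100)
april_10 = percent_positive(20, 200)

print(f"March 20: {march_20:.0f}% positive")
print(f"April 10: {april_10:.0f}% positive")
```

The raw count stays flat at 20 while the percent positive is cut in half, which is the gap between the two ways of plotting the data.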
The total number of tests conducted in the US is shown in the figure below. Note that this data is pulled from worldometer.com. I don't know how accurate this information is, but it's probably close enough for the analysis here. Testing increased slowly up through about April 10. Then it stabilized for about another ten days. But, beginning around April 20, it started to increase rapidly. It's now well over 250,000 tests per day.
The percent of tests coming back positive as a function of time is shown below. It increased up through March 31 to a maximum of approximately 25%. It's been slowly decreasing since, and now stands between 10% and 15%. So, roughly speaking, as testing increased the percent of positives decreased.
The number of positive cases is shown in the standard plots I've been presenting daily. It increased through time and then leveled off, with a strong weekly oscillation superimposed. Also superimposed on the daily cases is some imprint of the testing numbers. Let's see if we can figure out roughly what that is.
I am going to pick April 1 as a reference date to which I will adjust all numbers. As I previously stated, this works just like inflation adjustment. If milk was $2/gal on April 1, and if inflation is 1% over 10 days, then I'd expect the milk to be $2.02 on April 11. If I find the milk actually costs $2.05 then I deduce that the cost of milk has increased by $0.03 after taking inflation into account.
On April 1, I find that there were 107,582 tests taken and 26,756 of them came back positive. I can then compare the number of tests taken on any other day, and adjust the results from that day to April 1. For example, on April 10 there were 157,745 tests taken. Since there were more tests, I'd expect more positive results, and there were: 33,578. How many positives would I have expected if only the 107,582 tests taken on April 1 had been administered on April 10? It's simple. I just multiply 33,578 by the fraction 107,582/157,745. And the answer is 22,900. This is less than the 26,756 positives recorded on April 1! So, even though the raw numbers have been increasing, had testing capacity remained at the April 1 level, the number of cases found on April 10 would likely have gone down. Testing capacity matters.
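Here is the same arithmetic as a short Python sketch. The function and variable names are mine, chosen for illustration. The scaling assumes the fraction of positives on a given day would be the same with fewer tests, which is the simplification this adjustment relies on.

```python
def normalize_to_reference(positives, tests, reference_tests):
    """Scale a day's positives to what the reference number of tests
    would have found, assuming the same fraction of positives."""
    return positives * reference_tests / tests


# April 1 is the reference date: 107,582 tests, 26,756 positives.
REFERENCE_TESTS = 107_582
REFERENCE_POSITIVES = 26_756

# April 10: 157,745 tests, 33,578 positives.
april_10_adjusted = normalize_to_reference(33_578, 157_745, REFERENCE_TESTS)

print(round(april_10_adjusted))                 # 22900
print(april_10_adjusted < REFERENCE_POSITIVES)  # True
```

Applying the same function to every day's counts gives a series in which each day is expressed in "April 1 tests", just as inflation-adjusted prices are expressed in a single year's dollars.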
If I normalize all the data to the April 1 reference, I get the data shown in the "Normalized Testing" plot. The blue dots are the daily data. I've added a green line, which is a 7-day moving average that helps remove some of the noise so that the trend can be seen. I've also added this same data to the usual daily case plot. We can see that the number of cases normalized to the April 1 date has been decreasing since early April. If we had kept the same testing capacity as on April 1 throughout the pandemic, we'd likely be seeing fewer positives today.
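For anyone who wants to reproduce the smoothing, a 7-day moving average can be sketched in a few lines of Python. This version is a trailing average (each point averages that day and up to six days before it), which is an assumption on my part; a centered window would work just as well. The sample values are invented to show the mechanics and are not the normalized case data.

```python
def trailing_moving_average(values, window=7):
    """Average each value with up to (window - 1) values before it.

    Early points, where fewer than `window` values exist, are averaged
    over whatever is available so the output has the same length.
    """
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Invented daily values, for illustration only.
daily = [1, 2, 3, 4, 5, 6, 7, 8]
smoothed = trailing_moving_average(daily)
print(smoothed[6], smoothed[7])  # 4.0 5.0
```

A 7-day window is a natural choice here because it spans exactly one cycle of the weekly oscillation in the reported numbers.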
What can we make of all this? I think the most important message is to be very, very wary of numbers that are bandied about by the media and especially by government officials who may wish to use biased or distorted numbers to their advantage. There are lots of subtleties in all these data. I've only scratched the surface here. A deeper and far more detailed analysis is necessary to really understand what's going on. Had we been able to hit this pandemic with quality testing and reporting from the get-go, then the reality behind the numbers would be more transparent. But we didn't. So it isn't.
1 comment:
Excellent analysis, Scot!