SOLVED! Mathematicians! Help solve coronavirus testing

While working on other coronavirus work, I read an article on Germany’s testing. They did a smart thing:

Medical staff, at particular risk of contracting and spreading the virus, are regularly tested. To streamline the procedure, some hospitals have started doing block tests, using the swabs of 10 employees, and following up with individual tests only if there is a positive result.

It was easy to miss, but it stuck in my head. The next morning: Eureka! By using one test instead of ten, they avoided nine tests. But math can make this better…

This is related to two other medium articles: here and here.

SOLVED UPDATE 4/17/2020

Turns out the math problem is already well studied and called “groupTesting” https://en.wikipedia.org/wiki/Group_testing#Non-adaptive_algorithms.

The basic algorithm (when the testing also tells you approximately how many people are infected in each pool/block is this:

|(1-p)^n| = 0.5
n=quasi-optimal block size.
p= probability of infection in the group.
Example, if you have 1000 people and the probability is 1%:
n= | 0.5/ln(0.99)|= 49.7 =~ 50
Optimal size is test in batches of 50.Then, later batches are just dividing by 2. So then batch size of 25, then 13, then 7, etc.See COMP and DI (?DD?) algorithms on the Group Testing wikipedia page.

The puzzle setup

Patients can be tested in pools/batches. The test is perfect (0% false negative, 0% false positive). A batch/pooled test over a set of patients will be positive for the virus if any individual patient is positive, and negative only if all are negative.

Each test has a cost. Batch/pooled tests and single tests are the same cost.

What is the optimal way to test all 500 patients? Optimal here means minimizing the number of tests (statistically speaking, the expectation of the number of tests).

Why this matters: re-opening schools

If we can batch test 500 students (and staff) and confirm they are not sick, then we can let all 500 go back to school.

This generalizes the problem and says: it’d be silly to not let all the students not go to school if only 1 kid is sick. We can just quarantine that 1 kid. So, how do we test to determine the 1 kid that is sick?

A simple solution, (not optimal)

The general solution space is sequential sets of pools/batches. One example is below.

In expectation, there are 5 students who will be sick. If we test in batches of 100, then we will might get all 5 to be positive, and that’s not so great. We might want to test smaller cells and get more that are negative.

So let’s test 10 batches of 50. All negatives flagged as definitely-not-infected. No further testing needed.

For each batch of size 50, let’s test 10 batches of 5. All negatives flagged as definitely-not-infected. No further testing needed.

For each batch of size 5, let’s test in size of 2. All negatives flagged as definitely-not-infected. No further testing needed.

This solution might be described as (50, 5, 2,1).

Now, one could do the very tedious probability math to compute the expected cost of this algorithm. But, I’m lazy and can let a computer run the monte carlo and it came out to 75.2 tests versus 500 tests. That’s a 85% reduction in the number of tests needed.

(for simplicity, I’ll just say “students” instead of students+staff all the time)

Generalizations

Generalization (3) is the toughest. The test has false negatives and false positives. These may be statistically “iid” or correlated. (Perfect correlation means that if you test me, you’ll get the same result. Maybe my virus is low/missing the RNA marker, so you’ll never get a positive result. “iid” means independent and identically distributed. One implication is that if the detection rate is 90%, testing twice makes it 99%, etc.).

In this case, it is impossible to detect everyone. You have to set a risk function and optimize by balancing cost and risk. This is hard to solve analytically, but it can be simulated easily.

Lastly, there are the logistical and microbiology challenges. Since this is for math people, I’ll just mention that this has to do with the Level of Detection (LOD). And it may not be true that you can’t test 100 people at once or, if you do, sometimes the batch/pooled test and individual tests would give different answers. You can think of this as a “batch/pooling error”.

When batching doesn’t make sense.

But for screening/surveillance, we definitely want to batch, like they do in Germany.

• Using monte-carlo, one can look at the gradient and make guesses, like a generalized optimization problem. That is, take the solution (50, 5, 2, 1), and then check (50, 6, 2, 1) and (50, 4, 2, 1) to see if either 6 or 4 improves it compared to 5.
• There is a recursive structure to the problem. The 500-size problem is split up into a 50-size problem (with different iid probabilities, related to the Monty Hall problem) and then a 5-size problem (recalculating the conditional probabilities). When the 5-size and 6-size problem is solved, you don’t have to re-solve those problems.
• This might have something to do with the counterfeit coin problem. See: https://arxiv.org/abs/1005.1391
• This might be related to hamming distance and coding theory (?)
• This might be related to covering designs in discrete math: https://www.dmgordon.org/cover/
• This might be related to something in operations research on defect detection. (?)

Nifty trick, but of limited use: When you are down to 5, you have a trick you can use. Test 2 (groupA). Then test another 2 (groupB). And then wait on the last one. If you rule out the first 4, you don’t have to run the last test. Logically, that last one has to be positive (otherwise the batch=5 test would have been negative). However, since there could be more than 1 positive in the batch of 5, positives in groupA or groupB don’t rule out the last singleton.

Javascript Code

var probability=0.01 // 1%
var initialsize=500 //
var waves=[50,5,2,1]
// var probability=0.10 // 10%
// var initialsize=50 //
// var waves=[2,1]
function processWaves(startvalue,endvalue,waveindex) {
console.log("processing " + startvalue + "start " + endvalue + "end " + waveindex + "waveindex");
let localTestCount=0;
// debugger;
if (startvalue==endvalue) {
// the batch size is 1
// console.log("size1");
return 1;
}
//check for any true
var someTrue= arrayOfInfection.slice(startvalue,endvalue+1).some((x)=> x==true);
if (someTrue) {
localTestCount++
// console.log("some true")
// debugger;
var waveSize=waves[waveindex];
var currentvalue=startvalue;
while( currentvalue<=endvalue ) {
localTestCount=localTestCount+processWaves(currentvalue,Math.min(currentvalue+waveSize-1),waveindex+1);
currentvalue=currentvalue+waveSize;
}
return localTestCount;
} else {
// console.log("all negative")
return 1; // localTestCount
}
}
var arrayOfResults=[];
var repetitions=500;
for (let i=0; i<repetitions; i++) {
console.log("iteration " + i );
var arrayOfRandom = Array.from({length: initialsize}, () => Math.random());
var arrayOfInfection = arrayOfRandom.map( (x) => (x<probability) );
var MyResult= processWaves(0,initialsize-1,0);
console.log("MAIN RESULT");
console.log(MyResult);
arrayOfResults.push(MyResult);
}
console.log("Average over repetitions " + repetitions);
var Average= arrayOfResults.reduce((a,b) => a + b, 0) / arrayOfResults.length
console.log(Average)

Link to the javascript on Github.