# SOLVED! Mathematicians! Help solve coronavirus testing

While working on other coronavirus work, I read an article on Germany’s testing. They did a smart thing:

> Medical staff, at particular risk of contracting and spreading the virus, are regularly tested. To streamline the procedure, some hospitals have started doing block tests, using the swabs of 10 employees, and following up with individual tests only if there is a positive result.

It was easy to miss, but it stuck in my head. The next morning: Eureka! By using one test instead of ten, they avoided nine tests. But math can make this better…

This is related to two other Medium articles: here and here.

# SOLVED UPDATE 4/17/2020

Turns out biotx.ai is working on this problem (as a not-for-profit Covid-19 solution) and has a 2-month head start. See https://www.pcr-pooling.com/presentation.

Turns out the math problem is already well studied and is called “group testing”: https://en.wikipedia.org/wiki/Group_testing#Non-adaptive_algorithms.

The basic algorithm (when the testing also tells you approximately how many people are infected in each pool/block) is this:

(1-p)^n = 0.5

n = quasi-optimal block size.

p = probability of infection in the group.

Example: if you have 1,000 people and the probability is 1%:

n = ln(0.5)/ln(0.99) = 68.97 =~ 69

The quasi-optimal size is to test in batches of about 69. Then later batches are just dividing by 2: batch sizes of about 35, then 17, then 9, etc. See the COMP and DD (Definite Defectives) algorithms on the Group Testing Wikipedia page.
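Solving (1-p)^n = 0.5 for n is a one-liner; a quick sketch in JavaScript (the function name `blockSize` is mine):

```javascript
// Quasi-optimal pool size: pick n so that a pool comes back
// positive about half the time, i.e. solve (1 - p)^n = 0.5 for n.
function blockSize(p) {
  return Math.log(0.5) / Math.log(1 - p);
}

console.log(blockSize(0.01)); // ≈ 68.97, so pools of about 69
console.log(blockSize(0.10)); // ≈ 6.58, so pools of about 7
```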

## So mathematicians and puzzle solvers:

Here is how you can help solve coronavirus testing. Solve this optimization (or give an algorithm that gets close to an optimal solution).

# The puzzle setup

There are 500 patients, each with a 1% chance (iid) of having the coronavirus.

Patients can be tested in pools/batches. The test is perfect (0% false negative, 0% false positive). A batch/pooled test over a set of patients will be positive for the virus if any individual patient is positive, and negative only if all are negative.

Each test has a cost. Batch/pooled tests and single tests are the same cost.

What is the optimal way to test all 500 patients? Optimal here means minimizing the number of tests (statistically speaking, the expectation of the number of tests).
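To make the setup concrete, here is a minimal sketch of the (assumed perfect) pooled test, plus a quick check that one giant pool of all 500 is almost never enough by itself (`pooledTest` is a name I'm introducing for illustration):

```javascript
// A pooled test is positive iff any member of the pool is positive.
function pooledTest(members) {
  return members.some((infected) => infected);
}

console.log(pooledTest([false, true, false])); // true
console.log(pooledTest([false, false]));       // false

// With p = 1%, the chance that a pool of all 500 comes back negative:
console.log(Math.pow(0.99, 500)); // ≈ 0.0066 — one big test rarely clears everyone
```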

# Why this matters: re-opening schools

We highlight this in a separate medium article, but it’s worth repeating here:

If we can batch test 500 students (and staff) and confirm they are not sick, then we can let all 500 go back to school.

This generalizes the problem and says: it’d be silly to keep all the students out of school when only 1 kid is sick. We can just quarantine that 1 kid. So, how do we test to determine which 1 kid is sick?

**A simple solution (not optimal)**

We could always test all 500. That’s a reference case. Cost is 500 tests.

The general solution space is sequential sets of pools/batches. One example is below.

In expectation, 5 of the students will be sick. If we test in batches of 100, we might get all 5 batches coming back positive, and that’s not so great. We might want to test smaller cells so that more come back negative.

So let’s test 10 batches of 50. All negative batches are flagged definitely-not-infected; no further testing needed.

For each positive batch of 50, test 10 batches of 5. Again, all negative batches are flagged definitely-not-infected.

For each positive batch of 5, test in batches of 2. All negative batches are cleared, and each positive batch of 2 is then tested individually.

This solution might be described as (50, 5, 2, 1).

Now, one could do the very tedious probability math to compute the expected cost of this algorithm. But I’m lazy and can let a computer run the Monte Carlo, and it came out to 75.2 tests versus 500 tests. That’s an 85% reduction in the number of tests needed.
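Actually, the tedious probability math collapses nicely if you use linearity of expectation: a sub-pool gets tested exactly when its parent pool contains at least one infected person. A sketch (the pool counts and parent sizes below are my hand-worked bookkeeping for the (50, 5, 2, 1) scheme, where each batch of 5 splits into sub-pools of sizes 2, 2 and 1):

```javascript
// Exact expected number of tests for the (50, 5, 2, 1) scheme over 500 people,
// via linearity of expectation: pool tested <=> parent pool has an infection.
var p = 0.01;
var q = 1 - p;

// [number of pools at this level, size of each pool's PARENT]:
// 10 pools of 50 under the root pool of 500,
// 100 pools of 5 under the pools of 50,
// 300 sub-pools (sizes 2, 2, 1) under the pools of 5,
// 400 singletons under the pools of 2.
var levels = [
  [10, 500],
  [100, 50],
  [300, 5],
  [400, 2],
];

var expectedTests = 1; // the root pool is always tested
for (var [count, parentSize] of levels) {
  expectedTests += count * (1 - Math.pow(q, parentSize));
}
console.log(expectedTests.toFixed(1)); // ≈ 73.1
```

That comes out to roughly 73 tests, in the same ballpark as the Monte Carlo estimate.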

(for simplicity, I’ll just say “students” instead of students+staff all the time)

**Generalizations**

The key generalizations are (1) the number of students and (2) the probability. A variant (2a) is that the probabilities may not be identical: the students are still independent, but some have a high probability and some a low one.

Generalization (3) is the toughest. The test has false negatives and false positives. These may be statistically “iid” or correlated. (Perfect correlation means that if you test me, you’ll get the same result. Maybe my virus is low/missing the RNA marker, so you’ll never get a positive result. “iid” means independent and identically distributed. One implication is that if the detection rate is 90%, testing twice makes it 99%, etc.).
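To illustrate the “iid” claim at the end of that paragraph: with independent errors, the miss probability (1 − d) compounds across repeated tests. A sketch (the function name is mine):

```javascript
// With iid test errors, k independent repeats detect a true positive
// unless every single repeat misses: detection = 1 - (1 - d)^k.
function detectionAfterRepeats(d, k) {
  return 1 - Math.pow(1 - d, k);
}

console.log(detectionAfterRepeats(0.9, 2)); // ≈ 0.99
console.log(detectionAfterRepeats(0.9, 3)); // ≈ 0.999
```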

In this case, it is impossible to detect everyone. You have to set a risk function and optimize by balancing cost and risk. This is hard to solve analytically, but it can be simulated easily.

Lastly, there are the logistical and microbiology challenges. Since this is for math people, I’ll just mention that this has to do with the limit of detection (LOD). It may be that you can’t pool 100 people at once, or that if you do, the batch/pooled test and the individual tests sometimes give different answers. You can think of this as a “batch/pooling error”.

## When batching doesn’t make sense.

There is another case of importance. In hospital ERs, high-risk patients are being tested to confirm Covid-19. In these cases, the likelihood of a positive is maybe 50%. Here, the best answer is to just test everyone individually. The sketch of the mathematical proof: with batch_size = 2, there is a 25% chance both are negative, and you save 1 test (1 test instead of 2). But there is a 75% chance of one or more positives, in which case you use 3 tests instead of 2. In expectation that’s 0.25·1 + 0.75·3 = 2.5 tests per pair, more than the 2 individual tests. Therefore, in hospital/clinical settings, individual tests make the most sense.
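That back-of-the-envelope argument fits in a tiny function (the name is mine). Setting it equal to 2 gives the break-even prevalence for batch_size = 2: p = 1 − √0.5 ≈ 29%.

```javascript
// Expected tests per pair under size-2 pooling at prevalence p:
// 1 pooled test, plus 2 individual follow-ups when the pool is positive.
function expectedTestsPerPair(p) {
  var poolPositive = 1 - Math.pow(1 - p, 2);
  return 1 + 2 * poolPositive;
}

console.log(expectedTestsPerPair(0.5));  // 2.5 — worse than 2 individual tests
console.log(expectedTestsPerPair(0.01)); // ≈ 1.04 — much better than 2
```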

But for screening/surveillance, we definitely want to batch, like they do in Germany.

# Mathematicians, ready your pencils and laptops

If you can solve this problem, then you might help speed recovery from economic disaster.

Hints/Leads:

- Using monte-carlo, one can look at the gradient and make guesses, like a generalized optimization problem. That is, take the solution (50, 5, 2, 1), and then check (50, 6, 2, 1) and (50, 4, 2, 1) to see if either 6 or 4 improves it compared to 5.
- There is a recursive structure to the problem. The 500-size problem splits into 50-size problems (with different conditional probabilities, recalculated the way one does in the Monty Hall problem), and those split into 5-size problems (again recalculating the conditional probabilities). Once the 5-size and 6-size problems are solved, you don’t have to re-solve them.
- This might have something to do with the counterfeit coin problem. See: https://arxiv.org/abs/1005.1391
- This might be related to Hamming distance and coding theory (?)
- This might be related to covering designs in discrete math: https://www.dmgordon.org/cover/
- This might be related to something in operations research on defect detection. (?)

Nifty trick, but of limited use: When you are down to 5, you have a trick you can use. Test 2 (groupA). Then test another 2 (groupB). And then wait on the last one. If you rule out the first 4, you don’t have to run the last test. Logically, that last one has to be positive (otherwise the batch=5 test would have been negative). However, since there could be more than 1 positive in the batch of 5, positives in groupA or groupB don’t rule out the last singleton.
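Here is one way to sketch the trick (`resolveBatchOfFive` is my name for it; it assumes the batch-of-5 pooled test already came back positive, and it counts only the follow-up tests):

```javascript
// Resolve a batch of 5 whose pooled test is already known positive.
// statuses: array of 5 booleans (true = infected).
// Returns the number of follow-up tests used.
function resolveBatchOfFive(statuses) {
  var tests = 0;
  var groupA = statuses.slice(0, 2);
  var groupB = statuses.slice(2, 4);
  tests++; var aPositive = groupA.some(Boolean); // pooled test on group A
  tests++; var bPositive = groupB.some(Boolean); // pooled test on group B
  if (!aPositive && !bPositive) {
    // First four ruled out: the fifth must be positive. No test needed.
    return tests; // 2 tests total
  }
  if (aPositive) tests += 2; // test both members of group A individually
  if (bPositive) tests += 2; // test both members of group B individually
  tests++; // positives in A/B don't rule out the singleton, so test it too
  return tests;
}

console.log(resolveBatchOfFive([false, false, false, false, true])); // 2
console.log(resolveBatchOfFive([true, false, false, false, false])); // 5
```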

# JavaScript Code

Here is the JavaScript code:

```javascript
var probability = 0.01; // 1%
var initialsize = 500;
var waves = [50, 5, 2, 1];

// Alternate scenario:
// var probability = 0.10; // 10%
// var initialsize = 50;
// var waves = [2, 1];

var arrayOfInfection = [];

// Test the pool covering indices [startvalue, endvalue].
// If the pool is positive, recurse into sub-pools of size waves[waveindex].
// Returns the number of tests used for this pool and everything below it.
function processWaves(startvalue, endvalue, waveindex) {
  if (startvalue == endvalue) {
    return 1; // the batch size is 1: a single individual test
  }
  // pooled test: positive if any member of the pool is infected
  var someTrue = arrayOfInfection.slice(startvalue, endvalue + 1).some((x) => x == true);
  var localTestCount = 1; // count the pooled test itself
  if (someTrue) {
    var waveSize = waves[waveindex];
    var currentvalue = startvalue;
    while (currentvalue <= endvalue) {
      // Math.min keeps the last sub-pool from spilling past this pool's end
      localTestCount += processWaves(currentvalue, Math.min(currentvalue + waveSize - 1, endvalue), waveindex + 1);
      currentvalue = currentvalue + waveSize;
    }
  }
  return localTestCount;
}

var arrayOfResults = [];
var repetitions = 500;
for (let i = 0; i < repetitions; i++) {
  var arrayOfRandom = Array.from({ length: initialsize }, () => Math.random());
  arrayOfInfection = arrayOfRandom.map((x) => x < probability);
  var MyResult = processWaves(0, initialsize - 1, 0);
  arrayOfResults.push(MyResult);
}

console.log("Average over " + repetitions + " repetitions:");
var Average = arrayOfResults.reduce((a, b) => a + b, 0) / arrayOfResults.length;
console.log(Average);
```

Link to the javascript on Github.