COVID19 data statistical adjustment for SIR model and estimation

vidamuhae

Answered question

2022-11-14

All of us are coping with the current COVID19 crisis. I hope that all of you stay safe and that this situation will end as soon as possible.
Because of this sad situation and my unstoppable curiosity, I've started reading about the SIR model. The variables of this model are $s$ (the fraction of people susceptible to infection), $y$ (the fraction of infected people) and $r$ (the fraction of recovered people plus, sadly, the deaths). The model reads:
$$
\begin{cases}
\dot{s} = -\beta s y \\
\dot{y} = \beta s y - \gamma y \\
\dot{r} = \gamma y,
\end{cases}
$$
where $\beta$ and $\gamma$ are positive parameters. One strong hypothesis of this model is that the population size is constant over time (deaths are counted as recovered, and births are neglected since, hopefully, newborns will be the part of the population that is protected from the disease). The initial conditions are chosen so that $s(0) + y(0) + r(0) = 1$ with $s(0) \ge 0$, $y(0) \ge 0$ and $r(0) \ge 0$. Under this assumption, it can be proven that $s(t) + y(t) + r(t) = 1$ for all $t > 0$, since the right-hand sides of the three equations sum to zero.
The news often talks about the coefficient
$$
R_0 = \frac{\beta}{\gamma},
$$
which governs the behavior of the system (for $R_0 < 1$ the disease dies out, for $R_0 > 1$ it spreads).
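To make the role of $R_0$ concrete, here is a minimal sketch in plain Python that integrates the SIR equations with a classical Runge-Kutta step. The parameter values, the initial infected fraction `y0`, the step size and the function names are illustrative assumptions, not fitted values.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of the SIR system for state = (s, y, r)."""
    s, y, r = state
    new_infections = beta * s * y
    recoveries = gamma * y
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, dt, beta, gamma):
    """Advance the state by one classical 4th-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(shifted(state, k1, dt / 2), beta, gamma)
    k3 = sir_rhs(shifted(state, k2, dt / 2), beta, gamma)
    k4 = sir_rhs(shifted(state, k3, dt), beta, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(beta, gamma, y0=1e-3, days=200, dt=0.1):
    """Return the list of (s, y, r) states, starting from r(0) = 0."""
    state = (1.0 - y0, y0, 0.0)
    path = [state]
    for _ in range(int(round(days / dt))):
        state = rk4_step(state, dt, beta, gamma)
        path.append(state)
    return path


# Assumed example values: R0 = 3 (outbreak) versus R0 = 0.5 (fade-out).
epidemic = simulate(beta=0.3, gamma=0.1)
fadeout = simulate(beta=0.05, gamma=0.1)
print("peak infected fraction, R0 = 3.0:", max(y for _, y, _ in epidemic))
print("peak infected fraction, R0 = 0.5:", max(y for _, y, _ in fadeout))
```

In the second run the infected fraction never rises above its initial value, while in the first it grows before declining. In both runs $s + y + r$ stays equal to 1 up to rounding error.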
The news also talks about estimating this parameter. Given the time series of $s$, $y$ and $r$, it is fairly easy to estimate the parameters $\beta$ and $\gamma$, and hence $R_0$. My main concern is about the time series themselves. For each country we know the daily count of infected people (say $Y(t)$) and of recovered (or dead) people (say $R(t)$).
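As an illustration of why the estimation is easy when the full time series is available, the sketch below generates synthetic daily data with known parameters and then recovers them by least squares on the daily increments. The estimator (a regression through the origin on trapezoidal averages) is one simple choice made for this example, not necessarily what practitioners use, and all names and values are assumptions.

```python
def simulate_daily(beta, gamma, y0=1e-3, days=120, substeps=100):
    """Synthetic daily (s, y, r) series from fine-grained Euler steps."""
    h = 1.0 / substeps
    s, y, r = 1.0 - y0, y0, 0.0
    series = [(s, y, r)]
    for _ in range(days):
        for _ in range(substeps):
            infections = beta * s * y * h
            recoveries = gamma * y * h
            s, y, r = s - infections, y + infections - recoveries, r + recoveries
        series.append((s, y, r))
    return series


def slope_through_origin(xs, zs):
    """Least-squares slope a of the model z = a * x (no intercept)."""
    return sum(x * z for x, z in zip(xs, zs)) / sum(x * x for x in xs)


def estimate_beta_gamma(series):
    """Estimate (beta, gamma) from consecutive daily observations.

    Over one day: s drops by about beta * mean(s * y),
    and r grows by about gamma * mean(y).
    """
    exposure, s_drop, infected, r_gain = [], [], [], []
    for (s0, y0, r0), (s1, y1, r1) in zip(series, series[1:]):
        exposure.append(0.5 * (s0 * y0 + s1 * y1))
        s_drop.append(s0 - s1)
        infected.append(0.5 * (y0 + y1))
        r_gain.append(r1 - r0)
    return (slope_through_origin(exposure, s_drop),
            slope_through_origin(infected, r_gain))


data = simulate_daily(beta=0.3, gamma=0.1)   # assumed "true" parameters
beta_hat, gamma_hat = estimate_beta_gamma(data)
print("beta_hat:", beta_hat, "gamma_hat:", gamma_hat, "R0_hat:", beta_hat / gamma_hat)
```

On this noise-free synthetic data the estimates land close to the values used to generate it; real case counts are noisy and incomplete, which is the concern raised next.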
However, many infected people are never recorded (say $Y'(t)$), and many of them recover without ever knowing that they were infected (say $R'(t)$). Moreover, the number of tests performed is increasing day after day.
If we denote by $N$ the (constant) population size, we get:
$$
y(t) = \frac{Y(t) + Y'(t)}{N}, \quad r(t) = \frac{R(t) + R'(t)}{N} \quad \text{and} \quad s(t) = 1 - y(t) - r(t).
$$
Here are my questions. How can we estimate $\beta$ and $\gamma$ if we don't know the unobserved variables $Y'(t)$ and $R'(t)$? How do experts in the field estimate $\beta$ and $\gamma$ even though the available data are incomplete? Do they use some kind of data adjustment?
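To see how much the missing counts can matter, here is a small synthetic experiment under one strong and purely hypothetical assumption: a constant fraction `rho` of infected and removed people is recorded, so that the recorded fractions are `rho * y` and `rho * r`. The reporting fraction, the parameter values and the simple increment-based fit are all assumptions made for illustration.

```python
def simulate_daily(beta, gamma, y0=1e-3, days=150, substeps=100):
    """Synthetic 'true' daily (s, y, r) series from fine Euler steps."""
    h = 1.0 / substeps
    s, y, r = 1.0 - y0, y0, 0.0
    series = [(s, y, r)]
    for _ in range(days):
        for _ in range(substeps):
            infections = beta * s * y * h
            recoveries = gamma * y * h
            s, y, r = s - infections, y + infections - recoveries, r + recoveries
        series.append((s, y, r))
    return series


def observe(series, rho):
    """Recorded series when only a share rho of cases is counted.

    The apparent susceptible fraction is 1 minus the recorded cases.
    """
    return [(1.0 - rho * (y + r), rho * y, rho * r) for _, y, r in series]


def fit(series):
    """Least squares through the origin on daily increments."""
    num_b = den_b = num_g = den_g = 0.0
    for (s0, y0, r0), (s1, y1, r1) in zip(series, series[1:]):
        x_b = 0.5 * (s0 * y0 + s1 * y1)   # mean of s * y over the day
        x_g = 0.5 * (y0 + y1)             # mean of y over the day
        num_b += x_b * (s0 - s1)
        den_b += x_b * x_b
        num_g += x_g * (r1 - r0)
        den_g += x_g * x_g
    return num_b / den_b, num_g / den_g


truth = simulate_daily(beta=0.3, gamma=0.1)        # assumed true parameters
print("fit on complete data:", fit(truth))
print("fit on recorded data:", fit(observe(truth, rho=0.2)))
```

In this setting the estimate of $\gamma$ is unaffected, because both sides of $\dot{r} = \gamma y$ are scaled by the same factor, while $\beta$ is underestimated because the apparent susceptible fraction is too large. With a reporting fraction that changes over time, as happens when testing ramps up, the distortion would be harder to characterize.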

Answer & Explanation

Justin Blake

2022-11-15

Step 1
Unfortunately we don't have accurate numbers for R and Y in any large population. Most of the deaths may be recorded (although there may be a substantial number of deaths that are not attributed to Covid-19 because the symptoms are not typical), but large numbers of people have very mild symptoms, going from S to Y and into R without ever being tested.
From the point of view of getting accurate statistics, it would be desirable to take a random sample of the population and test them at frequent intervals. But as far as I know this has not been done anywhere.
Of course there are all sorts of complications. Rather than a homogeneous population, there are lots of subpopulations that have different parameters, and varying amounts of interactions between them. For example, residents of long-term care homes are an important subpopulation, the one that's producing a very large fraction of the deaths.
Step 2
So if $s_j$, $i_j$, $r_j$ are the numbers of susceptible, infective and removed in subpopulation $j$, you should have
$$
\begin{aligned}
\dot{s}_j &= -\sum_k \beta_{jk} s_j i_k \\
\dot{i}_j &= \sum_k \beta_{jk} s_j i_k - \gamma_j i_j \\
\dot{r}_j &= \gamma_j i_j
\end{aligned}
$$
However, increasing the number of subpopulations increases the number of parameters, making parameter estimation even more of a nightmare.
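A minimal sketch of the subpopulation model, in plain Python with a simple Euler step, may help to see both the coupling between groups and the growth in the number of parameters. The two groups (general population and care homes), the contact matrix and all numerical values are hypothetical, and the state variables are taken here as fractions of the total population.

```python
def multigroup_rhs(s, i, beta, gamma):
    """Derivatives (ds, di, dr) for every subpopulation j."""
    n = len(s)
    ds, di, dr = [], [], []
    for j in range(n):
        # Infection pressure on group j from the infectives of every group k.
        force = sum(beta[j][k] * i[k] for k in range(n))
        new_infections = s[j] * force
        removals = gamma[j] * i[j]
        ds.append(-new_infections)
        di.append(new_infections - removals)
        dr.append(removals)
    return ds, di, dr


def simulate_multigroup(s0, i0, r0, beta, gamma, days=200, dt=0.05):
    """Forward-Euler integration; returns the final (s, i, r) lists."""
    s, i, r = list(s0), list(i0), list(r0)
    for _ in range(int(round(days / dt))):
        ds, di, dr = multigroup_rhs(s, i, beta, gamma)
        s = [x + dt * d for x, d in zip(s, ds)]
        i = [x + dt * d for x, d in zip(i, di)]
        r = [x + dt * d for x, d in zip(r, dr)]
    return s, i, r


def parameter_count(n_groups):
    """n^2 transmission rates beta_jk plus n removal rates gamma_j."""
    return n_groups ** 2 + n_groups


# Hypothetical example: group 0 = general population, group 1 = care homes.
beta = [[0.3, 0.1],
        [0.2, 0.6]]
gamma = [0.1, 0.1]
s, i, r = simulate_multigroup([0.949, 0.05], [0.001, 0.0], [0.0, 0.0], beta, gamma)
print("final removed fraction per group:", r)
print("parameters for 2, 5, 10 groups:", [parameter_count(n) for n in (2, 5, 10)])
```

Even though the second group starts with no infectives, the off-diagonal term $\beta_{jk}$ seeds it from the first group. The parameter count grows quadratically with the number of groups, which is the estimation difficulty described above.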
