COVID-19 data: statistical adjustment for SIR model estimation
All of us are coping with the current COVID-19 crisis. I hope you all stay safe and that this situation ends as soon as possible.
Prompted by this sad situation and by my unstoppable curiosity, I've started reading about the SIR model. Its variables are $s$ (the fraction of people susceptible to infection), $y$ (the fraction of infected people) and $r$ (the fraction of recovered people, which sadly also includes the deaths). The model reads:
$$\frac{ds}{dt} = -\beta s y, \qquad \frac{dy}{dt} = \beta s y - \gamma y, \qquad \frac{dr}{dt} = \gamma y,$$
where $\beta$ and $\gamma$ are positive parameters. One strong hypothesis of this model is that the population size is constant over time: deaths are counted among the recovered, and births are neglected since, hopefully, newborns will be the part of the population that is surely protected from the disease. The initial conditions are set such that $s(0) > 0$ and $y(0) > 0$, $r(0) = 0$, and $s(0) + y(0) = 1$. Under this assumption, adding the three equations gives $\frac{d}{dt}(s + y + r) = 0$, so $s(t) + y(t) + r(t) = 1$ for all $t \ge 0$.
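To make the model concrete, here is a minimal sketch that integrates $ds/dt = -\beta s y$, $dy/dt = \beta s y - \gamma y$, $dr/dt = \gamma y$ with a classical fourth-order Runge-Kutta step in plain Python. The function names `sir_rhs` and `simulate_sir` are my own, and the values $\beta = 0.3$, $\gamma = 0.1$, the step size and the initial infected fraction are arbitrary illustrative choices, not estimates for COVID-19.

```python
from typing import List, Tuple

State = Tuple[float, float, float]  # (s, y, r)


def sir_rhs(state: State, beta: float, gamma: float) -> State:
    """Right-hand side of the SIR model: (ds/dt, dy/dt, dr/dt)."""
    s, y, _ = state
    infection = beta * s * y  # flow from susceptible to infected
    recovery = gamma * y      # flow from infected to recovered (or dead)
    return (-infection, infection - recovery, recovery)


def simulate_sir(s0: float, y0: float, beta: float, gamma: float,
                 dt: float, steps: int) -> List[State]:
    """Integrate the SIR model with classical RK4, starting from r(0) = 0."""
    def shift(state: State, slope: State, h: float) -> State:
        # state + h * slope, component by component
        return tuple(x + h * k for x, k in zip(state, slope))

    state: State = (s0, y0, 0.0)
    path = [state]
    for _ in range(steps):
        k1 = sir_rhs(state, beta, gamma)
        k2 = sir_rhs(shift(state, k1, dt / 2), beta, gamma)
        k3 = sir_rhs(shift(state, k2, dt / 2), beta, gamma)
        k4 = sir_rhs(shift(state, k3, dt), beta, gamma)
        state = tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                      for x, a, b, c, d in zip(state, k1, k2, k3, k4))
        path.append(state)
    return path


# Usage: beta = 0.3, gamma = 0.1 (arbitrary), 0.1% initially infected, 300 time units.
path = simulate_sir(s0=0.999, y0=0.001, beta=0.3, gamma=0.1, dt=0.1, steps=3000)
# Each RK4 stage has components summing to zero, so s + y + r stays at 1
# up to floating-point rounding.
print(max(abs(sum(p) - 1.0) for p in path))
```

The conservation of $s + y + r$ holds in the numerical scheme as well, because the three flows cancel exactly at every stage.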
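The news often talks about the coefficient
$$R_0 = \frac{\beta}{\gamma},$$
which rules the behavior of the system: for $R_0 < 1$ the disease dies out, while for $R_0 > 1$ it spreads (strictly, the threshold applies to $R_0\, s(0)$, which is close to $R_0$ when almost everyone is initially susceptible).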
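Why $R_0$ acts as a threshold can be seen by rearranging the infected equation of the standard SIR model, $dy/dt = \beta s y - \gamma y$, with $R_0 = \beta/\gamma$ (assuming $\beta$ and $\gamma$ are the infection and recovery rates of the model):
$$\frac{dy}{dt} = \beta s y - \gamma y = \gamma\, y \left( R_0\, s - 1 \right).$$
Since $ds/dt = -\beta s y \le 0$, $s(t)$ never increases. If $R_0\, s(0) < 1$, then $R_0\, s(t) < 1$ for all $t$, so $dy/dt < 0$ and the infected fraction only decays. If $R_0\, s(0) > 1$, $y$ initially grows and reaches its peak exactly when $s$ falls to $1/R_0$.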
The news also discusses how this parameter is estimated. Given the time series of $s$, $y$ and $r$, it is rather easy to estimate $\beta$ and $\gamma$, and hence $R_0$. My main concern is the time series themselves. For each country we know the daily count of infected people (say $Y(t)$) and of recovered or dead people (say $R(t)$).
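As a sketch of why the estimation is easy when $s$, $y$ and $r$ are fully observed, one can discretize the equations with a time step $\Delta t$ (one day for daily data) and fit each parameter by least squares through the origin, using $s(t) - s(t+1) \approx \Delta t\, \beta\, s(t)\, y(t)$ and $r(t+1) - r(t) \approx \Delta t\, \gamma\, y(t)$. The forward-Euler discretization, the plain least-squares fit and the name `estimate_sir_params` are my own illustrative choices, not necessarily what epidemiologists use.

```python
from typing import Sequence, Tuple


def estimate_sir_params(s: Sequence[float], y: Sequence[float], r: Sequence[float],
                        dt: float = 1.0) -> Tuple[float, float, float]:
    """Least-squares estimates of (beta, gamma, R0) from sampled SIR fractions.

    Uses the forward-Euler relations
        s[t] - s[t+1] ~ dt * beta * s[t] * y[t]
        r[t+1] - r[t] ~ dt * gamma * y[t]
    and fits each slope by least squares through the origin.
    """
    n = len(s) - 1
    if n < 1 or len(y) != len(s) or len(r) != len(s):
        raise ValueError("need at least two samples in equal-length series")
    # beta: regress -(s[t+1] - s[t]) on dt * s[t] * y[t]
    xb = [dt * s[t] * y[t] for t in range(n)]
    zb = [s[t] - s[t + 1] for t in range(n)]
    beta = sum(a * b for a, b in zip(xb, zb)) / sum(a * a for a in xb)
    # gamma: regress r[t+1] - r[t] on dt * y[t]
    xg = [dt * y[t] for t in range(n)]
    zg = [r[t + 1] - r[t] for t in range(n)]
    gamma = sum(a * b for a, b in zip(xg, zg)) / sum(a * a for a in xg)
    return beta, gamma, beta / gamma


# Usage: synthetic series generated with the same Euler scheme
# (beta = 0.3, gamma = 0.1, arbitrary values), so the fit should recover them.
s, y, r = [0.99], [0.01], [0.0]
for _ in range(60):
    infection, recovery = 0.3 * s[-1] * y[-1], 0.1 * y[-1]
    s.append(s[-1] - infection)
    y.append(y[-1] + infection - recovery)
    r.append(r[-1] + recovery)
beta, gamma, r0 = estimate_sir_params(s, y, r)
print(beta, gamma, r0)  # close to 0.3, 0.1 and 3.0 (up to rounding)
```

The recovery is exact here only because the data come from the very same discrete model; with real, noisy and incomplete counts the fit is far less clean.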
However, many infected people are never recorded (say $Y'(t)$), and many of them recover without ever knowing they were infected (say $R'(t)$)! Moreover, the number of tests performed increases day after day.
If we denote by $N$ the (constant) population size, we get:
$$y(t) = \frac{Y(t) + Y'(t)}{N}, \qquad r(t) = \frac{R(t) + R'(t)}{N}, \qquad s(t) = 1 - y(t) - r(t).$$
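To see why the unobserved counts and the growing number of tests matter, here is a sketch under a loudly hypothetical assumption: the true counts are a multiple of the recorded ones, $Y(t) + Y'(t) = k(t)\, Y(t)$ and $R(t) + R'(t) = k(t)\, R(t)$, with an under-reporting factor $k(t) \ge 1$. In practice $k(t)$ is unknown, which is exactly the difficulty; the names `adjusted_fractions`, `gamma_hat` and `k` are illustrative only. With the least-squares recovery-rate fit, a constant $k$ cancels out, while a $k(t)$ that drifts over time (for instance because expanding testing records a growing share of infections) biases the estimate.

```python
from typing import List, Sequence, Tuple


def adjusted_fractions(Y: Sequence[float], R: Sequence[float], N: float,
                       k: Sequence[float]) -> Tuple[List[float], List[float], List[float]]:
    """Build (s, y, r) assuming true counts are k[t] times the recorded ones.

    k[t] >= 1 is a HYPOTHETICAL under-reporting factor, i.e.
    Y + Y' = k * Y and R + R' = k * R; it is not observed in practice.
    """
    y = [k_t * Y_t / N for Y_t, k_t in zip(Y, k)]
    r = [k_t * R_t / N for R_t, k_t in zip(R, k)]
    s = [1.0 - a - b for a, b in zip(y, r)]
    return s, y, r


def gamma_hat(y: Sequence[float], r: Sequence[float], dt: float = 1.0) -> float:
    """Least-squares slope of (r[t+1] - r[t]) on dt * y[t], through the origin."""
    x = [dt * v for v in y[:-1]]
    z = [b - a for a, b in zip(r[:-1], r[1:])]
    return sum(a * b for a, b in zip(x, z)) / sum(a * a for a in x)


# Usage on synthetic data: true fractions from a forward-Euler SIR run
# (beta = 0.3, gamma = 0.1, arbitrary), then "recorded" counts obtained by
# dividing by a reporting factor that falls from 5 to 1 as testing expands.
N = 1_000_000
s_true, y_true, r_true = [0.999], [0.001], [0.0]
for _ in range(60):
    infection, recovery = 0.3 * s_true[-1] * y_true[-1], 0.1 * y_true[-1]
    s_true.append(s_true[-1] - infection)
    y_true.append(y_true[-1] + infection - recovery)
    r_true.append(r_true[-1] + recovery)
k_true = [5.0 - 4.0 * t / 60 for t in range(61)]
Y_rec = [N * v / k for v, k in zip(y_true, k_true)]
R_rec = [N * v / k for v, k in zip(r_true, k_true)]

_, y_raw, r_raw = adjusted_fractions(Y_rec, R_rec, N, [1.0] * 61)  # ignore under-reporting
_, y_fix, r_fix = adjusted_fractions(Y_rec, R_rec, N, k_true)      # oracle adjustment
print(gamma_hat(y_raw, r_raw))  # above 0.1: the drifting factor biases the estimate
print(gamma_hat(y_fix, r_fix))  # approximately 0.1
```

The oracle adjustment works only because the sketch knows $k(t)$; with real data that factor would itself have to be modelled or estimated.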
Here are my questions. How can $\beta$ and $\gamma$ be estimated when the unobserved variables $Y'(t)$ and $R'(t)$ are unknown? How do experts in the field estimate $\beta$ and $\gamma$ even though the available data are incomplete? Do they apply some kind of data adjustment?