Solving the SIR Virus Model…

Due to the current covid-19 outbreak there has been a fair bit of interest in virus modelling lately. The models that the media have published/produced vary from the over simplified (and often incorrect) to the complex (with many missing details). I have already looked at numeric simulations using a bouncing ball model, this time a more analytic technique using differential equations will be used.

The oversimplified models predict that the number infected follow a Gaussian distribution. This has been stated by some fairly official sources (such as the flatten the curve group). While in certain cases it is possible to have a Gaussian, however in most cases it isn’t. Usually, the infection curve is an asymmetric Gaussian with a tail that can rather extend as time increases.

More complex models do exist (such as those reported by the CDC), but a lot of parameters/assumptions are missing and can be found by looking deep on the websites of various research groups or from different published papers.

In this project, the SIR model will be explored to gain some knowledge with virus modelling. Also, a web-based tool will be developed to allow a user to interact with the model.

Compartmental Models

There are many ways to mathematically model how a virus propagates through a population. Compartmental modelling is a very common technique where the various states that are possible exist as separate compartments (or non-overlapping groups), along with a list of rules of how members move between them. The rules are usually a set of differential equations based on the number of members of a particular compartment and different model parameters. Model parameters are values (such as rates) used by the model that are usually empirically measured or based on historical results. Wikipedia has a nice page about different introductory compartmental models used in epidemiology.

The SIR Model

The SIR model is based on three compartments, namely susceptible S, infected I, and recovered (or removed) R. The model predicts how a virus infects members from the susceptible population, how long they stay infected, and when they become recovered. Below is a diagram of block diagram of SIR model.

From the diagram two model parameters are used:

  • β is the infection rate, or how often a susceptible individual is in contact with an infected member results in a new infection.
  • γ is the recovery rate, or how quickly an infected individual recovers

Some key points:

  • It is a closed model, that is the total number of individuals is constant. Therefore, no new individuals are introduced (through births or immigration), or removed (via death and immigration).
  • The model only moves in one direction, that is forward. An infected or recovered individual cannot go back to their previous states. A side effect of this idea is that once an infected individual has recovered, they are immune to any further infections.
  • It is assumed that the total is large and each individual represents an ‘average’ person. That is, the model does not differentiate different types of susceptible individuals (for example if they have pre-existing conditions or any comorbidities), or the amount of infection (minor or severe).
  • Depending on the source, the R compartment can refer to either recovered or removed (that is removed from the susceptible and infected populations). Also to note that the R group accounts for both recovered and dead individuals.
  • Also the model parameters are the same for all individuals. This is to make the calculations simpler.

For each compartment in the model, a rate equation is developed based on the number of individuals entering and/or leaving that particular state. Leaving a state results in subtraction while entering involves addition. For the SIR model the equations can be developed as follows:

  • For S : The rate of susceptible individuals decreases by the product of the rate of infection with the number of susceptible and infected individuals
  • For I : The rate of infected individuals increases by the amount that the susceptible amount decreases. Also it decreases by how quickly infected individuals recover.
  • For R : The recovery rate increases by the rate that infected individuals recover

In summary, the set of differential equations are listed bellow:

\begin{aligned} \frac{dS}{dt} &= -\beta SI \\ \frac{dI}{dt} &= \beta SI\ -\ \gamma I \\ \frac{dR}{dt} &= \gamma I \end{aligned}

There is some interesting mathematics that can obtained in manipulating these equations, but for now we are currently interested only in a numerical solution. Youtube has many great videos that explore these equations analytically such this one by Trefor Bazett, or this other one by Tom Rocks Maths.

Note that in a future project we will be exploring the basic reproduction number, that is the famous R_0 value.

Numerically Solving the Model in Python and MatLab

Examining the equations for the SIR model, it can be seen that they cannot simply be just integrated. Due to their coupled nature the solution can only be found numerically.

MatLab is adored by a segment of the academic population (I personally have a love-hate relationship with it… I don’t like the licensing, or the performance, or the slow progress of improvements, but I do like some of the simplicity, and built in functions). In short, the ‘ode45’ function does all the work in solving a set of equations (which are specified as a separate function). A MatLab solution was included and is available here.

Plain python does not deal with differential equations or any semi-advanced math at all. To make up for this, the numpy and scipy packages were created and they contain a vast plethora of math niceties. The solution is similar to the MatLab one, however a few more steps are involved. First a function needs to be created to solve the current state of the ODEs. Next, an instance of the ‘integrate.ode’ object needs to be created (and initialized using the ODE function from the previous step). Then the integrate method on the new ODE solving instance is called repeatedly to obtain a result at a specific time. Solutions in ‘regular’ python and one using a Jupyter notebook are available on GitHub.

Numerically Solving the Model with Javascript and D3

Again for this project, an interactive web-based tool that solves the SIR model equations is/was developed. MatLab does have a technology to create a Web Apps, however it seems rather complex and one needs to purchase the MatLab Web App Server product to be able to do so.

There are quite a few technologies out there to make python accessible from the web. Possibly the most popular are Jupyter notebooks and the flask project. Jupyter notebooks are great (and I wish they were used in schools instead of MatLab), however they have a few issues namely: it can waste some memory and they aren’t very user friendly (that is fresh user would have to not be shocked by interacting with code to produce the required result). Flask seems nice however it involves running an additional web server and hence can involve extra resources. For my purposes, I am trying to be a little more efficient (since I am running this site on the smallest virtual instance available on google cloud), so I will be avoiding Flask.

Recently, I have read “Interactive Data Visualization for the Web: an Introduction to Designing with D3” by Scott Murray (and rather enjoyed it). Hence it will be used for the visualization for the solutions of the model equations since its sole purpose is to make interactive web based graphics. D3 is a javascript library, so it can easily interact with other web elements (such as CSS, various input controls, and json files).

One major problem is that the javascript math library is quite basic, hence it can not deal with numerically solving a set of differential equations. Luckily, various algorithms exist to numerically compute the solutions, and perhaps the best known is the 4^{\mathrm{th}} order Runge-Kutta technique (or RK4). Basically, RK4 predicts a new point based on the current point along with a weighted sum of derivatives. If the time step \delta t is rather small, RK4 can produce rather precise solutions (note that if \delta t is too small, the computation performance can get rather poor). On the web, there are many code resources related to the Runge-Kutta algorithm, however there are some confusing results when looking at adapting the algorithm to a set of differential equations. For clarity, I have included how one numerically determines the solution using RK4 for the SIR virus model equation below:

\begin{aligned} k_1 &= \delta t\ \cdot\ \frac{dS}{dt} \left( S_n, I_n, R_n \vphantom{\frac{1}{2}} \right) \\ \ell_1 &= \delta t\ \cdot\ \frac{dI}{dt} \left( S_n, I_n, R_n \vphantom{\frac{1}{2}} \right) \\ m_1 &= \delta t\ \cdot\ \frac{dR}{dt} \left( S_n, I_n, R_n \vphantom{\frac{1}{2}} \right) \\ k_2 &= \delta t\ \cdot\ \frac{dS}{dt} \left(S_n + \frac{1}{2}k_1, I_n + \frac{1}{2}\ell_1, R_n + \frac{1}{2}m_1\right) \\ \ell_2 &= \delta t\ \cdot\ \frac{dI}{dt} \left(S_n + \frac{1}{2}k_1, I_n + \frac{1}{2}\ell_1, R_n + \frac{1}{2}m_1\right) \\ m_2 &= \delta t\ \cdot\ \frac{dR}{dt} \left(S_n + \frac{1}{2}k_1, I_n + \frac{1}{2}\ell_1, R_n + \frac{1}{2}m_1\right) \\ k_3 &= \delta t\ \cdot\ \frac{dS}{dt} \left(S_n + \frac{1}{2}k_2, I_n + \frac{1}{2}\ell_2, R_n + \frac{1}{2}m_2\right) \\ \ell_3 &= \delta t\ \cdot\ \frac{dI}{dt} \left(S_n + \frac{1}{2}k_2, I_n + \frac{1}{2}\ell_2, R_n + \frac{1}{2}m_2\right) \\ m_3 &= \delta t\ \cdot\ \frac{dR}{dt} \left(S_n + \frac{1}{2}k_2, I_n + \frac{1}{2}\ell_2, R_n + \frac{1}{2}m_2\right) \\ k_4 &= \delta t\ \cdot\ \frac{dS}{dt} \left(S_n + k_3, I_n + \ell_3, R_n + m_3 \vphantom{\frac{1}{2}} \right) \\ \ell_4 &= \delta t\ \cdot\ \frac{dI}{dt} \left(S_n + k_3, I_n + \ell_3, R_n + m_3 \vphantom{\frac{1}{2}} \right) \\ m_4 &= \delta t\ \cdot\ \frac{dR}{dt} \left(S_n + k_3, I_n + \ell_3, R_n + m_3 \vphantom{\frac{1}{2}} \right) \\ &\quad \\ S_{n+1} &= S_n + \frac{1}{6} \left( \vphantom{\frac{1}{2}} k_1 + 2k_2 + 2k_3 + k_4 \right) \\ I_{n+1} &= I_n + \frac{1}{6} \left( \vphantom{\frac{1}{2}} \ell_1 + 2\ell_2 + 2\ell_3 + \ell_4 \right) \\ R_{n+1} &= R_n + \frac{1}{6} \left( \vphantom{\frac{1}{2}} m_1 + 2m_2 + 2m_3 + m_4 \right) \end{aligned}

Note that the calculations start at n = 0, that is the initial conditions. Also, \delta t is the time step, so one must determine the number of iterations (n_{\mathrm{max}}) beforehand by dividing the maximum time by the time step.

Results using JS and D3

Below is the final product, that is an interactive tool to examine the behaviour of the SIR virus model equations. Some items to note:

  • Values for the transmission rate \beta and the recovery rate \gamma can be manipulated by sliders. The minimum and maximum values were set so that the overall results are reasonable.
  • Too keep things simple, the overall population of all individuals (i.e. S + I + R) is 1. Therefore all values in the graph are based on a percentage.
  • The initial values for S, I, and R are 0.99, 0.01, and 0 respectively. That is, we start with a 1% infection rate.
  • The system capacity can be visualized as a horizontal line. It can be adjusted and has been included as an aid in understanding.
  • Also, the maximum time can also be adjusted.
  • Finally

Also, for those who are limited by wordpress’s page margins (i.e. mainly mobile users), a separate window version is available here.


The code is available here for to download. It is written in Javascript and the D3 framework.

Examining the Basic Reproduction Number

The basic reproduction number, or R_0, represents the number of new infections estimated to stem from a single case. Because of Covid-19 it has been in the news quite a bit lately (as seen in recent articles from the NYTimes and the BBC).

From the previously discussed SIR Model, it can be derived using the infection rate equation:

\frac{dI}{dt} = \beta SI\ -\ \gamma I

At the initial time, t = 0, the initial conditions can be substituted:

\left.\frac{dI}{dt}\right|_{t = 0} = \beta S_0 I_0\ -\ \gamma I_0

From calculus class, there are three derivative values that we care about: zero (a max or min), greater than zero (increasing values), and less than zero (decreasing values). Setting the above to zero and rearranging, we then have:

\mathsf{Decreasing}\ \Rightarrow \frac{\beta S_0}{\gamma} < 1 \qquad \qquad \mathsf{Increasing}\ \Rightarrow \frac{\beta S_0}{\gamma} > 1

To shorten the notation, the fraction composed of \beta, \gamma, and S_0 can be re-written as:

R_0 \equiv \frac{\beta S_0}{\gamma}

Therefore, we are concerned about two cases:

  • R_0 > 1: Increasing infection spread; that is the conditions for an epidemic to occur
  • R_0 < 1: Decreasing infection spread; it is possible for some infections to occur, however the infection spread is controlled.

So to keep R_0 < 1, from basic mathematics we known that the \beta and S_0 need to be reduced, while the \gamma needs to be increased. This is possible by:

  • \beta – the transmission rate: reduce the chance for the virus to propagate by physical distancing, hand washing, and possibly wearing a mask
  • S_0 – the initial number of susceptible individuals: to reduce the size of this group, a vaccine needs to be developed and administered.
  • \gamma – the recovery rate: there has been some promising new drugs in the news however the are either still in development or fail in larger studies.


Taking the graph from the SIR Model the following visualization was made. Some key items to note:

  • The main purpose of this graph is qualitative, that is to show the overall behaviour obtained when \beta, \gamma, and S_0 are increased/decreased on the number of infections. The parameter values have been omitted since we are mainly concerned in the overall trend.
  • Another reason for not including the parameter values is the main audience that this was intended for was some very scientifically illiterate individuals.
  • The initial value of infected individuals was 1%, which might be rather high but it was chosen to make the calculations a little more simple.

Note a full window version is available here (for those who are struggling with the constraints that wordpress puts on page sizes).

D3 Notes

This visualization was made in D3. One of the key elements in the visualization is the transition between the different states of the solution. The built in transitioning code that D3 has included assumes a linear behaviour between the start and end states which is fine for various types of charts. The model equations are non-linear and hence built in transition code would not provide the correct behaviour. However, D3 allows one to provide their own interpolation code, in the form of a tween.

The basic form of a tween is first create an interpolator that will return a value between the the start and end values based on an input between 0 and 1. Next, a function needs to be returned that will be called for each step of the transition. For my code, it is my ODE solver that uses the previously created interpolator as an input. This technique allows for more complex things to be interpolated such as text, numbers, or differential equations.

Another small element to note is the fading of the previous curves during a transition. One minor quirk with D3 is the scheduling of graphics updates. In a scenario where elements that had transitions associated to them, along other elements with transition start (or end) methods, along with those elements transitions from previous function calls… basically where a many elements all had transitions, it seemed that the scheduling was rather arbitrary. In order to ‘pseudo schedule’ many transitions, the delay method was used.


The Javascript and D3 code for this visualization is available here.


No Comments

Add your comment