pandemic is a tool for estimating the course of a pandemic or epidemic.
Disclaimer: I'm not a medical scientist, I'm not a statistician, I'm not an epidemiologist and I'm not a virologist. I'm just an engineer who occasionally does some statistics as a hobby. If you are from the press: please, please, please don't publish any of the graphs produced by this tool without consulting an expert first. You have been warned.
How to Use this Program
The main program is written in Rust, and cargo should take care of the dependencies. Simply invoking
cargo run
will generate a CSV file with the estimates and an INI file that contains some metadata. The part that builds the graphs (plot.py) is written in Python 3 and requires matplotlib and pandas to be installed. Simply run
./plot.py
A Tour of the Source Code
- Data structures for the input data are in the module datamodel.
- The input data is hardcoded in the module data. I'll probably change this soon so I can just use data from https://github.com/CSSEGISandData/COVID-19 or some other source that provides CSV files.
- estimators contains (you guessed it) the estimators:
  - exponential is the simplest estimator and will, at some point in the not-so-far future, fail to predict the pandemic correctly. Currently (2020-03-18) it's still good enough.
  - ratios contains the means for estimating the rate parameter for the exponential estimator.
  - gompertz contains a more sophisticated model based on the Gompertz curve. The estimate_gompertz_with_recovery() function estimates the current cumulative number of cases and deducts the cumulative number of closed cases from that first estimate. The parameters you need to know to use this model are the initial number of cases, the rate of growth (which is not identical to the one from the exponential model), the plateau (measured in cumulative number of infections) at which the pandemic will be over, and the average recovery period of patients. The model is still quite simple (we could, for example, separate the cases that ended in the death of the patient from those that ended in recovery, which would probably give a better estimate).
- search contains two modules for finding an optimal parameter for the model. Golden-section search is the one I tried first; it didn't work that well and failed to find the minimum-cost parameter for the Gompertz model. Ternary search works quite well if the initial search interval is relatively small and contains the minimum.
- main uses ternary search to find a rate parameter for the Gompertz model that minimizes the root mean square error with respect to the input data. After that, it generates a CSV file with the actual number of open cases and the estimates, and an INI file with some metadata.
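The exponential estimator and the ratios module can be illustrated with a small sketch. It assumes that the growth factor is derived from the day-over-day ratios of cumulative case counts, averaged with a geometric mean. The names growth_factor_from_ratios and exponential_estimate are hypothetical, not the project's API, and the ratios module may well average the ratios differently. If the model is written as n0 * e^(r * t) instead, the rate is r = ln(factor).

```rust
/// Hypothetical helper (not the project's `ratios` API): estimate a daily
/// growth factor as the geometric mean of day-over-day ratios of cumulative cases.
fn growth_factor_from_ratios(cumulative: &[f64]) -> Option<f64> {
    // Log of each ratio c[i + 1] / c[i]; days with zero cases are skipped
    // to avoid division by zero and ln(0).
    let logs: Vec<f64> = cumulative
        .windows(2)
        .filter(|w| w[0] > 0.0 && w[1] > 0.0)
        .map(|w| (w[1] / w[0]).ln())
        .collect();
    if logs.is_empty() {
        return None;
    }
    // Geometric mean of the ratios = exp(arithmetic mean of their logarithms).
    let mean_log = logs.iter().sum::<f64>() / logs.len() as f64;
    Some(mean_log.exp())
}

/// Exponential estimate: cumulative cases `days` days after a day with `n0` cases.
fn exponential_estimate(n0: f64, factor: f64, days: i32) -> f64 {
    n0 * factor.powi(days)
}

fn main() {
    // Made-up example data growing by 20% per day.
    let cumulative = [100.0, 120.0, 144.0, 172.8];
    let factor = growth_factor_from_ratios(&cumulative).expect("need two non-zero days");
    let last = *cumulative.last().unwrap();
    for day in 1..=3 {
        println!("day +{day}: {:.1}", exponential_estimate(last, factor, day));
    }
}
```

A geometric mean is used here because growth compounds multiplicatively; an arithmetic mean of the ratios would slightly overstate the growth factor when the ratios fluctuate.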
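The Gompertz-with-recovery idea can be sketched as follows. The sketch assumes the common parameterization N(t) = K * exp(ln(N0 / K) * e^(-r * t)) for the cumulative number of cases (N0 initial cases, r growth rate, K plateau), and it assumes that every case closes exactly after the average recovery period d, so the closed cases on day t are N(t - d). GompertzParams, cumulative_cases and open_cases are hypothetical names; this is not the signature or necessarily the exact formula of estimate_gompertz_with_recovery().

```rust
/// Hypothetical parameter set; field names are illustrative, not the project's.
struct GompertzParams {
    initial_cases: f64, // N0: cumulative cases at t = 0
    growth_rate: f64,   // r: growth rate of the Gompertz curve (per day)
    plateau: f64,       // K: cumulative infections at which the pandemic is over
    recovery_days: f64, // d: average recovery period in days
}

/// Cumulative cases on day t: N(t) = K * exp(ln(N0 / K) * exp(-r * t)).
/// N(0) = N0, and N(t) approaches K as t grows.
fn cumulative_cases(p: &GompertzParams, t: f64) -> f64 {
    p.plateau * ((p.initial_cases / p.plateau).ln() * (-p.growth_rate * t).exp()).exp()
}

/// Open cases on day t, assuming every case closes exactly `recovery_days`
/// after it was counted: closed(t) = N(t - d), and nothing is closed before day d.
fn open_cases(p: &GompertzParams, t: f64) -> f64 {
    let closed = if t >= p.recovery_days {
        cumulative_cases(p, t - p.recovery_days)
    } else {
        0.0
    };
    cumulative_cases(p, t) - closed
}

fn main() {
    // Made-up parameters, for illustration only.
    let p = GompertzParams {
        initial_cases: 100.0,
        growth_rate: 0.1,
        plateau: 1_000_000.0,
        recovery_days: 14.0,
    };
    for day in (0..=120).step_by(20) {
        let t = day as f64;
        println!(
            "day {day:>3}: cumulative {:>10.0}, open {:>10.0}",
            cumulative_cases(&p, t),
            open_cases(&p, t)
        );
    }
}
```

Because the cumulative curve levels off at K, the open cases rise, peak and fall back towards zero once the plateau is reached.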
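The fitting step in main can be sketched as a ternary search over the Gompertz rate that minimizes the root mean square error. Assumptions: the initial cases N0 and the plateau K are held fixed, the observations are cumulative counts for consecutive days starting at day 0, and the sketch fits the cumulative curve rather than open cases with recovery. The functions ternary_search_min, gompertz, rmse and fit_rate are hypothetical names, not the project's search API.

```rust
/// Ternary search for the minimum of a unimodal function `f` on [lo, hi].
/// If the minimum lies outside [lo, hi], the result converges to the
/// interval end closest to it.
fn ternary_search_min<F: Fn(f64) -> f64>(f: F, mut lo: f64, mut hi: f64, tolerance: f64) -> f64 {
    assert!(tolerance > 0.0 && lo <= hi);
    while hi - lo > tolerance {
        let m1 = lo + (hi - lo) / 3.0;
        let m2 = hi - (hi - lo) / 3.0;
        if f(m1) < f(m2) {
            hi = m2; // for a unimodal f the minimum lies in [lo, m2)
        } else {
            lo = m1; // otherwise it lies in (m1, hi]
        }
    }
    (lo + hi) / 2.0
}

/// Gompertz cumulative curve N(t) = K * exp(ln(N0 / K) * exp(-r * t)).
fn gompertz(n0: f64, rate: f64, plateau: f64, t: f64) -> f64 {
    plateau * ((n0 / plateau).ln() * (-rate * t).exp()).exp()
}

/// Root mean square error between observations and predictions.
fn rmse(observed: &[f64], predicted: &[f64]) -> f64 {
    assert_eq!(observed.len(), predicted.len());
    let sum_sq: f64 = observed.iter().zip(predicted).map(|(o, p)| (o - p).powi(2)).sum();
    (sum_sq / observed.len() as f64).sqrt()
}

/// Fit the Gompertz rate to daily cumulative observations (day 0, 1, 2, ...)
/// by minimizing the RMSE with ternary search; n0 and plateau stay fixed.
fn fit_rate(observed: &[f64], n0: f64, plateau: f64, lo: f64, hi: f64) -> f64 {
    let cost = |rate: f64| {
        let predicted: Vec<f64> = (0..observed.len())
            .map(|t| gompertz(n0, rate, plateau, t as f64))
            .collect();
        rmse(observed, &predicted)
    };
    ternary_search_min(cost, lo, hi, 1e-9)
}

fn main() {
    // Synthetic data from a known rate, so the fit should recover about 0.15.
    let (n0, plateau, true_rate) = (100.0, 100_000.0, 0.15);
    let observed: Vec<f64> = (0..30)
        .map(|t| gompertz(n0, true_rate, plateau, t as f64))
        .collect();
    let fitted = fit_rate(&observed, n0, plateau, 0.01, 0.5);
    println!("fitted rate: {fitted:.6}");
}
```

Ternary search relies on the cost being unimodal on the search interval. For synthetic data generated from the same curve this holds, because each predicted value increases monotonically with the rate; real data carries no such guarantee. If the interval does not contain the minimum, the search still returns a value (an end of the interval), which is consistent with the remark in the search bullet that the initial interval has to contain the minimum.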
Working on the Source Code
If you want to contribute to this project: merge requests are welcome. Take a look at TODO.txt to see which aspects of the code I have already identified as needing improvement. If you have any ideas of your own, please talk to me before you start implementing; I might have an opinion ;)
Bugs
Probably lots and lots. If you find any, please tell me or, even better, submit a pull request.