Exploring Ontario Covid Case Data… part 2

Ontario has two Covid-19 datasets available on its Data Catalogue website. The previous post examined the Covid-19 case data, and this post further explores the Covid-19 status data.

Note: I am not an epidemiologist. Any results and conclusions are made for my own personal amusement and education (and perhaps to show to a prospective employer).

This data has more Covid-19 information for each case such as the age range, outcome, gender, and the reporting public health unit. While this is an improvement compared to the previous dataset, there are a few issues with it:

  • The data has several flaws, including NA values, invalid values (such as implausible dates, for example 01-01-2020), and badly categorized values.
  • There is only one date: the day the case was reported. The date on which the status of a case was updated (for example, from non-resolved to resolved) is not reported.
  • Exact ages are not reported; ten-year age ranges are used instead.
  • No specific medical details about a case are listed (such as whether any comorbidities are present).
  • There is so much Public Health Unit data that it is redundant, including each unit's address, postal code, city, website, latitude, and longitude. This information belongs in a separate table, which anyone working with the data could then join if and when they need it.
  • The ‘Case_AcquisitionInfo’ and ‘Outbreak_Related’ columns are not dealt with here; for now, we are mainly interested in the higher-level statistics.
  • Other fields raise questions too, such as ‘Outcome1’: are there other outcomes, that is, an Outcome2 or Outcome3 that has been omitted?
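The following is a minimal Pandas sketch of how the first and fifth issues might be handled. It converts unparseable dates to missing values, blanks out implausibly early dates, labels missing age groups, and moves the repeated PHU details into a separate lookup table. The column names (`Case_Reported_Date`, `Age_Group`, `Reporting_PHU_ID` and the `Reporting_PHU_*` fields) and the cutoff date are assumptions for illustration, so check them against the actual CSV header.

```python
import pandas as pd

# Hypothetical column names; verify them against the real CSV header.
PHU_COLUMNS = [
    "Reporting_PHU",
    "Reporting_PHU_Address",
    "Reporting_PHU_City",
    "Reporting_PHU_Postal_Code",
    "Reporting_PHU_Website",
    "Reporting_PHU_Latitude",
    "Reporting_PHU_Longitude",
]

# Assumed cutoff: reported dates earlier than this are treated as invalid.
EARLIEST_VALID_DATE = pd.Timestamp("2020-01-15")


def clean_cases(raw: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (cases, phus): cleaned case rows and a one-row-per-PHU table."""
    df = raw.copy()
    # Strings that cannot be parsed as dates become NaT instead of raising.
    df["Case_Reported_Date"] = pd.to_datetime(df["Case_Reported_Date"], errors="coerce")
    # Parseable but implausible dates (e.g. 2020-01-01) are also set to NaT.
    too_early = df["Case_Reported_Date"] < EARLIEST_VALID_DATE
    df.loc[too_early, "Case_Reported_Date"] = pd.NaT
    # Missing age groups get an explicit label rather than NA.
    df["Age_Group"] = df["Age_Group"].fillna("Unknown")

    # Keep each PHU's details once, keyed by its ID, and drop them from the case rows.
    phus = df[["Reporting_PHU_ID"] + PHU_COLUMNS].drop_duplicates("Reporting_PHU_ID")
    cases = df.drop(columns=PHU_COLUMNS)
    return cases, phus
```

When the PHU details are needed again, `cases.merge(phus, on="Reporting_PHU_ID", how="left")` restores them with a join.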

Data Processing

This dataset is many times larger than the previous Ontario Covid case data, which makes it difficult to work with in tools such as Excel. Some intersecting groups also require nested queries, so using plain JavaScript and D3 alone would be rather difficult. Perhaps the best solution is to use Python (specifically the Pandas package) to process the data into something more digestible for web-based visualization. Reducing the data to the necessary elements makes webpages load faster and places little load on the client's machine.
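The "intersecting groups" map naturally onto a Pandas `groupby` over several columns. The sketch below collapses one row per case into a small count table that D3 could load directly. It assumes hypothetical `Age_Group` and `Outcome1` columns and illustrates the idea only; it does not reproduce the repository's scripts.

```python
import pandas as pd


def summarize(cases: pd.DataFrame, by: list[str]) -> pd.DataFrame:
    """Count cases for every combination of the `by` columns (needs two or more).

    The last column in `by` is pivoted into one count column per value; the
    other columns stay as regular columns, with 0 for combinations that never occur.
    """
    return (
        cases.groupby(by)
        .size()                 # cases per combination (a MultiIndex Series)
        .unstack(fill_value=0)  # pivot the last grouping level into columns
        .reset_index()          # turn the remaining levels back into columns
    )
```

For example, `summarize(cases, ["Age_Group", "Outcome1"]).to_csv("age_outcome.csv", index=False)` would write a file with one row per age group. The file name is hypothetical.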

The code was first developed in Jupyter notebooks and then converted into standalone Python scripts. A separate script downloads the data, runs all the processing scripts, and outputs CSV files (this all runs once a day on the web server). JavaScript and D3 then generate interactive graphs on the client's machine from those CSV files.
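A driver script of this kind could look roughly like the sketch below. It loads the raw CSV once, passes it through each processing step, and writes one small CSV per step for D3 to fetch. The function name, the step dictionary and the output file names are hypothetical. Scheduling (for example with cron) is assumed to happen outside the script.

```python
from pathlib import Path
from typing import Callable

import pandas as pd

# A processing step takes the raw case table and returns a small summary table.
Step = Callable[[pd.DataFrame], pd.DataFrame]


def run_pipeline(source: str, out_dir: str, steps: dict[str, Step]) -> list[Path]:
    """Load the data once, run every step, and write <out_dir>/<name>.csv per step."""
    # read_csv accepts a local path or an http(s) URL, so downloading and
    # parsing happen in one call.
    raw = pd.read_csv(source)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, step in steps.items():
        path = out / f"{name}.csv"
        step(raw).to_csv(path, index=False)
        written.append(path)
    return written
```

Reading the large raw file once and sharing it across steps avoids parsing it again for every output file.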

The code for all the Python processing and D3 visualization is available in this GitHub repository.


The exploration of the data starts at a high level before looking more closely at age groups, mortality rates, and public health units.

Note: all the graphs are created from the same dataset, so the start and end dates are identical wherever data is presented. Instead of listing the dates for each graph, they are listed once here.

Below is a high-level view of the current state of all Covid-19 cases in Ontario:

Next, the case data is broken down by gender. There are five gender categories: female, male, unknown, other, and transgender. Currently, the other and transgender cases make up less than 0.01% of the overall total, so they are merged with the unknown group (and labelled Other). Because we are still viewing the data at a high level, merging these small groups makes general trends easier to see. Two buttons allow the data to be viewed either as percentages or as absolute values, and the numeric value is displayed as a tooltip.
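The sketch below shows one way to do the gender merge and to compute both views behind the percent/absolute buttons. It assumes a `Client_Gender` column with upper-case labels such as `FEMALE` and `MALE`. Every other label (unknown, other, transgender) is folded into Other.

```python
import pandas as pd

# Assumed raw labels for the two categories kept as-is; all others become "Other".
KEEP = {"FEMALE": "Female", "MALE": "Male"}


def gender_summary(cases: pd.DataFrame) -> pd.DataFrame:
    """Absolute case counts and percentages per gender, small groups merged."""
    gender = cases["Client_Gender"].map(KEEP).fillna("Other")
    counts = gender.value_counts()
    return pd.DataFrame({
        "cases": counts,                          # values for the "absolute" view
        "percent": 100 * counts / counts.sum(),   # values for the "percent" view
    })
```

Computing both columns on the server lets the page switch views without recalculating anything in JavaScript.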

Age Groups

The dataset does not list exact ages for the reported cases. Instead, age ranges are used: <20, 20s, 30s, 40s, 50s, 60s, 70s, 80s, 90s, and unknown. Since the unknown group is very small (approximately 0.03% of the total), it is not included in the graph below.
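Plain string sorting would put `<20` after `90s`, so an ordered categorical is one way to keep the age groups in natural order. The sketch below assumes the labels are spelled as listed above. Any other value, such as an unknown label, falls outside the categories and is left out of the counts, which matches the graph.

```python
import pandas as pd

# Assumed spellings of the ten-year age ranges, in display order.
AGE_ORDER = ["<20", "20s", "30s", "40s", "50s", "60s", "70s", "80s", "90s"]


def age_group_counts(cases: pd.DataFrame) -> pd.Series:
    """Cases per age group in natural order; labels outside AGE_ORDER are dropped."""
    # Values not in the categories (e.g. "Unknown") become NaN.
    ages = pd.Categorical(cases["Age_Group"], categories=AGE_ORDER, ordered=True)
    # sort=False keeps category order; NaN is excluded and empty groups count as 0.
    return pd.Series(ages).value_counts(sort=False)
```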

Note that the buttons on the side select how the data is displayed (absolute values or percentages). The death rate values are shown only as percentages, so the absolute and percent buttons have no effect on them.

Click here to view the above graph without any WordPress page decorations (for mobile users).

Death Rate

From the previous Covid-19 data exploration, it was noted that various sources report death/fatality/mortality rates slightly differently. In the above graph, the death rate was calculated by the following equation:

\mathsf{Death\ Rate} = \frac{\mathsf{Deaths}}{\mathsf{Resolved}} = \frac{\mathsf{Deaths}}{\mathsf{Recovered} + \mathsf{Deaths}}

These values were determined for each age group. The death rate can be interpreted as the percentage of resolved infections that resulted in death. Note that active (unresolved) infections are not included in the calculation. Once a case is resolved, it will belong to either the death group or the recovered group. Counting active cases now would mean assuming which group they will end up in, and that cannot be known yet.
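When applied to a table of counts with one row per age group, the death-rate equation becomes a single column operation. In this sketch, the column names `Fatal` (deaths) and `Resolved` (recoveries) are assumptions about how outcomes are labelled. Any column of active cases is simply ignored, so unresolved cases never reach the denominator.

```python
import pandas as pd


def death_rate_by_age(counts: pd.DataFrame) -> pd.Series:
    """Death rate in percent: deaths / (recovered + deaths), per row (age group).

    Assumes hypothetical columns "Fatal" (deaths) and "Resolved" (recoveries);
    unresolved cases never enter the denominator.
    """
    resolved = counts["Resolved"] + counts["Fatal"]
    # A group with no resolved cases yields NaN (0/0) rather than a misleading 0%.
    return 100 * counts["Fatal"] / resolved
```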

Average Fatality Age

Using the number of fatalities per age group, it is possible to determine the average fatality age (that is, the mean age of the people who died from Covid-19). Recall that no exact ages were given; instead, we have age groups spanning 10 years. If we assume that ages are uniformly distributed within each group, the average age of a group is the midpoint of its range. The average fatality age can then be determined by:

\small \mathsf{Average\ Fatality\ Age} = \frac{\sum_{i \in \mathsf{Age\ Groups}} (\mathsf{Average\ Age}_i) \cdot (\mathsf{Num.\ Fatalities}_i)}{\sum_{i \in \mathsf{Age\ Groups}} \mathsf{Num.\ Fatalities}_i}
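The function below is a sketch of this weighted average. It uses the midpoint of each range as the group's average age. Three choices here are assumptions: ages are treated as continuous within each decade (so the 20s average 25), `<20` is bounded at 0 to 20, and the 90s are bounded at 90 to 100. Groups without a midpoint, such as unknown, are skipped.

```python
# Assumed average age of each group: the midpoint of its range.
# The bounds used for "<20" (0 to 20) and "90s" (90 to 100) are assumptions.
MIDPOINTS = {
    "<20": 10, "20s": 25, "30s": 35, "40s": 45, "50s": 55,
    "60s": 65, "70s": 75, "80s": 85, "90s": 95,
}


def average_fatality_age(deaths_by_group: dict[str, int]) -> float:
    """Fatality-weighted mean of the group midpoints; unknown groups are skipped."""
    known = {g: n for g, n in deaths_by_group.items() if g in MIDPOINTS}
    total = sum(known.values())
    if total == 0:
        raise ValueError("no fatalities in any known age group")
    weighted = sum(MIDPOINTS[g] * n for g, n in known.items())
    return weighted / total
```

For example, `average_fatality_age({"80s": 1, "90s": 1})` returns 90.0.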

Performing the above calculation on the dataset gives:

Comorbidity Rates

The data does not indicate whether any comorbidities are present for any of the cases. In general, the prevalence of comorbidities increases with age, as many studies have documented, including:

  • Davis JW, Chung R, Juarez DT. Prevalence of comorbid conditions with aging among patients with diabetes and cardiovascular disease. Hawaii Med J. 2011;70(10):209‐213.
  • Piccirillo JF, Vlahiotis A, Barrett LB, Flood KL, Spitznagel EL, Steyerberg EW. The changing prevalence of comorbidity across the age spectrum. Crit Rev Oncol Hematol. 2008;67(2):124‐132. doi:10.1016/j.critrevonc.2008.01.013
  • Statistics Canada. Table 13-10-0710-01 Deaths and mortality rates by age group.

Public Health Units

There are over 30 public health units (PHUs) in Ontario. A quick analysis shows that the Covid cases are concentrated in about half of them. Looking at PHUs can reveal more details, and this will be investigated further in the future.
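A cumulative share of cases by PHU is one way to make the concentration claim concrete. The sketch below assumes a `Reporting_PHU` column that names each case's unit. It ranks the units by case count, reports the running percentage of all cases, and finds how many top units are needed to reach a given share. Both function names are hypothetical.

```python
import pandas as pd


def phu_concentration(cases: pd.DataFrame) -> pd.DataFrame:
    """Cases per PHU, largest first, with the running percentage of all cases."""
    counts = cases["Reporting_PHU"].value_counts()  # sorted by count, descending
    return pd.DataFrame({
        "cases": counts,
        "cumulative_percent": 100 * counts.cumsum() / counts.sum(),
    })


def units_needed(table: pd.DataFrame, percent: float) -> int:
    """Smallest number of top PHUs covering `percent` of cases (0 < percent <= 100)."""
    # cumulative_percent is non-decreasing, so a binary search finds the cutoff.
    return int(table["cumulative_percent"].searchsorted(percent)) + 1
```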

Click here to view the above graph without any WordPress page decorations (for mobile users).

