The goal is to make a web-based map of Ontario that both displays Covid-19 data and lets one interact with it. Some existing maps use either Google Maps or OpenStreetMap to show a few key data elements. Others are static images created in some other piece of software; these need to be updated manually and offer minimal interactivity. Perhaps it is a bit redundant to make yet another visualization tool; however, it’s a decent learning experience. As with my recent excursion into web-based data visualization, the code is written in JS + D3.
Getting Map Data
A map of the public health units (or PHUs) in Ontario can be obtained from the government’s website. It is a 9.9 MB SHP file, which is too big for my liking (I would prefer smaller files that are faster to download) and is not a format that standard web tools can read directly.
There are a variety of tools that solve both problems; the result is typically a JSON file with much simpler geometry. Perhaps the easiest of these tools is mapshaper, since it provides all this functionality as a free web app. Using this app, the file size was reduced by 95% while still maintaining a fairly detailed map. The resulting file is about the size of a typical JPEG, so transfer times should not be greatly impacted.
The original map file has a few properties for each PHU region; these include: ‘MOH_SERVICE_PROVIDER_IDENT’, ‘ENGLISH_NAME’, ‘FRENCH_NAME’, ‘Shape_Length’, and ‘Shape_Area’. Previously, the Ontario Covid-19 case data was explored, and it contained information for each PHU such as office addresses, locations (latitude and longitude), along with case data. For the visualization, two data processing elements need to be performed:
- Add static PHU information (such as the addresses and locations) to the JSON file. (The script that modifies the JSON file is available here).
- Output the case data to a separate CSV file. Note that the code for this was already written when that database was explored.
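The first of these steps can be sketched roughly as follows. This is a minimal illustration, not the actual script: it assumes the simplified map is a GeoJSON feature collection, and the `phu_info` lookup, its field names (`address`, `latitude`, `longitude`), and the id `1001` are made up for the example. Only the `MOH_SERVICE_PROVIDER_IDENT` and `ENGLISH_NAME` properties come from the map file itself.

```python
import json

# Hypothetical static PHU details keyed by the map's 'MOH_SERVICE_PROVIDER_IDENT'
# property; the id, field names, and values are invented for illustration.
phu_info = {
    "1001": {"address": "123 Example St", "latitude": 43.0, "longitude": -80.0},
}

def add_phu_info(geojson, info_by_id):
    """Copy static PHU details into each feature's properties, in place."""
    for feature in geojson["features"]:
        props = feature["properties"]
        extra = info_by_id.get(str(props["MOH_SERVICE_PROVIDER_IDENT"]))
        if extra is not None:
            props.update(extra)
    return geojson

# A tiny stand-in for the simplified map exported from mapshaper.
phu_map = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "MOH_SERVICE_PROVIDER_IDENT": "1001",
                "ENGLISH_NAME": "Example PHU",
            },
            "geometry": None,
        },
    ],
}

augmented = add_phu_info(phu_map, phu_info)
print(json.dumps(augmented["features"][0]["properties"], indent=2))
```

Features without a matching entry in the lookup are left untouched, so a missing id does not break the map.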
Displaying the Map
Colours…
A quick examination of the data shows that the vast majority of the cases are in a few regions; that is, the data is heavily skewed. To balance out the colours over all the regions, the logarithmic adjustment listed below was performed:
Since the purpose of the map is to quickly compare the number of cases in different regions, the logarithmic adjustment provides acceptable results.
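One common way to do a logarithmic adjustment of this kind is sketched below. It is an assumption for illustration and not necessarily the exact formula used for the map: each count is passed through `log1p` and divided by the value for the largest count, giving a number between 0 and 1 that can be fed to a colour scale. The function name and the sample counts are made up.

```python
import math

def log_colour_value(cases, max_cases):
    """Map a case count to [0, 1] on a logarithmic scale.

    log1p is used so that a region with zero cases maps to 0
    instead of hitting log(0).
    """
    if max_cases <= 0:
        return 0.0
    return math.log1p(cases) / math.log1p(max_cases)

# Made-up counts in which one region dominates, mimicking skewed data.
counts = {"A": 5, "B": 50, "C": 5000}
largest = max(counts.values())
for name, n in counts.items():
    linear = n / largest
    print(name, round(linear, 3), round(log_colour_value(n, largest), 3))
```

On a linear scale the two small regions would be almost indistinguishable from zero; the logarithmic version spreads them out across the colour range while keeping their order.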
Preset Zooming…
To keep things as simple as possible, preset zooming functionality was implemented on the map. The belief is that users want to see trends in the data, not particular details of the geography (for that, Google Maps is a much better option).
Clicking one of the region buttons on the side repositions and zooms the map view to the appropriate PHU region. The regional offices for all PHUs in the region are drawn and listed (and the address is shown as a tooltip).
Tool Tips…
Tooltips are drawn for PHUs in two different styles. In the full province view, a shortened tooltip is shown containing the number of resolved, not resolved, fatal, and total cases. When zoomed into a region, the tooltips also contain age group information.
PHU Region Grouping…
PHUs are grouped into seven groups/regions (as seen here). In the script that augments the JSON file with PHU information, some additional code was written to determine the region that a PHU belongs to.
Visualization
Note that the same data is used for all of the graphs on the other pages. To minimize errors, it is updated manually. The date range for the data used in the map is listed below…
Click here to view the map in a separate window without any WordPress page decoration (for mobile users).
Code
Finally, the code to process the JSON file and draw the map visualization is available here.
Using additional data…
The University of Toronto’s Dalla Lana School of Public Health has Covid-19 datasets for all of Canada. The data seems to be compiled manually from each province’s public health unit. The one major difference with the Dalla Lana data compared to the Ontario government’s datasets ([1], [2]) is that daily values along with a running cumulative sum are also available. Therefore, it is possible to track how the virus has behaved over time in the country, a province, or at the public health unit level.
The Dalla Lana research group has created a dashboard with various visualizations (that always seems to time out for me). All their data is available in a GitHub repository in the form of CSV files. To augment the data explorations already done, two files were of interest:
- cases_timeseries_hr.csv: the daily number and cumulative number of cases for each public health unit within each province.
- mortality_timeseries_hr.csv: the daily number and cumulative number of fatalities for each public health unit within each province.
There are a few things to take note of when dealing with the dataset:
- The names of the public health units are slightly different compared to the Ontario government’s dataset (for example: Region of Waterloo Public Health is simply listed as Waterloo). Hence some extra code needed to be written so that the dataset could work with the map that was recently created.
- The date ranges of the two files are quite different: the case set covers a little over a month more than the mortality set.
- Also, the date format is in the opposite order compared to the provincial data.
- Finally, the CSV files are for all of Canada. Additional processing needed to be performed to pull out the Ontario data.
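The points above can be handled along the following lines. This is a sketch under stated assumptions rather than the actual processing script: the column names (`province`, `health_region`, `date_report`) and the day-first `dd-mm-yyyy` date format are assumptions to check against the real files, the sample rows are made up, and `NAME_MAP` is a hypothetical lookup in which only the Waterloo entry comes from the example given above. It uses only the standard library instead of pandas to stay self-contained.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows; the column names and day-first dates are assumptions.
SAMPLE = """province,health_region,date_report,cases,cumulative_cases
Ontario,Waterloo,25-03-2020,3,10
Alberta,Calgary,25-03-2020,7,20
Ontario,Toronto,26-03-2020,12,40
"""

# Hypothetical lookup from the Dalla Lana names to the names used by the map.
NAME_MAP = {"Waterloo": "Region of Waterloo Public Health"}

def ontario_rows(csv_text, date_column):
    """Yield Ontario rows with map-compatible names and ISO (yyyy-mm-dd) dates."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["province"] != "Ontario":
            continue
        # Fall back to the original name when no mapping is needed.
        row["health_region"] = NAME_MAP.get(row["health_region"], row["health_region"])
        parsed = datetime.strptime(row[date_column], "%d-%m-%Y")
        row["date"] = parsed.strftime("%Y-%m-%d")
        del row[date_column]
        yield row

rows = list(ontario_rows(SAMPLE, "date_report"))
for r in rows:
    print(r)
```

Passing the date column name as a parameter lets the same function handle both files, since the mortality file may name its date column differently.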
Dealing with the Data
To simplify the visualization, the cases and mortality time series were merged into one file. Also, only the Ontario data was kept from the dataset.
The processing of the data was done using Python. All the key elements were extracted into separate pandas data frames, which were then labeled and merged together before being exported to a CSV file for the visualization process.
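The merge step can be illustrated with the following sketch. It is not the actual script: it uses plain dictionaries and the standard `csv` module in place of pandas, and the field names and sample values are assumptions. Every case row is kept, and dates with no mortality record are filled with zeros.

```python
import csv
import io

# Rows as they might look after filtering to Ontario; field names are assumptions.
cases = [
    {"health_region": "Toronto", "date": "2020-03-01", "cases": 2, "cumulative_cases": 2},
    {"health_region": "Toronto", "date": "2020-03-02", "cases": 3, "cumulative_cases": 5},
]
deaths = [
    {"health_region": "Toronto", "date": "2020-03-02", "deaths": 1, "cumulative_deaths": 1},
]

def merge_series(case_rows, death_rows):
    """Join the two series on (region, date), keeping every case row.

    Dates with no mortality record get zero deaths. The cumulative value is
    also set to zero, which is only right when the mortality series simply
    starts later than the case series.
    """
    deaths_by_key = {(d["health_region"], d["date"]): d for d in death_rows}
    merged = []
    for c in case_rows:
        d = deaths_by_key.get((c["health_region"], c["date"]), {})
        merged.append({
            **c,
            "deaths": d.get("deaths", 0),
            "cumulative_deaths": d.get("cumulative_deaths", 0),
        })
    return merged

merged = merge_series(cases, deaths)

# Write the merged rows as CSV text (to a string here instead of a file).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(merged[0].keys()))
writer.writeheader()
writer.writerows(merged)
print(out.getvalue())
```

With pandas, the equivalent would be a `merge` of the two data frames on the region and date columns with `how="left"`, followed by `fillna(0)`.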
One additional feature that was added was the first derivative of the cumulative case/fatality amounts. This was done to track how quickly cases/fatalities increase. The derivative was calculated on the five-day moving average to reduce any effects from daily noise. Recall that the first derivative can be calculated numerically using the ‘centered difference approximation’ as: f′(t) ≈ [f(t + h) − f(t − h)] / (2h), where h is the step between samples (one day here).
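The smoothing and derivative steps can be sketched as below. This is an illustration under assumptions, not the original code: the moving average is taken as a centred window that is truncated at the ends of the series, and one-sided differences are used for the first and last points, neither of which is specified above. The sample counts are made up.

```python
def moving_average(values, window=5):
    """Centred moving average; windows are truncated at the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def centered_difference(values, h=1.0):
    """Approximate the first derivative with (f[i+1] - f[i-1]) / (2h).

    The two end points have only one neighbour, so one-sided differences
    are used there instead.
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    rates = [(values[1] - values[0]) / h]
    for i in range(1, n - 1):
        rates.append((values[i + 1] - values[i - 1]) / (2 * h))
    rates.append((values[-1] - values[-2]) / h)
    return rates

# Made-up cumulative case counts, one value per day.
cumulative = [0, 1, 3, 6, 10, 15, 21, 28]
rates = centered_difference(moving_average(cumulative))
print([round(r, 2) for r in rates])
```

Smoothing first and differentiating second matters: differencing raw daily counts amplifies day-to-day noise, which is what the moving average is there to suppress.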
Using the outputted CSV file, two different visualizations were produced:
- A standard line graph where the daily and cumulative values for both the number of cases and fatalities are plotted. Data can be viewed for each public health unit, either month by month or over the whole time range. Some extra time was taken so that each data update is drawn with a smooth transition.
- A map of Ontario is also drawn, where the public health units are shaded using the totals and derivatives. The date is adjusted by using a slider on top of the map. Also, preset buttons allow zooming into specific regions.
Note that both visualizations were created using D3 and JavaScript.
The code for the processing and visualization is available in the following GitHub repository.
Overall Time…
To ensure that there aren’t any issues with processing the data, the scripts are executed manually. I am always a bit hesitant with external data sources, since anything can happen with them. While I try to run the scripts daily, it is sometimes impossible; hence the date is listed below:
More Data Visualizations…
Next, the line graphs of the daily and cumulative values for both the number of cases and fatalities are presented.
Click here to view the data in a separate window (without any WordPress page decorations; for mobile users).
Finally, a map of Ontario shows the running cumulative totals for both the number of cases and fatalities. The rates of increase are also shown, where the darker the colour, the larger the increase (and no colour corresponds to a zero rate of increase).
Click here to view the data in a separate window (without any WordPress page decorations; for mobile users).