Last week we released a map displaying the spread of COVID-19 over time. At a glance, the map provides an interactive overview of the latest situation, including the total number of confirmed cases, deaths, and recoveries. You can also explore these numbers by country and, in the case of China, by province.
In addition to the latest situation, users can also use the time slider to get a snapshot of the spread of the virus for any day since the 23rd of January 2020. You can explore each bubble on the map either by selecting it directly on the map (tap on mobile, hover on desktop), or by selecting the corresponding card in the sidebar. If you are looking for a specific city, you can also navigate using the autocomplete search box.
Since phones are increasingly the dominant mode of consumption for web content, we put particular emphasis on the mobile experience. The mobile performance and usability of this app are perhaps its most useful features.
Working with live data from different sources during an evolving virus outbreak meant that we had to stay flexible about data collection and processing. It started with a public Google spreadsheet maintained by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). That spreadsheet changed over time, with columns being added and removed, time formats changing, and other complications arising. We settled on an initial workflow where we drew data from the Google sheet and pushed it into the HERE Data Hub.
Eventually, JHU shut down the spreadsheet and moved their data to GitHub.
This was a welcome move, since GitHub offers a much better environment for working with open data. Meanwhile, the data reporting has stabilized at one update every 24 hours. The continued efforts of the team at JHU and of others heavily involved on GitHub have made all of this possible.
To provide more frequent updates for the situation in China, we decided to source data directly from the DXY website every 60 minutes. This process is automated. The DXY data is merged with what JHU publishes about once a day, and the result is pushed to the HERE Data Hub to be consumed by our app. In addition to being the data backbone of the app, the HERE Data Hub therefore also acts as a buffer.
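As a rough illustration of the merge step, the sketch below combines a daily feed with an hourly one by keeping the most recently updated record for each place. It is not our production pipeline: the record fields (`country`, `province`, `confirmed`, `updated`), the function name `merge_sources`, and all sample values are hypothetical and only serve to show the idea.

```python
from datetime import datetime, timezone


def merge_sources(jhu_records, dxy_records):
    """Merge two lists of records, keeping the newest record per place.

    Field names are hypothetical; real feeds use different schemas.
    """
    merged = {}
    for record in list(jhu_records) + list(dxy_records):
        # A place is identified by its country and (optional) province.
        key = (record["country"], record.get("province"))
        current = merged.get(key)
        if current is None or record["updated"] > current["updated"]:
            merged[key] = record
    return merged


# Made-up sample values, for illustration only.
jhu = [
    {"country": "China", "province": "Hubei", "confirmed": 64000,
     "updated": datetime(2020, 2, 26, 0, 0, tzinfo=timezone.utc)},
    {"country": "Italy", "province": None, "confirmed": 300,
     "updated": datetime(2020, 2, 26, 0, 0, tzinfo=timezone.utc)},
]
dxy = [
    {"country": "China", "province": "Hubei", "confirmed": 65000,
     "updated": datetime(2020, 2, 26, 9, 0, tzinfo=timezone.utc)},
]

merged = merge_sources(jhu, dxy)
```

In this sketch the hourly record wins for Hubei because its timestamp is newer, while Italy keeps the daily record because no fresher one exists.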
Our COVID-19 tracker app is built on the JAMstack using Gatsby and React.
For the map itself we opted for Leaflet and Tangram, powered by the HERE Data Hub and HERE Vector Tile API. To allow users to search for specific places, we used the HERE Geolocation Autocomplete API.
We also leveraged serverless technology to schedule scrapers, fetch the data, and push that data to the HERE Data Hub. All of these processes run separately and don't directly impact the app. The app itself is built once upon deployment, and is served as a static site, which means it can be hosted on any server that can host static HTML pages. Once it is loaded on the client side it becomes a dynamic React app that automatically re-fetches data from the HERE Data Hub.
Data visualization method
When designing maps, you will always face certain trade-offs. Projections and scales are particularly important to consider. Apart from some experimental implementations and tricks that can be used in some situations, most mapping tools that rely on map tiles today are still limited to the Mercator projection. However, the familiarity to users, level of detail, and performance these tools provide are very useful when creating these kinds of exploratory maps.
Our map displays data as scaled bubbles, indicating confirmed cases as absolute numbers, rather than normalized values in relation to the population of the place. With absolute counts, choropleth maps are generally not a good option. A choropleth map would have also made it difficult to display data from the Diamond Princess cruise ship, requiring a custom solution for that data point.
We opted for a linear scale to show an accurate relationship of the absolute numbers on a global scale. There are also merits to using a logarithmic scale here, as it provides a way to show the differences between smaller bubbles. Right now, for example, Italy and France show bubbles that are very close in size, even though the former has 300 cases and the latter has 16 cases. At the same time, the global relationships are more accurately represented using the linear scale, which shows the extreme concentration of cases in China. It's always a trade-off.
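The trade-off can be made concrete with a small sketch that uses the case counts cited in this article. It assumes that the bubble radius is what gets scaled and that the largest bubble has a radius of 60 pixels. Both are assumptions for illustration, and the function names are hypothetical rather than taken from our app.

```python
import math

MAX_RADIUS_PX = 60.0  # hypothetical radius of the largest bubble


def linear_radius(cases, max_cases, max_radius=MAX_RADIUS_PX):
    """Radius proportional to the case count."""
    return max_radius * cases / max_cases


def log_radius(cases, max_cases, max_radius=MAX_RADIUS_PX):
    """Radius proportional to log(1 + cases), so small counts stay visible."""
    return max_radius * math.log1p(cases) / math.log1p(max_cases)


# Case counts as cited in this article.
counts = {"Hubei": 65000, "Italy": 300, "France": 16}
max_cases = max(counts.values())

for place, cases in counts.items():
    lin = linear_radius(cases, max_cases)
    log = log_radius(cases, max_cases)
    print(f"{place:8s} linear={lin:6.2f}px  log={log:6.2f}px")
```

Under these assumptions, the linear radii for Italy and France both come out well under one pixel, which is why their bubbles look alike. The logarithmic radii differ clearly, but they also make Italy's bubble look comparable to Hubei's despite a gap of more than two orders of magnitude.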
Logarithmic scales also tend to be more difficult to understand, and are likely to cause more confusion than linear ones. Especially early on during the outbreak, maps using a logarithmic scale gave the impression that the virus was much more widespread than the data suggested.
The linear scale is also the reason why there is such a disproportionate emphasis on the cases in Hubei. According to the data from JHU and DXY, at the time of writing, Hubei province has 65,000 confirmed cases. Guangdong is the province with the second-largest number of cases at 1,300. That is a factor of 50, and there are very few places that have more than 1,000 cases at the moment. We use a minimum bubble size to show cases, mainly for usability reasons, particularly on mobile devices.
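A minimum bubble size can be sketched as a simple clamp on top of the linear scale. The pixel values and the function name `bubble_radius` below are hypothetical and chosen only to illustrate the idea; they are not the values used in our app.

```python
MIN_RADIUS_PX = 4.0   # hypothetical smallest radius that is still easy to tap
MAX_RADIUS_PX = 60.0  # hypothetical radius of the largest bubble


def bubble_radius(cases, max_cases,
                  min_radius=MIN_RADIUS_PX, max_radius=MAX_RADIUS_PX):
    """Linear radius, clamped so that any place with cases stays selectable."""
    if cases <= 0:
        return 0.0  # no bubble for places without cases
    return max(min_radius, max_radius * cases / max_cases)


# Case counts as cited in this article.
hubei, guangdong = 65000, 1300
ratio = hubei / guangdong  # the factor of 50 mentioned above

print(bubble_radius(hubei, hubei))      # largest bubble
print(bubble_radius(guangdong, hubei))  # falls below the minimum, so clamped
```

With these assumed values, even Guangdong's bubble falls below the minimum and is clamped. That illustrates the cost of the clamp: below the threshold, bubble size no longer encodes the case count.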
The most important thing to consider when choosing methods and making decisions on how to visualize data is that the projection, scales, and method you choose will always affect the story you are telling in one way or another.