This repository has been archived by the owner on Oct 18, 2024. It is now read-only.

An end-to-end dashboard development that effectively monitors and alerts the public about potential COVID-19 outbreaks in their neighborhoods.


Jack-cky/Compulsory-COVID-Testing-Monitor


💉 Compulsory COVID Testing Monitor


Amid the COVID-19 pandemic, the Hong Kong Government (HKG) strove to achieve Dynamic Zero Infection by introducing vaccine passports to encourage public vaccinations. Furthermore, the HKG exercised its power under the Prevention and Control of Disease Ordinance (Chapter 599) to require individuals who had been present at specified premises to undergo a COVID-19 nucleic acid test.

To effectively monitor the COVID situation, we built an end-to-end pipeline solution that gathered data and created a dashboard. This dashboard allowed the public (end users) to understand the status of the pandemic and alerted them to potential outbreaks in their neighbourhoods.

Image Credit: info.gov.hk

First Published: 25 August 2022
Last Updated: 18 October 2024

Motivation: Visualising the Compulsory Test Frequency

Earlier, we came across an intriguing post on HKGolden discussing the nuisances caused by the Compulsory Testing Notice (CTN) and the desire for a Dragon Tiger Billboard (also known as 龍虎榜 in Chinese or ranking billboard in English), which ranks the buildings that appeared most frequently on the CTN.

Unfortunately, there was no official publication providing such a ranking. The CTN was presented in PDF format, making it challenging to grasp the status of each location. Inspired by this idea, we initiated a project to create a dashboard that conveniently visualises the frequency of specified premises being listed on the CTN.

A golden son had been tested three times within a month.

How Many Times Had You Been Selected?

If you resided in Hong Kong in 2022, it was likely that you were asked to undergo a COVID test. However, did you know how many times you were officially requested to take a test?

Finding the Latest Updates

Simply visit the Compulsory COVID Testing Monitor on Tableau Public, and you can find the most recently affected buildings.

Warning

The dashboard is no longer being updated, and the last recorded entry for the CTN was on December 23, 2022.

Hosting Your CTN Monitor Pipeline

You can host the data pipeline in your preferred environment. The instructions below guide you through the deployment process. The pipeline uses the Adobe PDF Extract API, which requires API credentials. You can create credentials for free by following Adobe's instructions.

Caution

The pipeline is deprecated because we have already achieved Dynamic Zero Infection 👌🏻.

💻 Local Host (Recommended)

Being lightweight, the pipeline is designed to run on a local machine and write to local directories. Hosting it on a local computer is highly recommended for cost efficiency. Before following the steps, make sure your computer has Anaconda installed.
  1. Clone the repository and navigate into the folder.
    $ git clone https://github.com/Jack-cky/Compulsory-COVID-Testing-Monitor.git
    $ cd Compulsory-COVID-Testing-Monitor
  2. Set up the configuration for execution.
    $ cp ./config/.env.example ./config/.env
  3. Update the API credentials inside ./config/.env.
    CLIENT_ID=PDF_SERVICES_CLIENT_ID
    CLIENT_SECRET=PDF_SERVICES_CLIENT_SECRET
    
  4. (Optional) Define a date range in YYYYMMDD format. If no range is defined, the pipeline processes only today’s records.
    DATE_FROM=20220111
    DATE_TO=20221223
    
  5. Set up a virtual environment.
    $ make init
  6. Execute the pipeline.
    $ make run

🐳 Docker Host

Although the pipeline is designed for local directories, the data and logs directories can be mounted as volumes to retrieve the output. Before following the steps, make sure your computer has Docker installed.
  1. Clone the repository and navigate into the folder.
    $ git clone https://github.com/Jack-cky/Compulsory-COVID-Testing-Monitor.git
    $ cd Compulsory-COVID-Testing-Monitor
  2. Set up the configuration for execution.
    $ cp ./config/.env.example ./config/.env
  3. Update the API credentials inside ./config/.env.
    CLIENT_ID=PDF_SERVICES_CLIENT_ID
    CLIENT_SECRET=PDF_SERVICES_CLIENT_SECRET
    
  4. (Optional) Define a date range in YYYYMMDD format. If no range is defined, the pipeline processes only today’s records.
    DATE_FROM=20220111
    DATE_TO=20221223
    
  5. Execute the pipeline.
    $ docker run --env-file ./config/.env -v "$(pwd)/data:/ctn-monitor/data" -v "$(pwd)/logs:/ctn-monitor/logs" jackcky/ctn-monitor
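
The optional date range in step 4 can be pictured with a short Python sketch. This is an illustration only, not the pipeline's actual code: the function name `resolve_dates` is hypothetical, and falling back to today for a missing bound is an assumption based on the description above.

```python
from datetime import date, datetime, timedelta


def resolve_dates(env, today=None):
    """Return the list of dates to process, oldest first.

    `env` is a mapping such as os.environ. DATE_FROM and DATE_TO use the
    YYYYMMDD format shown in .env. A missing bound falls back to today,
    which is an assumption about the pipeline, not confirmed by its source.
    """
    today = today or date.today()

    def parse(key):
        raw = env.get(key, "").strip()
        return datetime.strptime(raw, "%Y%m%d").date() if raw else today

    start, end = parse("DATE_FROM"), parse("DATE_TO")
    if start > end:
        raise ValueError(f"DATE_FROM {start} is after DATE_TO {end}")
    # Inclusive range: one entry per calendar day from start to end.
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]
```

Calling `resolve_dates(os.environ)` with neither variable set would yield a single-element list containing today's date, matching the default described in step 4.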

Solution Architecture

The architecture is quite straightforward. Every day, the Centre for Health Protection releases a CTN that is structured in a table format in PDF, detailing all specified locations.

To extract these tables, we use the Adobe PDF Extract API, which captures PDF tables more accurately than the open-source tools. To enrich the dataset, we supplement the addresses with spatial information using the Hong Kong Address Parser to access HKG's APIs.

The ETL process is performed using Pandas, which consolidates the data into an Excel file. This file then serves as the data source for the dashboard. The dashboard is crafted in Tableau and published on Tableau Public for the general public to review.
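
The ranking idea behind the dashboard can be sketched in a few lines. The pipeline itself uses Pandas; the plain-Python version below is a stand-in that only illustrates the counting step. The row layout, the building names, and the function `rank_premises` are all hypothetical.

```python
from collections import Counter

# Hypothetical rows as they might look after table extraction:
# (notice date, specified premises). Building names are placeholders.
records = [
    ("2022-02-01", "Building A"),
    ("2022-02-01", "Building B"),
    ("2022-02-03", "Building A"),
    ("2022-02-07", "Building A"),
    ("2022-02-07", "Building B"),
    ("2022-02-07", "Building C"),
]


def rank_premises(rows, top=10):
    """Rank premises by the number of distinct notice dates they appear on."""
    unique = {(day, name) for day, name in rows}  # drop duplicate listings
    counts = Counter(name for _, name in unique)
    # Sort by count descending, then by name, so ties are ordered deterministically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


ranking = rank_premises(records)
```

Counting distinct notice dates rather than raw rows is a design choice in this sketch: it keeps a building listed twice on the same notice from being ranked higher than it should be.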

Production Scenario

To productionise the pipeline, the output destination needs to change to suit the target environment. For example, in an AWS deployment the data layer would be directed to an S3 bucket, and a Lambda function could be scheduled to run the Docker image once every night (further development required). Assuming the dashboard serves end users 24/7, operation would cost approximately USD 0.10 per month. Detailed price calculations can be found on the calculator.

Note

The estimated operating cost does not include the Tableau licence fee.
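
One way to make the output destination switchable is to describe it as a URI and branch on its scheme. The sketch below is an assumption for illustration, not part of the repository: the function `resolve_output` and the example bucket name are hypothetical, and the upload itself (for example with an AWS SDK) is left out.

```python
from urllib.parse import urlparse


def resolve_output(uri):
    """Split an output URI into (kind, bucket, path).

    "s3://bucket/key" maps to ("s3", "bucket", "key"); a plain path or a
    file:// URI maps to ("local", None, path). Windows drive paths such as
    "C:\\data" are not handled in this sketch.
    """
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        return ("s3", parsed.netloc, parsed.path.lstrip("/"))
    if parsed.scheme in ("", "file"):
        return ("local", None, parsed.path)
    raise ValueError(f"Unsupported output destination: {uri}")
```

With this shape, the local and Docker hosts would keep passing a directory path, while a cloud deployment would pass something like `s3://my-bucket/ctn/output.xlsx` without changing the rest of the pipeline.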

Changelog

[3.0.0] Archive Version

[3.0.1] 2024-10-18
Minor improvement before archiving the repository.

Changed

  • Built Docker image with a multistage build to reduce image size.
  • Compressed image size.
  • Specified the Python version in the Makefile.
  • Updated README for consistency with other projects.

[2.0.0] Revamped Version

[2.0.3] 2024-08-23
Enhanced the pipeline folder structure.

Changed

  • Moved Dockerfile and main script to the root directory.
  • Removed redundant reading of the .env file.
  • Updated backlog URL.
  • Updated the services used in the architecture diagram.

[2.0.2] 2024-08-02
Enhanced pipeline execution.

Added

  • Added product backlog for review.
  • Calculated operational costs in the production scenario.
  • Improved pipeline with directory setup.
  • Used Makefile for recompilation.

Changed

  • Updated Dockerfile to reduce image size.
  • Wrote more descriptive instructions.

[2.0.1] 2024-07-22
Revamped the data pipeline and dashboard design.

Changed

  • Enhanced the dashboard design for a more professional appearance.
  • Segregated the data pipeline into distinct modules.
  • Switched PDF table extraction from using Tabula-py to the Adobe PDF Extract API.

[1.0.0] Project Initiation

[1.0.1] 2022-08-25
Initial Repository.

Product Backlog

This project is managed with a product backlog. You can review the backlog to understand the prioritised list of features, changes, enhancements, and bug fixes made during development.

License

This project is licensed under the MIT License. See the LICENSE file for details. Feel free to fork and customise it to meet your needs!

Credits

The initial dashboard design was adapted from 交齊功課龍虎榜 @ Ho Dao College.
