Amid the COVID-19 pandemic, the Hong Kong Government (HKG) strove to achieve Dynamic Zero Infection by introducing vaccine passports to encourage public vaccinations. Furthermore, the HKG exercised its power under the Prevention and Control of Disease Ordinance (Chapter 599) to require individuals who had been present at specified premises to undergo a COVID-19 nucleic acid test.
To effectively monitor the COVID situation, we built an end-to-end pipeline solution that gathered data and created a dashboard. This dashboard allowed the public (end users) to understand the status of the pandemic and alerted them to potential outbreaks in their neighbourhoods.
Image Credit: info.gov.hk
First Published: 25 August 2022
Last Updated: 18 October 2024
- 1 - Motivation: Visualising the Compulsory Test Frequency
- 2 - How Many Times Had You Been Selected?
- 3 - Solution Architecture
Earlier, we came across an intriguing post on HKGolden discussing the nuisances caused by the Compulsory Testing Notice (CTN) and the desire for a Dragon Tiger Billboard (also known as 龍虎榜 in Chinese or ranking billboard in English), which ranks the buildings that appeared most frequently on the CTN.
Unfortunately, there was no official publication providing such a ranking. The CTN was presented in PDF format, making it challenging to grasp the status of each location. Inspired by this idea, we initiated a project to create a dashboard that conveniently visualises the frequency of specified premises being listed on the CTN.
If you resided in Hong Kong in 2022, it was likely that you were asked to undergo a COVID test. However, did you know how many times you were officially requested to take a test?
Simply visit the Compulsory COVID Testing Monitor on Tableau Public, and you can find the most recently affected buildings.
Warning
The dashboard is no longer being updated, and the last recorded entry for the CTN was on December 23, 2022.
You can host the data pipeline in your preferred environment. The instructions below guide you through the deployment process. The pipeline uses the Adobe PDF Extract API, which requires API credentials. You can create credentials for free by following Adobe's instructions.
Caution
The pipeline is deprecated because we had already achieved Dynamic Zero Infection 👌🏻.
💻 Local Host (Recommended)
The pipeline is lightweight and designed to run on a local machine with local directories. Hosting it on a local computer is recommended for cost efficiency. Before following the steps, make sure Anaconda is installed on your computer.
- Clone the repository and navigate into the folder.
$ git clone https://github.com/Jack-cky/Compulsory-COVID-Testing-Monitor.git
$ cd Compulsory-COVID-Testing-Monitor
- Set up the configuration for execution.
$ cp ./config/.env.example ./config/.env
- Update the API credentials inside ./config/.env.
CLIENT_ID=PDF_SERVICES_CLIENT_ID
CLIENT_SECRET=PDF_SERVICES_CLIENT_SECRET
- (Optional) Define a date range in ./config/.env. If no range is defined, the pipeline processes only today's records.
DATE_FROM=20220111
DATE_TO=20221223
- Set up a virtual environment.
$ make init
- Execute the pipeline.
$ make run
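The optional date range can be pictured with a short Python sketch. This is only an illustration, not the repository's actual code: the function name `resolve_date_range` is hypothetical, and the handling of a single missing bound (falling back to today) is an assumption. Only the `DATE_FROM` and `DATE_TO` keys and their `YYYYMMDD` format come from the configuration above.

```python
from datetime import date, datetime, timedelta


def resolve_date_range(env, today=None):
    """Return the list of dates to process (hypothetical helper).

    `env` is any mapping holding the optional DATE_FROM / DATE_TO keys
    in YYYYMMDD format, for example os.environ after loading the .env file.
    """
    today = today or date.today()
    raw_from = env.get("DATE_FROM")
    raw_to = env.get("DATE_TO")

    # Assumption: a missing bound falls back to today, so an empty
    # configuration yields a single-day range containing only today.
    start = datetime.strptime(raw_from, "%Y%m%d").date() if raw_from else today
    end = datetime.strptime(raw_to, "%Y%m%d").date() if raw_to else today

    if start > end:
        raise ValueError("DATE_FROM must not be later than DATE_TO")

    # Both bounds are inclusive.
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


if __name__ == "__main__":
    import os

    for day in resolve_date_range(os.environ):
        print(day.strftime("%Y%m%d"))
```

With the example values above, the range would cover every day from 11 January 2022 to 23 December 2022 inclusive.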
🐳 Docker Host
Although the pipeline is designed for local directories, you can mount them into the container to retrieve the output data. Before following the steps, make sure Docker is installed on your computer.
- Clone the repository and navigate into the folder.
$ git clone https://github.com/Jack-cky/Compulsory-COVID-Testing-Monitor.git
$ cd Compulsory-COVID-Testing-Monitor
- Set up the configuration for execution.
$ cp ./config/.env.example ./config/.env
- Update the API credentials inside ./config/.env.
CLIENT_ID=PDF_SERVICES_CLIENT_ID
CLIENT_SECRET=PDF_SERVICES_CLIENT_SECRET
- (Optional) Define a date range in ./config/.env. If no range is defined, the pipeline processes only today's records.
DATE_FROM=20220111
DATE_TO=20221223
- Execute the pipeline.
$ docker run --env-file ./config/.env -v "$(pwd)/data:/ctn-monitor/data" -v "$(pwd)/logs:/ctn-monitor/logs" jackcky/ctn-monitor
The architecture is quite straightforward. Every day, the Centre for Health Protection releases a CTN as a PDF whose tables detail all specified locations.
To extract these tables, we use the Adobe PDF Extract API, which captured tables more accurately than the open-source tool used earlier in the project (Tabula-py). To enrich the dataset, we supplement the addresses with spatial information using the Hong Kong Address Parser, which accesses HKG's APIs.
The ETL process is performed using Pandas, which consolidates the data into an Excel file. This file then serves as the data source for the dashboard. The dashboard is crafted in Tableau and published on Tableau Public for the general public to review.
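The consolidation step can be sketched with Pandas as follows. This is a minimal illustration under stated assumptions, not the project's actual module: the function name `rank_buildings`, the column names `building` and `notice_date`, and the sample rows are all hypothetical.

```python
import pandas as pd


def rank_buildings(daily_frames, building_col="building", date_col="notice_date"):
    """Combine per-day CTN tables and count how often each building appears.

    Column names are assumptions made for this sketch.
    """
    records = pd.concat(daily_frames, ignore_index=True)

    # Count a building at most once per notice date.
    records = records.drop_duplicates(subset=[building_col, date_col])

    ranking = (
        records.groupby(building_col)
        .agg(times_listed=(date_col, "count"), last_listed=(date_col, "max"))
        .reset_index()
        .sort_values(["times_listed", building_col], ascending=[False, True])
        .reset_index(drop=True)
    )
    return ranking


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    day1 = pd.DataFrame(
        {"building": ["Tower A", "Tower B"], "notice_date": ["20220111", "20220111"]}
    )
    day2 = pd.DataFrame({"building": ["Tower A"], "notice_date": ["20220112"]})
    print(rank_buildings([day1, day2]))
```

The resulting frame could then be written with `DataFrame.to_excel` (which needs an Excel engine such as openpyxl) to produce the workbook that Tableau reads.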
To productionise the pipeline, change the output destination to suit the target environment. For example, if you deploy the pipeline on AWS, the data layer would point to an S3 bucket, and a Lambda function could be scheduled to execute the Docker image (further development required) once every night. Assuming the dashboard serves end users 24/7, operation would cost approximately USD 0.10 per month. Detailed price calculations can be found on the calculator.
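One way to make the output destination switchable is sketched below. It is an assumption about how such a change might look, not part of the repository: the function `write_output` and the `s3://` convention for the destination string are hypothetical, and the S3 branch relies on the third-party `boto3` package with AWS credentials already configured.

```python
from pathlib import Path
from urllib.parse import urlparse


def write_output(payload: bytes, destination: str) -> str:
    """Write pipeline output to a local path or to an S3 object (sketch).

    A destination such as "s3://bucket-name/key" goes to S3; anything
    else is treated as a local file path.
    """
    parsed = urlparse(destination)

    if parsed.scheme == "s3":
        # Imported lazily so that local runs do not need boto3 installed.
        import boto3

        bucket, key = parsed.netloc, parsed.path.lstrip("/")
        boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=payload)
        return f"s3://{bucket}/{key}"

    # Local destination: create parent folders if they do not exist yet.
    path = Path(destination)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    return str(path)
```

Keeping the local branch as the default preserves the lightweight local-host behaviour, while a production deployment would only need to pass a different destination string.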
Note
The estimated operating cost does not include the Tableau licence fee.
[3.0.0] Archive Version
[3.0.1] 2024-10-18
Minor improvement before archiving the repository.
- Built Docker image with a multistage build to reduce image size.
- Compressed image size.
- Specified the Python version in the Makefile.
- Updated README for consistency with other projects.
[2.0.0] Revamped Version
[2.0.3] 2024-08-23
Enhanced the pipeline folder structure.
- Moved Dockerfile and main script to the root directory.
- Removed redundant reading of the .env file.
- Updated backlog URL.
- Updated the services used in the architecture diagram.
[2.0.2] 2024-08-02
Enhanced pipeline execution.
- Added product backlog for review.
- Calculated operational costs in the production scenario.
- Improved pipeline with directory setup.
- Used Makefile for recompilation.
- Updated Dockerfile to reduce image size.
- Wrote more descriptive instructions.
[2.0.1] 2024-07-22
Revamped the data pipeline and dashboard design.
- Enhanced the dashboard design for a more professional appearance.
- Segregated the data pipeline into distinct modules.
- Switched PDF table extraction from using Tabula-py to the Adobe PDF Extract API.
[1.0.0] Project Initiation
[1.0.1] 2022-08-25
Initial Repository.
This project is managed with a product backlog. You can review the backlog to understand the prioritised list of features, changes, enhancements, and bug fixes made during development.
This project is licensed under the MIT License. See the LICENSE file for details. Feel free to fork and customise it to meet your needs!
The initial dashboard design was referenced from 交齊功課龍虎榜 @ Ho Dao College.