This hackathon is a shared event between the Information Visualization and Data Science Project Management subjects with support from the ViT Foundation, the Càtedra Lluís Santaló d'Aplicacions de la Matemàtica and the Càtedra d'Informació i Computació (Eurecat prize).
💡 What we want to solve:
We want to sketch out the lifecycle of a metric that captures the impact of air pollution (PM2.5) on human health, measured as years of life expectancy lost and based on publicly available open data. The metric should inform the public about the risks of air pollution and enable policy-makers to make evidence-based regulatory decisions.
🔍 Specifically ...
- How to calculate the metric every time new data rolls in.
- How to aggregate the data into different administrative units.
- How to identify 'hotspots'.
- What to say about it and how to display it.
🧑‍🎓 What you will learn:
- To wear different hats in a project: data scientist, project manager, data storyteller, researcher ...
- To collaborate with new teams and across teams —everyone has something that other teams need and needs something from other teams.
- To set realistic, yet aspirational expectations.
- To set and follow best practices for technical collaborations.
- To process, analyze and visualize geospatial data.
- To present results to stakeholders.
📅 Pre-hackathon stuff:
- Read this repo in detail
- Meet to discuss and distribute roles
- Decide on the communication channels, management and documentation tools
- Set up your computer if you need to download or install any software —don't do it the day of the hackathon, please!
⚠ IMPORTANT NOTE:
The entire group will collaborate on the same repo, so read the Collaboration recommendations.
- Global annual PM2.5 grids from 1998-2019 (GeoTIFFs) in `./data/raw`
- Global administrative units (simplified GeoJSON files for admin levels 0-2; you're welcome 🙏) in `./data/additional`
Our 'raw data' is SEDAC's satellite-derived Global Annual PM2.5 Grids. SEDAC is a data center in NASA's Earth Observing System Data and Information System. The estimates in this indicator are intended to help in large-scale health and environmental studies. The gridded data sets provided have a resolution of 0.01 degrees to allow researchers to agglomerate data to meet their particular needs.
⚠ MORE DETAILS:
Here are more detailed explanations about the data.
The Air Quality Life Index (AQLI) by the Energy Institute at the University of Chicago is one of the better known and better documented models right now —which you can use as a reference in any way, shape or form you like.
⚠ PLEASE READ:
The AQLI methodology contains details relevant to all teams.
- Global (GL) Annual PM2.5 Grids from MODIS, MISR and SeaWiFS Aerosol Optical Depth (AOD), v4.03 (1998 – 2019)
- GADM, the Database of Global Administrative Areas —the highest-resolution database of country administrative areas
- Search by country
- PoliticalAtlas (populations for Asia and Africa mapped to admin levels using GADM)
- Mapping the world’s population with worldpop.org.uk
- QGIS: a free and open source mapping software https://qgis.org/en/site/
- Mapping with Plot https://observablehq.com/@observablehq/plot-mapping
- Mapping with GeoTIFFs Python https://towardsdatascience.com/reading-and-visualizing-geotiff-images-with-python-8dcca7a74510
- Using Git Large File Storage
| Team name: owls 🦉 |
|---|
| Members: Joan, Roser, Elmo, Marwa, David |
Question: How can we aggregate the gridded data into relevant administrative units?
Goal: Automate the pipeline that summarizes the annual PM2.5 values from a gridded data format (TIFFs) to administrative units (country, region, province).
Deliverables:
- Data files with the PM2.5 values aggregated by the different levels of admin units —to be negotiated with the pandas 🐼 team
- Visualizations of the input data and the output
- Documentation of the process —using the visuals generated
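As a rough illustration of the aggregation step, here is a minimal sketch of a zonal mean. It assumes the GeoTIFF has already been read into a 2D grid and that the admin-unit polygons have already been rasterized onto the same grid (a geospatial library would normally handle both). The names `zonal_mean`, `pm_grid` and `zone_grid`, and the toy values, are hypothetical.

```python
from collections import defaultdict


def zonal_mean(pm_grid, zone_grid):
    """Average PM2.5 per admin unit.

    pm_grid and zone_grid are equally shaped 2D lists: one holds the PM2.5
    value of each cell, the other the id of the admin unit covering it.
    None marks a cell without data or outside every admin unit.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pm_row, zone_row in zip(pm_grid, zone_grid):
        for pm, zone in zip(pm_row, zone_row):
            if pm is None or zone is None:
                continue  # skip cells that cannot be assigned a value
            totals[zone] += pm
            counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Toy 2x3 grid with two admin units, "A" and "B" (made-up values).
pm_grid = [
    [10.0, 12.0, 30.0],
    [14.0, None, 34.0],
]
zone_grid = [
    ["A", "A", "B"],
    ["A", "A", "B"],
]
means = zonal_mean(pm_grid, zone_grid)
print(means)
```

Note that a plain mean treats every cell equally, while cells on a latitude/longitude grid cover less area towards the poles; whether to weight by area or population is one of the things to agree on with the other teams.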
| Team name: pandas 🐼 |
|---|
| Members: Josep, Nil, Ivan, Xavier, Llorenç, Denisse |
Question: How can we transform it into a measure of health impact?
Goal: Design the model (weighted by population) that converts PM2.5 pollution into years of life expectancy lost
Deliverables:
- Data files with the different levels of admin units for a country —to be negotiated with the rhinos 🦏 team
- Visualizations of the output files at different levels
- Documentation of the process —using the visuals generated
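To make the goal more concrete, the following sketch shows one possible shape for a population-weighted conversion. The coefficient (about 0.98 years lost per 10 µg/m³ of sustained exposure) and the 5 µg/m³ guideline are assumptions taken from our reading of the AQLI approach; please verify both against the AQLI methodology before using them. The function names and sample numbers are hypothetical.

```python
# Assumptions to verify against the AQLI methodology:
YEARS_LOST_PER_UG_M3 = 0.098  # roughly 0.98 years per 10 µg/m³
GUIDELINE_UG_M3 = 5.0         # WHO annual PM2.5 guideline (2021 revision)


def population_weighted_pm25(cells):
    """cells: list of (pm25, population) pairs for one admin unit."""
    total_pop = sum(pop for _, pop in cells)
    if total_pop == 0:
        raise ValueError("admin unit has no population")
    return sum(pm * pop for pm, pop in cells) / total_pop


def years_lost(pm25, guideline=GUIDELINE_UG_M3):
    """Life expectancy lost relative to meeting the guideline."""
    return max(0.0, pm25 - guideline) * YEARS_LOST_PER_UG_M3


# Made-up unit: most people live in the polluted cell.
cells = [(40.0, 9000), (10.0, 1000)]
exposure = population_weighted_pm25(cells)
loss = years_lost(exposure)
print(exposure, round(loss, 3))
```

Weighting by population means the result reflects what people actually breathe: in the toy unit above, the exposure sits much closer to the crowded, polluted cell than a plain average of the two cells would.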
| Team name: rhinos 🦏 |
|---|
| Members: Judit, Wilber, Samuel, Isaac, Feng |
Question: How can we help policy-makers understand the issue?
Goal: Prototype an interactive, country-level report about the impact of PM2.5 pollution —to be used by policy-makers
Deliverables:
- Brief summary of the methodology used by the pandas 🐼 and the owls 🦉
- A criterion to determine country hotspots
- ObservableHQ notebook with the prototype that helps policy-makers simulate different pollution-reduction scenarios
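One simple starting point for a hotspot criterion is sketched below: flag the admin units whose value lies well above the country's mean. This is only an illustration, not a recommended definition; the z-score threshold, the function name and the sample figures are all hypothetical, and the sketch ignores spatial neighbours, which a spatial statistic such as Getis-Ord would take into account.

```python
from statistics import mean, pstdev


def flag_hotspots(years_lost_by_unit, z_threshold=1.0):
    """Return the units at least z_threshold std devs above the mean."""
    values = list(years_lost_by_unit.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every unit is identical, so nothing stands out
    return sorted(
        unit
        for unit, value in years_lost_by_unit.items()
        if (value - mu) / sigma >= z_threshold
    )


# Made-up years of life expectancy lost per region of one country.
years_lost_by_unit = {"north": 0.5, "south": 0.6, "east": 0.4, "west": 2.5}
hotspots = flag_hotspots(years_lost_by_unit)
print(hotspots)
```

A relative criterion like this always compares units within one country, so a uniformly polluted country would show no hotspots; an absolute threshold (for example, exceeding a guideline) may be a useful complement.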
The Hackathon takes place in the P-IV building, EPS UdG on February 4, 2023.
We will provide breakfast 🥐, lunch 🥪, snacks 🍌, coffee ☕ ...
- 👋 09:00 Welcome, reminder of logistics like working rooms, lunch, drinks, communication channels ...
- 🙋‍♀️ 09:15 Standup meeting. Objectives, processes, what you want to achieve in the hackathon and any questions.
- 👩‍💻 09:30 Start of work day!
- 🙋‍♀️ 13:15 Short standup
- 🍱 13:30 Lunch
- 👩‍💻 14:30 Back to work
- 🧑‍🏫 19:00 Wrap-up presentations (fewer than 6 slides 😜): about 10 minutes per team, in English.
  - What was achieved?
  - What was helpful?
  - What’s left to do?
- 🏆 19:30 Awards
- 🥳 20:00 End!!!
We'll move from group to group throughout the day and will be available to answer questions and help with blockers.
- Use folder and file names that are human-readable and let you identify the content, preferably in lower case separated by dashes. For example: `areas-of-interest-getis-ord.py`
- Follow the Branch Per Feature model: one feature, one branch.
- Prefix each branch with your team name. For example, if you're committing part of your work cleaning up the data, you would push it to an `owls--data-cleaning` branch.
- Use a consistent pattern for commit messages. A nice one is `type of commit: description of the commit in imperative mood`, as in `refactor: use map instead of for loop`.
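If you want to check these conventions automatically, the sketch below shows one way to do it with regular expressions. The exact patterns are assumptions inferred from the examples in this section (team name, double dash, lower-case words separated by dashes; a lower-case commit type followed by a colon), and the function names are hypothetical.

```python
import re

TEAMS = ("owls", "pandas", "rhinos")

# Team name, double dash, then lower-case words separated by single dashes.
BRANCH_RE = re.compile(r"^(%s)--[a-z0-9]+(-[a-z0-9]+)*$" % "|".join(TEAMS))
# "type: description", e.g. "refactor: use map instead of for loop".
COMMIT_RE = re.compile(r"^[a-z]+: \S.*$")


def is_valid_branch(name):
    """True if the branch name follows the team-prefix convention."""
    return BRANCH_RE.match(name) is not None


def is_valid_commit_subject(subject):
    """True if the commit subject follows the 'type: description' pattern."""
    return COMMIT_RE.match(subject) is not None


print(is_valid_branch("owls--data-cleaning"))
print(is_valid_commit_subject("refactor: use map instead of for loop"))
```

A regular expression cannot tell whether the description is in the imperative mood, so that part still relies on everyone's discipline.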
As we all know, the professional jury and the popular vote don't always match, so we're offering two awards: you all decide one via an open vote and we decide the other. The two winners may or may not be the same team, and nobody will know until we reveal them simultaneously. There will be a guest judge, and the presentations must be in English.
- You must vote 3, 2, 1: you can't give all 3s, all 2s, or one 3 to yourselves and 1s to the rest ...
- Pandas, to keep things fair, one of you mustn't vote (your team has six members; the others have five).
🏆 Jury fav: A €500 gift card for the team (sponsored by the Càtedra Informació i Computació via Eurecat)
🏆 Popular vote: A copy of How Charts Lie: Getting Smarter about Visual Information by Alberto Cairo, for each team member (sponsored by ViT)
- 50%: Active participation and engagement with the given roles —part of it will be our observation, part of it will be self and peer evaluation within the teams.
- 30%: Delivery of the presentation of the results.
- 20%: Creativity, feasibility and accuracy of the deliverables.
Remember that the hackathon will account for 25% of the final mark for the subject.
Self and peer evaluation forms:
- All attending students get 0.25 for actively participating.
- All students in the winning groups get 0.75 (if the popular vote coincides with the jury favorite they'll get an extra +0.5)
- We will take into account:
  - how clearly the visualization displays the results,
  - the strategies used to highlight patterns,
  - the integration of the visuals with the documentation or the text in the prototype.
Remember that the hackathon will account for 10% of the final mark for the subject.