The associated blog post can be found here.
This project provides a model that predicts the population of a given area based solely on features extracted from OSM (OpenStreetMap) data. Such a model could be useful in urban planning or traffic modelling, for example. Because OSM data is open and constantly updated, it is a free and accessible source that anyone can use to make estimates.
To this end, a dataset was created by taking a subset of the data from reference [1] and augmenting it with more detailed OSM features with the help of osm-feature-extractor.
The included data consists of ~30k equally sized hexagons spanning Great Britain (England, Wales and Scotland). Each hexagon carries a population figure, itself derived from Facebook's High Resolution Settlement Layer, which estimates population from satellite imagery. Alongside the population, the data has features taken from OSM extracts, such as the number and area of buildings, the length of each type of road, the number of shops of each kind (restaurants, groceries, etc.), and the number of public transport facilities in the area. For a more detailed description of the features used, refer to this document.
After running the model, one can use osm-feature-extractor to generate user-defined areas for which to estimate the population. The referenced project has instructions on how to do that.
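As a rough illustration of how hexagon data of this kind can be turned into model inputs, the sketch below parses a tiny GeoJSON document with the standard library and separates the target from the features. The property names (`population`, `building_count`, `shop_count`) are hypothetical and are not taken from the actual dataset.

```python
import json

# Hypothetical two-hexagon sample; the real files have many more properties
# and real geometries. Property names here are illustrative assumptions.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": null,
     "properties": {"population": 120, "building_count": 45, "shop_count": 3}},
    {"type": "Feature",
     "geometry": null,
     "properties": {"population": 0, "building_count": 2, "shop_count": 0}}
  ]
}
"""


def split_features_and_target(geojson_text, target="population"):
    """Return (X, y, names): feature rows, target values, feature column names."""
    collection = json.loads(geojson_text)
    rows = [feature["properties"] for feature in collection["features"]]
    # Use every property except the target as a feature, in a stable order.
    names = sorted(key for key in rows[0] if key != target)
    X = [[row[name] for name in names] for row in rows]
    y = [row[target] for row in rows]
    return X, y, names


X, y, names = split_features_and_target(SAMPLE)
print(names)
print(X, y)
```

In practice a geospatial library would likely be more convenient for reading the full files; the standard library is used here only to keep the sketch self-contained.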
The main results of the model, using a Lasso regressor, are:
R² score | Mean absolute error (inhabitants / km²) |
---|---|
88.9% | 98.8 |
The full results of the model are presented in the section Results below.
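For readers unfamiliar with the two metrics in the table, the sketch below computes the R² score and the mean absolute error from their standard definitions. The numbers are made up for illustration and are unrelated to the project's results.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative population densities (inhabitants / km²), not real data.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

r2 = r2_score(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(f"R2 = {r2:.3f}, MAE = {mae:.1f}")
```

An R² close to 1 means the predictions explain most of the variance in the actual values, while the mean absolute error is expressed in the same unit as the target.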
To run the model, follow these steps:
- Create a virtual environment using conda:
$ conda env create --file environment.yml
- Download the dataset files:
$ python download.py
- Run the main script that pre-processes the data, trains the model and saves it:
$ python main.py
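The pre-process, train and save sequence can be sketched with scikit-learn as follows. This is not the code in main.py: the synthetic data, the `StandardScaler` step, the `alpha` value and the in-memory `pickle` round trip are all assumptions chosen to keep the example small.

```python
import pickle

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the hexagon features: three columns, one of which
# has no influence on the target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))
y = X @ np.array([2.0, 0.5, 0.0]) + rng.normal(0, 1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Pre-processing (scaling) and the Lasso regressor chained in one pipeline.
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
model.fit(X_train, y_train)

# Serialise and restore the fitted pipeline; a real script would write the
# bytes to a file instead of keeping them in memory.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

test_r2 = restored.score(X_test, y_test)
print(f"R2 on held-out data: {test_r2:.3f}")
```

Keeping the scaler and the regressor in a single pipeline means the saved object applies the same pre-processing at prediction time.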
You can adjust the project config variables in proj.conf.
- input_data_file: name of the file with the training data
- out_file: name of the file to save the model to
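The format of proj.conf is not described here. Assuming it is an INI-style file, the two keys above could be read with the standard library as in the sketch below. The `[project]` section name and the `model.pkl` value are hypothetical.

```python
import configparser

# Hypothetical contents of an INI-style config file; the section name and
# the output file name are assumptions made for this example.
SAMPLE_CONF = """
[project]
input_data_file = hexagons_all_features_sample.geojson
out_file = model.pkl
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONF)

input_data_file = parser["project"]["input_data_file"]
out_file = parser["project"]["out_file"]
print(input_data_file, out_file)
```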
One can also adjust specific model parameters in settings.py.
- settings.py - file with configuration parameters
- basic_features.py - notebook with the workflow using basic OSM features
- all_features.py - notebook with the workflow using extended OSM features
- main.py - main python script that wraps all pipeline steps
- process_data.py - processes the data before being fed to the model
- train_model.py - contains the logic for fitting the model to the data
- model_evaluation.py - contains the logic for evaluating and showing the results of the model
- pipeline_classes.py - contains classes that are used in the machine learning pipeline
- helper_methods.py - contains helper methods used in the pipeline
- hexagons_basic_features_sample.geojson - dataset of hexagons with basic features (sample data)
- hexagons_all_features_sample.geojson - dataset of hexagons with extended features (sample data)
The main libraries used in this application are:
Population estimates vs actual | Coefficients of the model |
---|---|
[1] - Kontur Population: Global Population Density for 400m H3 Hexagons
[2] - Bast, Hannah, 2015. Fine-Grained Population Estimation