“Since the UK was one of the main countries that colonised the USA, and the UK lies to the east of the USA, there are more towns/cities with UK names on the east coast of the US than on the west coast.”
Dataset: https://github.com/apache/commons-csv/raw/master/src/test/resources/perf/worldcitiespop.txt.gz
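As a rough illustration of how the hypothesis could be checked, here is a minimal sketch that compares the share of UK-named cities east and west of a dividing meridian. It is not the notebook's actual method. The function name `uk_name_share`, the -100° split longitude, and the tiny sample lists are all assumptions made for this example. The real analysis would load names and coordinates from the dataset file instead.

```python
def uk_name_share(us_cities, uk_names, split_longitude=-100.0):
    """Return the fraction of UK-named cities on each side of a meridian.

    us_cities: iterable of (name, longitude) pairs.
    uk_names: iterable of UK place names.
    split_longitude: assumed east/west dividing line (hypothetical choice).
    """
    uk = {name.strip().lower() for name in uk_names}
    counts = {"east": [0, 0], "west": [0, 0]}  # [matches, total] per side
    for name, lon in us_cities:
        side = "east" if lon > split_longitude else "west"
        counts[side][1] += 1
        if name.strip().lower() in uk:
            counts[side][0] += 1
    # Avoid division by zero when one side has no cities at all.
    return {side: (m / t if t else 0.0) for side, (m, t) in counts.items()}


# Illustrative sample only: a handful of US cities with approximate longitudes.
sample_us = [
    ("Boston", -71.06), ("Manchester", -71.45), ("Birmingham", -86.80),
    ("Oxford", -89.52), ("Miami", -80.19),
    ("Seattle", -122.33), ("Fresno", -119.79), ("Phoenix", -112.07),
]
sample_uk = ["Boston", "Manchester", "Birmingham", "Oxford"]

shares = uk_name_share(sample_us, sample_uk)
print(shares)
```

A higher "east" share than "west" share on the full dataset would be consistent with the hypothesis, though exact string matching like this misses many related names.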
I have tried to analyze this hypothesis; the complete analysis is in this notebook.
Please download the notebook file and open it in Jupyter Notebook on your system so that every cell works if you need to run it. The reason is that the notebook contains a cool infographic GIF which doesn't render directly on GitHub.
Alternatively, if you don't have Jupyter or IPython:
- You can download the HTML version and view it in any browser. (No guarantees on IE 😝)
- You can view it hosted on the web with Binder. Click here 👉 (no installation required)
Possible future improvements:
- Implement the scatter plot over a map of the US using Basemap or Bokeh (+Google Maps)
- Consider more ways of matching city names
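
One way the name-matching idea might be extended is sketched below, using only the Python standard library. It tries an exact match on normalized names, then a "New " prefix match (so "New York" maps to "York"), then a fuzzy match with `difflib`. The helper names `normalize` and `match_kind` and the 0.9 similarity cutoff are assumptions for illustration, not part of the existing notebook.

```python
import difflib
import re
import unicodedata


def normalize(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def match_kind(us_name, uk_names, fuzzy_cutoff=0.9):
    """Classify how a US city name relates to a collection of UK names.

    Returns "exact", "new-prefix", "fuzzy", or None.
    fuzzy_cutoff is an assumed threshold and would need tuning on real data.
    """
    uk = sorted({normalize(n) for n in uk_names})
    base = normalize(us_name)
    if base in uk:
        return "exact"
    if base.startswith("new ") and base[4:] in uk:
        return "new-prefix"
    if difflib.get_close_matches(base, uk, n=1, cutoff=fuzzy_cutoff):
        return "fuzzy"
    return None


uk_sample = ["York", "Boston", "Worcester", "St Albans"]
for city in ["York", "New York", "St. Albans", "Seattle"]:
    print(city, "->", match_kind(city, uk_sample))
```

Fuzzy matching can produce false positives for short names, so any cutoff would be worth checking by hand against a sample of matches.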