Aaron Kyle Dennis edited this page Jan 15, 2015 · 20 revisions

Welcome to the soc-maps wiki!

Toward a Coherent System of Managing and Publishing Geospatial Information

Managing Spatial Data Repositories -- Initial Approach and Current Standing

CCCS maintains separate geospatial data repositories for our company (composed entirely of publicly available data) and for our clients (which typically include a mixture of public and proprietary data sources).

CCCS' public data repository is currently managed via GitHub as /cccs/soc-maps/. This repository was established to develop a web application showcasing our capabilities in spatial analysis and cartographic data visualization in support of social impact assessment. The goal was for this web application to be served through our Django web framework, hosted at crossculturalconsult.com.

CCCS' early development work used GitHub to version-control both the web mapping application and the spatial data that the application was intended to reference. While this approach made it convenient to share both the data and the code base with our team members, both CCCS' IT expert, Paul Whipp, and our GIS consultants, Kartoza Pty., recommended against this 'combined' approach. Git is poorly suited to managing spatial data, primarily because repositories of such data tend to grow too large for Git to handle effectively [e.g., one starts encountering timeout errors when cloning and pulling larger repositories, and the system may not be able to manage any single file over 2 gigabytes in size (such as a dump of a spatial database)].
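One way to catch the size problem before it reaches Git is a pre-commit check that flags oversized files. The Python sketch below is illustrative only and is not part of CCCS' current tooling: the 2 GB ceiling is the figure cited above, while the 100 MB warning threshold and the `find_large_files` helper are hypothetical choices.

```python
import os
from pathlib import Path

# Ceiling cited above: single files over ~2 GB may be unmanageable in Git.
HARD_LIMIT_BYTES = 2 * 1024**3
# Hypothetical early-warning threshold; adjust to taste.
WARN_LIMIT_BYTES = 100 * 1024**2


def find_large_files(root, warn_limit=WARN_LIMIT_BYTES, hard_limit=HARD_LIMIT_BYTES):
    """Walk `root` and return (relative_path, size_bytes, level) tuples.

    level is "error" when size >= hard_limit and "warn" when
    size >= warn_limit. Git's own .git directory is skipped.
    """
    root = Path(root)
    flagged = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            size = path.stat().st_size
            rel = str(path.relative_to(root))
            if size >= hard_limit:
                flagged.append((rel, size, "error"))
            elif size >= warn_limit:
                flagged.append((rel, size, "warn"))
    return sorted(flagged)


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for rel, size, level in find_large_files(target):
        print(f"{level.upper():5} {size / 1024**2:10.1f} MB  {rel}")
```

Run against a working copy, any "error" line marks a file that should live outside Git (for example, in the shared data store discussed below).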

Following the advice of our consultants, CCCS adopted an alternative system for sharing spatial data using btsync. While this approach lets us share larger spatial data repositories outside of Git, it has the disadvantage of creating parallel data sources (i.e., the shared btsync repository on the client side and the server-side 'production' repository). Another disadvantage of the btsync approach is that the data are not version controlled. We therefore view btsync as a 'temporary' solution.

With regard to creating and managing geospatial data in database format, Kartoza Pty has scripted database imports for CCCS' public repository as well as for one of our client project repositories. In its initial formulation, this import script loaded data into a PostgreSQL database running in a Docker container [a software application that CCCS has requested be eliminated from our software stack]. CCCS has since dumped these databases from within Docker and imported them directly into our server's PostgreSQL instance, renaming them in the process to:

Copies of these database dumps are available via their respective repositories (linked above).
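The dump-and-restore step described above can be sketched as follows, assuming custom-format `pg_dump` archives and a PostGIS-enabled PostgreSQL on the host. The container name, the database names and the `postgres` role are hypothetical placeholders (the actual new database names are not recorded on this page); by default the script only prints the commands for review.

```python
import shlex
import subprocess


def docker_dump_command(container, db_name, user="postgres"):
    """pg_dump run inside the Docker container, writing a custom-format
    (-Fc) archive to stdout. No -t flag: a pseudo-TTY can corrupt binary output."""
    return ["docker", "exec", container, "pg_dump", "-U", user, "-Fc", db_name]


def restore_commands(dump_file, new_db_name, user="postgres"):
    """Create the renamed database on the host server, then restore into it.
    PostGIS must already be installed on the host if the dump contains PostGIS objects."""
    return [
        ["createdb", "-U", user, new_db_name],
        ["pg_restore", "-U", user, "-d", new_db_name, "--no-owner", dump_file],
    ]


def describe(container, old_name, new_name, dump_file):
    """Shell-quoted preview of every step, so it can be checked before running."""
    dump = shlex.join(docker_dump_command(container, old_name))
    steps = [f"{dump} > {shlex.quote(dump_file)}"]
    steps += [shlex.join(cmd) for cmd in restore_commands(dump_file, new_name)]
    return steps


def run(container, old_name, new_name, dump_file):
    """Execute the steps; check=True stops at the first failing command."""
    with open(dump_file, "wb") as out:
        subprocess.run(docker_dump_command(container, old_name), stdout=out, check=True)
    for cmd in restore_commands(dump_file, new_name):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names; substitute the real container and databases.
    for line in describe("kartoza-postgis", "gis", "cccs_public", "gis.dump"):
        print(line)
```

Keeping the dump as a file on the host (rather than piping straight into `pg_restore`) also leaves a copy that can be placed in the relevant repository.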

PLEASE NOTE: The Docker-oriented shapefile data import script

Managing Spatial Data Repositories -- Overcoming Challenges

CCCS and Kartoza Pty continue to explore options for spatial data management and map production workflow.

Spatial Data Management

We have tentatively settled on using the 'GeoGig' platform for distributed version-control of shapefile data. The GeoGig software allows users to import raw geospatial data (currently from Shapefiles, PostGIS or SpatiaLite).
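For orientation, a first import into a new GeoGig repository might follow the sequence initialise, import each source, stage, commit. This is a hedged sketch: the subcommands used (`init`, `shp import`, `pg import`, `sl import`, `add`, `commit`) are those we understand the GeoGig 1.0 command line to provide, but they should be checked against the installed 1.0-beta1 build, and the paths and database name in the example are hypothetical. `import_plan` only builds the command lists; `run_plan` executes them.

```python
import os
import subprocess
from pathlib import Path


def import_plan(shapefiles=(), pg_database=None, spatialite_file=None,
                message="Initial import"):
    """Build the GeoGig commands for a first import into an empty repository.

    File paths are made absolute because the commands run from inside
    the repository directory.
    """
    plan = [["geogig", "init"]]
    for shp in shapefiles:
        plan.append(["geogig", "shp", "import", os.path.abspath(shp)])
    if pg_database:
        # Import every table of a PostGIS database; host/port/user options
        # are omitted here and assumed to fall back to defaults.
        plan.append(["geogig", "pg", "import", "--database", pg_database, "--all"])
    if spatialite_file:
        plan.append(["geogig", "sl", "import", "--database",
                     os.path.abspath(spatialite_file), "--all"])
    # Imported features land in the working tree; stage, then commit them.
    plan.append(["geogig", "add"])
    plan.append(["geogig", "commit", "-m", message])
    return plan


def run_plan(repo_dir, plan):
    """Create repo_dir if needed and run each command inside it."""
    Path(repo_dir).mkdir(parents=True, exist_ok=True)
    for cmd in plan:
        subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    for cmd in import_plan(["/data/public/roads.shp"], pg_database="cccs_public"):
        print(" ".join(cmd))
```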

Kartoza has contributed some effort to investigating the use of GeoGig, though CCCS has yet to receive a briefing on the details of this work. Kartoza's invoices to CCCS suggest that most of this work relates to embedding GeoGig in a Docker image; CCCS has asked Kartoza to review the work charged to CCCS on this matter, on the grounds that our request to stop all Docker-related development preceded this work.

With regard to moving forward with 'GeoGig':

CCCS needs to create GeoGig repositories for 'public' data and for each of our 'client' projects. The 'MASTER' branch of each GeoGig repository must be hosted on CCCS' servers.
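The one-repository-per-project requirement can be sketched as a mapping from project name to a server-side repository path and a clone URL. Everything specific here is a hypothetical placeholder: the `/srv/geogig` root, the client project names, and the URL scheme under geogig.crossculturalconsult.com (which depends on the nginx configuration still to be done). Each server-side repository would be initialised in its own directory, and team members would clone from it, leaving the server copy as the authoritative home of the master branch.

```python
from pathlib import PurePosixPath

# Hypothetical placeholders, not settled CCCS configuration.
SERVER_ROOT = PurePosixPath("/srv/geogig")
BASE_URL = "http://geogig.crossculturalconsult.com"


def repo_layout(client_projects):
    """Map each project to (server-side repository path, clone URL).

    The 'public' repository is always included and listed first;
    client projects follow in alphabetical order.
    """
    clients = sorted(set(client_projects) - {"public"})
    return {name: (SERVER_ROOT / name, f"{BASE_URL}/{name}")
            for name in ["public"] + clients}


def clone_command(name, layout, local_dir):
    """Command a team member runs to get a local working copy.

    Local work is pushed back to the server copy, which holds master.
    """
    _, url = layout[name]
    return ["geogig", "clone", url, str(local_dir)]


if __name__ == "__main__":
    layout = repo_layout(["client-a", "client-b"])  # hypothetical names
    for name, (path, url) in layout.items():
        print(f"{name:10} {path}  {url}")
```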

Progress in this regard is as follows:

  1. We have installed GeoGig to our data server:
    geogig@ip-10-167-186-14:/home/aaron$ geogig version
    Project Version : 1.0-beta1
    Build Time : August 14, 2014 at 17:44:46 ART
    Build User Name : Gabriel Roldan
    Build User Email : gabriel.roldan@gmail.com
    Git Branch : r1.0-beta1
    Git Commit ID : 9aae709f4f451802a09c14293c92a46372c868bd
    Git Commit Time : August 14, 2014 at 17:43:33 ART
    Git Commit Author Name : Gabriel Roldan
    Git Commit Author Email : gabriel.roldan@gmail.com
    Git Commit Message : Set version to 1.0-beta1
  2. We have enabled the user 'geogig' to call the geogig application by adding its location to the PATH variable in the user's .bashrc file
  3. We created a DNS entry that routes traffic for http://geogig.crossculturalconsult.com to our GeoGig server
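A quick way to confirm step 2 is to look up the `geogig` launcher the same way the shell does. The sketch below is hypothetical (the `/opt/geogig/bin` directory in the comment is an assumed install location, not the one on our server). One caveat: `~/.bashrc` is read by interactive shells, so processes that never start one, such as a service launched at boot, may not see the PATH change and are safer calling geogig by its absolute path.

```python
import os
import shutil


def with_dir_on_path(path_env, bin_dir):
    """Append bin_dir to a PATH string unless it is already present.

    Mirrors a ~/.bashrc line such as (hypothetical install dir):
        export PATH="$PATH:/opt/geogig/bin"
    """
    parts = [p for p in path_env.split(os.pathsep) if p] if path_env else []
    if bin_dir not in parts:
        parts.append(bin_dir)
    return os.pathsep.join(parts)


def geogig_executable(path_env=None):
    """Full path of the `geogig` executable found on PATH, or None.

    path_env defaults to this process's own PATH.
    """
    return shutil.which("geogig", path=path_env)


if __name__ == "__main__":
    found = geogig_executable()
    print(found or "geogig is not on PATH for this user")
```

Running the script as the 'geogig' user from a fresh login shell is the closest match to how the application will be invoked interactively.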

Remaining tasks to get the GeoGig set-up working are:

  1. We need to configure nginx to allow us to push data to, and pull data from, each GeoGig project
  2. We need to import existing data into GeoGig. While the GeoGig documentation suggests that it is possible to import PostgreSQL data as well as shapefiles, it remains unclear to CCCS how such imports should be done. For example, what happens if we import both a PostgreSQL database and shapefiles? Does GeoGig automatically restructure the data and allow us to re-export them as a single PostgreSQL database, or does it keep each file separate? Does the GeoGig system act as the database itself, or is it used only to version-control changes to a database? Does GeoGig also let us manage regular (non-spatial) data tables, such as socio-economic and census data? We'd like some coaching and tutorials on how data management should work with GeoGig as our version-control system.
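As a starting point for the nginx task, the Python sketch below generates an nginx server block that reverse-proxies one URL prefix per project to a local GeoGig web endpoint. It assumes each repository is exposed by its own `geogig serve` process on a local port; the port numbers and project names are hypothetical, and the mapping has not been tested against GeoGig's push/pull protocol. No authentication is configured here; something like nginx `auth_basic` would be needed before opening push access publicly.

```python
def nginx_server_block(server_name, project_ports, max_body="0"):
    """Return an nginx `server` block proxying /<project>/ to a local port."""
    lines = [
        "server {",
        "    listen 80;",
        f"    server_name {server_name};",
        # nginx rejects request bodies over 1 MB by default, which large
        # spatial pushes can exceed; "0" disables the size check.
        f"    client_max_body_size {max_body};",
    ]
    for project, port in sorted(project_ports.items()):
        lines += [
            f"    location /{project}/ {{",
            # Trailing slash: nginx strips the /<project>/ prefix before proxying.
            f"        proxy_pass http://127.0.0.1:{port}/;",
            "        proxy_set_header Host $host;",
            # Long transfers of big datasets should not hit the 60 s default.
            "        proxy_read_timeout 600s;",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical projects and ports.
    print(nginx_server_block("geogig.crossculturalconsult.com",
                             {"public": 8182, "client-a": 8183}))
```

Generating the block from a single project-to-port mapping keeps the nginx configuration in step with the list of repositories as client projects are added.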