cwdavies/ccdb-data-pipeline

CCDB Data Pipeline

A lightweight ETL data pipeline intended to support the operations of the Consumer Complaint Search application.

Description: The purpose of this code is to provide data for Consumer Complaint Search. The pipeline downloads scrubbed consumer complaint data and indexes it in Elasticsearch so that the Complaint Search application can display and analyze it.
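
To make the "index that data in Elasticsearch" step more concrete, here is a minimal sketch that turns CSV rows into the newline-delimited JSON body accepted by Elasticsearch's bulk API. It is not the pipeline's own code: the `complaints` index name, the `complaint_id` column, the helper name `csv_to_bulk_body`, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
import json


def csv_to_bulk_body(csv_text, index_name, id_field):
    """Turn CSV text into an Elasticsearch bulk-API request body (NDJSON)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: names the target index and the document ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": row[id_field]}}))
        # Source line: the document itself, one JSON object per CSV row.
        lines.append(json.dumps(row))
    # Every line of a bulk body, including the last, ends with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical sample data; not taken from the real complaint export.
sample = "complaint_id,product,state\n1001,Mortgage,CA\n1002,Credit card,NY\n"
body = csv_to_bulk_body(sample, index_name="complaints", id_field="complaint_id")
print(body, end="")
```

Each input row produces two output lines, an action line followed by the document, which is the shape the bulk endpoint expects.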

Status: In Production

Dependencies

This pipeline indexes data in Elasticsearch, so it requires an Elasticsearch instance to connect to.

Installation

Detailed instructions on how to install, configure, and get the project running are in the INSTALL document.

Usage (Users)

  1. Set environment variables
    1. export ES_USERNAME=<foo>
    2. export ES_PASSWORD=<bar>
    3. export ENV=[ENVIRONMENT]
      1. where ENVIRONMENT is one of dev, staging, or prod
  2. make from_public
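
Before running `make from_public`, it can help to confirm that the variables listed above are set. The following is a small, optional preflight sketch; it is not part of the repository, and the helper name `check_environment` is hypothetical. It assumes `ENV` must be exactly one of the three values named above.

```python
import os

REQUIRED_VARS = ("ES_USERNAME", "ES_PASSWORD", "ENV")
VALID_ENVS = ("dev", "staging", "prod")


def check_environment(environ):
    """Return a list of human-readable problems; an empty list means OK."""
    # Treat unset and empty variables the same way.
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not environ.get(name)]
    env = environ.get("ENV")
    if env and env not in VALID_ENVS:
        problems.append(f"ENV must be one of {', '.join(VALID_ENVS)}; got {env!r}")
    return problems


if __name__ == "__main__":
    for problem in check_environment(os.environ):
        print(problem)
```

The script prints one line per problem and prints nothing when all three variables look usable.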

Usage (Developers)

  1. source ./activate-virtualenv.sh
  2. Set environment variables
    1. export AWS_ACCESS_KEY_ID=<svc_account_access_key>
    2. export AWS_SECRET_ACCESS_KEY=<svc_account_secret_access_key>
    3. export ES_USERNAME=<foo>
    4. export ES_PASSWORD=<bar>
    5. export ENV=[ENVIRONMENT]
      1. where ENVIRONMENT is one of dev, staging, or prod
    6. export INPUT_S3_BUCKET=<bucket-name>
    7. export INPUT_S3_KEY=<path-to-csv>
    8. export OUTPUT_S3_BUCKET=<bucket-name>
    9. export OUTPUT_S3_FOLDER=ccdb/test/<your initials>
  3. make
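
The four S3 variables above together identify one input object and one output location. As a sketch of how they fit together, the snippet below groups them and derives the corresponding `s3://` URIs. The `S3Settings` class and the example bucket and key values are hypothetical; how the pipeline itself reads these variables is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3Settings:
    input_bucket: str
    input_key: str
    output_bucket: str
    output_folder: str

    @classmethod
    def from_environ(cls, environ):
        # A KeyError here means a required variable was not exported.
        return cls(
            input_bucket=environ["INPUT_S3_BUCKET"],
            input_key=environ["INPUT_S3_KEY"],
            output_bucket=environ["OUTPUT_S3_BUCKET"],
            output_folder=environ["OUTPUT_S3_FOLDER"],
        )

    @property
    def input_uri(self):
        # Location of the CSV to read.
        return f"s3://{self.input_bucket}/{self.input_key.lstrip('/')}"

    @property
    def output_uri(self):
        # Folder that results are written under; normalized to end with "/".
        return f"s3://{self.output_bucket}/{self.output_folder.strip('/')}/"


# Placeholder values for illustration only; pass os.environ in real use.
example = {
    "INPUT_S3_BUCKET": "example-input-bucket",
    "INPUT_S3_KEY": "exports/complaints.csv",
    "OUTPUT_S3_BUCKET": "example-output-bucket",
    "OUTPUT_S3_FOLDER": "ccdb/test/abc",
}
settings = S3Settings.from_environ(example)
print(settings.input_uri)   # s3://example-input-bucket/exports/complaints.csv
print(settings.output_uri)  # s3://example-output-bucket/ccdb/test/abc/
```

Printing the two URIs before running `make` is a quick way to confirm that output will land in your own `ccdb/test/<your initials>` folder rather than a shared one.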

Open source licensing info

  1. TERMS
  2. LICENSE
  3. CFPB Source Code Policy
