This repository contains SQL scripts and configurations for a data integration and ETL pipeline that loads Citibike trip data from an Amazon S3 bucket into a Snowflake database. This pipeline uses Snowflake's external storage integration and staging capabilities to access, format, and transform Citibike data for analysis.
- AWS S3 and Snowflake Integration: Securely connects to a designated S3 bucket using Snowflake's `STORAGE INTEGRATION` object and IAM roles, ensuring a reliable and efficient data connection (see the sketch after this list).
- Data Staging and File Formatting: Defines a reusable file format for consistent parsing of CSV files, including error handling for null values and missing data.
- Automated Data Loading: Copies raw CSV data from S3 via a Snowflake stage into a structured Snowflake table, transforming it during the load.
- Scalable ETL Process: Easily scalable for additional Citibike datasets or other data formats, supporting future data ingestion and transformations.
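For illustration, a minimal Snowflake SQL sketch of these objects is shown below. The object names (`citibike_s3_int`, `csv_format`, `citibike_stage`), the bucket URL, and the role ARN are placeholders, not the values used by this repository's scripts:

```sql
-- Storage integration: delegates S3 access to an IAM role
-- (placeholder ARN and bucket; substitute your own values).
CREATE OR REPLACE STORAGE INTEGRATION citibike_s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-citibike-role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-citibike-bucket/trips/');

-- Reusable CSV file format with null handling for empty fields.
CREATE OR REPLACE FILE FORMAT csv_format
  TYPE = 'CSV'
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  NULL_IF = ('', 'NULL');

-- External stage pointing at the bucket through the integration.
CREATE OR REPLACE STAGE citibike_stage
  STORAGE_INTEGRATION = citibike_s3_int
  URL = 's3://my-citibike-bucket/trips/'
  FILE_FORMAT = csv_format;
```

After creating the integration, `DESC INTEGRATION citibike_s3_int` returns the `STORAGE_AWS_IAM_USER_ARN` and external ID needed to finish the trust-policy setup on the AWS side.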
The pipeline loads data into a `trips` table, which includes columns for trip duration, start and end times, station details, bike IDs, and user demographic information. This structured format enables quick querying and supports further analytics.
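As a rough sketch, the table and load step could look like the following. The column names follow the public Citibike trip CSV layout and may not match this repository's scripts exactly:

```sql
-- Illustrative trips table; the repository's actual DDL may differ.
CREATE OR REPLACE TABLE trips (
  tripduration        INTEGER,    -- trip length in seconds
  starttime           TIMESTAMP,
  stoptime            TIMESTAMP,
  start_station_id    INTEGER,
  start_station_name  STRING,
  end_station_id      INTEGER,
  end_station_name    STRING,
  bikeid              INTEGER,
  usertype            STRING,     -- e.g. Subscriber or Customer
  birth_year          INTEGER,
  gender              INTEGER
);

-- Bulk-load staged CSV files into the table.
-- ON_ERROR = 'CONTINUE' is one possible error-handling choice;
-- it skips bad rows rather than aborting the load.
COPY INTO trips
  FROM @citibike_stage
  FILE_FORMAT = (FORMAT_NAME = csv_format)
  ON_ERROR = 'CONTINUE';
```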
- Configure Storage Integration: Ensure Snowflake has permission to access the S3 bucket by configuring `STORAGE_AWS_ROLE_ARN` with an appropriate IAM role.
- Run SQL Scripts: Execute the SQL scripts in the repository to create the integration, file format, stage, and table, then load the data from S3.
- Query the Data: Run queries on the `trips` table to analyze Citibike trip data; see the example below.
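For example, a quick aggregate like this one (assuming the illustrative schema above) surfaces the busiest start stations:

```sql
-- Top 10 start stations by trip count, with average trip length.
SELECT start_station_name,
       COUNT(*)                      AS num_trips,
       ROUND(AVG(tripduration) / 60) AS avg_trip_minutes
FROM trips
GROUP BY start_station_name
ORDER BY num_trips DESC
LIMIT 10;
```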