fajri-yanti/dwh-amazon

amazon-datawarehouse

Objective

This project aims to optimize the extraction, transformation, and analysis of transactional data for a rapidly growing e-commerce platform. It involves building a data pipeline that can handle large volumes of transactional information while keeping the data accurate and structured for analysis.

Tools

  • Cloud Storage: Raw data is stored and managed in Google Cloud Storage, which acts as the centralized repository before processing.
  • BigQuery: Data is loaded into BigQuery for analysis and querying.
  • dbt: The transformation layer is handled by dbt, which performs the necessary data transformations (such as cleaning, aggregating, and joining).
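
The three stages above can be sketched locally. This is a minimal illustration, not the project's pipeline: an in-memory CSV string stands in for a file in Cloud Storage, SQLite stands in for BigQuery, and a plain SQL statement stands in for a dbt model. The table name `raw_sales`, the column names, and the sample rows are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in the project this file would sit in Cloud Storage.
RAW_CSV = """order_id,sku,category,qty,amount
A-1,SKU-1,Kurta,2,800
A-2,SKU-2,Set,1,
A-3,SKU-1,Kurta,1,400
"""


def extract(raw_text):
    """Parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def load(conn, rows):
    """Land the rows unchanged in a staging table (SQLite stands in for BigQuery)."""
    conn.execute(
        "CREATE TABLE raw_sales "
        "(order_id TEXT, sku TEXT, category TEXT, qty TEXT, amount TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_sales VALUES (:order_id, :sku, :category, :qty, :amount)",
        rows,
    )


def transform(conn):
    """Clean and aggregate: drop rows with no amount, cast types, sum per SKU."""
    return conn.execute(
        """
        SELECT sku,
               category,
               SUM(CAST(qty AS INTEGER)) AS total_qty,
               SUM(CAST(amount AS REAL)) AS total_amount
        FROM raw_sales
        WHERE amount <> ''
        GROUP BY sku, category
        ORDER BY sku
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, extract(RAW_CSV))
summary = transform(conn)
print(summary)
```

Keeping the load step free of cleaning logic mirrors the split described above: raw data lands as-is, and all cleaning happens in the transformation layer.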

Data Source

Amazon Sales Data

ERD

img

Dimensional Table

  • dim_fulfillment

    img

  • dim_channel

    img

  • dim_ship

    img

  • dim_order

    img

  • dim_product

    img

  • dim_promotion

    img

Fact Table

  • fact_salesorder

    img
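
As a rough sketch of how a fact table relates to its dimensions, the example below builds a tiny star schema in SQLite. Only the table names `dim_product`, `dim_channel`, and `fact_salesorder` come from this project; the column names, surrogate keys, and sample rows are assumptions for illustration, since the real columns are shown only in the images above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Column names are illustrative assumptions, not the project's schema.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, sku TEXT, category TEXT);
    CREATE TABLE dim_channel (channel_key INTEGER PRIMARY KEY, sales_channel TEXT);

    -- Each fact row stores measures plus one key per dimension.
    CREATE TABLE fact_salesorder (
        order_id    TEXT,
        product_key INTEGER REFERENCES dim_product (product_key),
        channel_key INTEGER REFERENCES dim_channel (channel_key),
        qty         INTEGER,
        amount      REAL
    );

    INSERT INTO dim_product VALUES (1, 'SKU-1', 'Kurta'), (2, 'SKU-2', 'Set');
    INSERT INTO dim_channel VALUES (1, 'Amazon.in');
    INSERT INTO fact_salesorder VALUES
        ('A-1', 1, 1, 2, 800.0),
        ('A-2', 2, 1, 1, 650.0),
        ('A-3', 1, 1, 1, 400.0);
    """
)

# Measures come from the fact table; descriptive attributes come from the dimensions.
revenue_by_category = conn.execute(
    """
    SELECT p.category, c.sales_channel, SUM(f.amount) AS revenue
    FROM fact_salesorder AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_channel AS c ON c.channel_key = f.channel_key
    GROUP BY p.category, c.sales_channel
    ORDER BY revenue DESC
    """
).fetchall()
print(revenue_by_category)
```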

Note: I ran into an issue while uploading the amazon-datawarehouse folder, so I uploaded it to a separate GitHub repository:
Amazon Datawarehouse Folder

Top 10 Best-Selling Products

img

SQL Query: Top 10 Best-Selling Products
-- Total units sold per SKU, with the product category attached.
SELECT
    o.sku AS product_id,
    p.category,
    SUM(o.qty) AS qty_product
FROM
    `amazon-datawarehouse.amazon_datawarehouse.dim_order` AS o
-- LEFT JOIN keeps SKUs that have no matching row in dim_product.
LEFT JOIN
    `amazon-datawarehouse.amazon_datawarehouse.dim_product` AS p
ON
    o.sku = p.sku
GROUP BY
    o.sku, p.category
ORDER BY
    qty_product DESC
LIMIT 10;
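
To try the shape of this query without a BigQuery project, the sketch below runs the same join and aggregation against SQLite with a handful of made-up rows. The `sku`, `category`, and `qty` columns match the query above; the `order_id` column, the sample data, and the `top_selling` helper are assumptions for illustration.

```python
import sqlite3


def top_selling(conn, limit=10):
    """Return (product_id, category, qty_product) rows, best sellers first."""
    return conn.execute(
        """
        SELECT o.sku AS product_id, p.category, SUM(o.qty) AS qty_product
        FROM dim_order AS o
        LEFT JOIN dim_product AS p ON o.sku = p.sku
        GROUP BY o.sku, p.category
        ORDER BY qty_product DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_order (order_id TEXT, sku TEXT, qty INTEGER);
    CREATE TABLE dim_product (sku TEXT, category TEXT);

    -- Made-up sample rows; SKU-3 deliberately has no product record.
    INSERT INTO dim_product VALUES ('SKU-1', 'Kurta'), ('SKU-2', 'Set');
    INSERT INTO dim_order VALUES
        ('A-1', 'SKU-1', 2), ('A-2', 'SKU-2', 5),
        ('A-3', 'SKU-1', 1), ('A-4', 'SKU-3', 4);
    """
)

top2 = top_selling(conn, limit=2)
print(top2)
```

In this sample, SKU-3 still appears in the ranking with an empty category because of the LEFT JOIN. One thing worth checking on real data: if `dim_product` holds more than one row per `sku`, the join repeats each order row and inflates `qty_product`.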
