trino-js-client demo project

Overview

By leveraging the power of Trino, modern languages like Typescript and Starburst’s managed services, this project aims to enable Software Developers to facilitate data handling and use these integrated technologies as a dependable backend system for data driven applications. The project integrates open-source rideshare data into Starburst Galaxy, creating a web application dashboard designed to provide insights into rideshare and traffic data across major metropolitan areas. This dashboard inspired application is designed to highlight how variability—such as borough location and weather affects rideshare and traffic data points. By leveraging Trino as a powerful querying engine and integrating with Starburst Galaxy, the project showcases how modern JavaScript/TypeScript frameworks and libraries can be used to derive actionable insights from complex data sets.

Key Technologies

JavaScript and TypeScript: The project is primarily built using JavaScript and TypeScript, showcasing their effectiveness in developing modern, scalable web applications.
Node.js: Utilized for server-side logic, api handling, and query management, enabling an efficient backend infrastructure.
React: Used for building interactive and dynamic user interfaces of the dashboard, making use of its component-based architecture for a seamless user experience
trino-js-client: For connecting to the Starburst Galaxy server and executing queries within Node.js server-side logic. Provides us the power to federate data across multiple sources to power our application
Starburst Galaxy: Accessing data configured within Galaxy Catalogs across multiple locations

Datasets

The project uses the following Kaggle datasets:

Replicating This Project

Follow these steps to replicate the project on your local machine:

1. Download Data from Kaggle

Download Rideshare Data:
- Visit the NYC Rideshare Analysis Data page on Kaggle.
- Download the dataset files to your local machine.
Download Weather Data:
- Visit the NYC Weather Data page on Kaggle.
- Download the dataset files to your local machine.

2. Upload Data to Starburst Galaxy Partnered Storage

Upload to AWS S3:
- Log in to your AWS account.
- Navigate to the S3 service.
- Create a new bucket or use an existing bucket.
- Upload the downloaded datasets to the S3 bucket.
Upload to Snowflake (optional):
- Log in to your Snowflake account.
- Navigate to the "Data" section.
- Create a new database or use an existing one.
- Use the Snowflake Web Interface or SnowSQL to load the datasets into Snowflake tables.

3. Schema Discovery and Table Creation in Starburst Galaxy

Discovering Schemas:
- Log in to your Starburst Galaxy account.
- Navigate to the Data section.
- Use the schema discovery feature to detect and create schemas for your datasets in the AWS S3 or Snowflake storage.
Creating Bronze Layer Tables:
- Create bronze layer tables using the discovered schemas. These tables will hold the raw data as it was uploaded.

Creating Silver Layer Tables:

Create silver layer tables to refine and organize the data for analysis. Use the following query to create the nyc_rideshare_fare_analysis table:

CREATE TABLE nyc_rideshare_fare_analysis AS
SELECT 
    date,
    hour_of_day,
    day_of_week(date) AS day_of_week,
    CEIL(passenger_fare) AS passenger_fare,
    CEIL(driver_total_pay) AS driver_total_pay,
    passenger_fare - (driver_total_pay + rideshare_profit) AS tip_amount,
    trip_length,
    total_ride_time,
    business, 
    pickup_location,
    dropoff_location
FROM bronze_schema.rideshare_data
WHERE YEAR(date) = 2022;

4. Clone This Repository

Open your terminal or command prompt.

Run the following command to clone the repository:

git clone https://github.com/emilysunaryo/trino-js-demo.git
cd demo_galaxy_app

5. Install Dependencies

Ensure you have Node.js and npm installed on your machine. You can download them from nodejs.org.
Install the project dependencies by running:
```
npm install
```

6. Set Up Environment Variables

Create a .env file in the root directory of the project.

Add the following environment variables to the .env file, replacing the placeholder values with your actual credentials and settings:

TRINO_SERVER_URL=your-galaxy-server-url
TRINO_USER=your-username
TRINO_PASSWORD=your-password
S3_BUCKET_NAME=your-s3-bucket-name
SNOWFLAKE_ACCOUNT=your-snowflake-account
SNOWFLAKE_USER=your-snowflake-username
SNOWFLAKE_PASSWORD=your-snowflake-password

7. Run the Application

Start the application by cd'ing into the node_server directory to initialize the trino-js library:
```
node trino_client_server
```
Start the application by cd'ing into the demo_react_frontend directory running:
```
npm start
```
Open your web browser and navigate to the local server address provided in the terminal to view the application.

Example Queries

Here are some example queries used in the project to fetch and analyze data:

const trino = require('trino-js-client');

// Trino Client connected on Galaxy Server
const auth = new BasicAuth(process.env.GALAXY_USERNAME, process.env.GALAXY_PASSWORD);
const sslOptions = {
  rejectUnauthorized: true
};

const trino =  Trino.create({
  server: INCLUDE YOUR OWN GALAXY SERVER HERE,
  catalog: 'nyc_uber_rides',
  schema: '',
  auth: auth,
  ssl: sslOptions
});

Provided Test Query used in Project

const testQuery = 'SELECT
    d.borough as borough,
    round(avg(driver_total_pay), 2) as avg_driver_pay,
    count(*) as ride_requests
FROM
   silver_schema.nyc_rideshare_fare_analysis r
JOIN
    taxi_zone_lookup.taxi_zones.zone_lookup d ON r.dropoff_location = d.location_id
WHERE 
    d.borough != 'Unknown'
GROUP BY
     d.borough
ORDER BY
    ride_requests'

Query functions to access data through Trino-js Client

const executeQuery = async (query) => {
  const iter = await trino.query(query);
  const data = await iter
      .map(results => results.data ?? [])
      .fold([], (row, acc) => [...acc, ...row]);
  console.log("testing return of query on server:", data )
  return data;
}

executeQuery(testQuery);

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
demo_react_frontend		demo_react_frontend
node_modules		node_modules
node_server		node_server
.DS_Store		.DS_Store
.gitignore		.gitignore
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
test.session.sql		test.session.sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

trino-js-client demo project

Overview

Key Technologies

Datasets

Replicating This Project

1. Download Data from Kaggle

2. Upload Data to Starburst Galaxy Partnered Storage

3. Schema Discovery and Table Creation in Starburst Galaxy

4. Clone This Repository

5. Install Dependencies

6. Set Up Environment Variables

7. Run the Application

Example Queries

About

Releases

Packages

Languages

emilysunaryo/trino-js-demo

Folders and files

Latest commit

History

Repository files navigation

trino-js-client demo project

Overview

Key Technologies

Datasets

Replicating This Project

1. Download Data from Kaggle

2. Upload Data to Starburst Galaxy Partnered Storage

3. Schema Discovery and Table Creation in Starburst Galaxy

4. Clone This Repository

5. Install Dependencies

6. Set Up Environment Variables

7. Run the Application

Example Queries

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages