In this use case, we have developed a sample data pipeline (a Glue job) using the AWS CDK in TypeScript. The job reads data from a DynamoDB table, performs some data transformation using PySpark, and writes the result to an S3 bucket in CSV format.
DynamoDB is a fully managed NoSQL database service offered by AWS that scales easily and is used in many applications. S3 is AWS's general-purpose object storage service.
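To make the read, transform, and write flow more concrete, here is a minimal sketch in plain Python. It is not the PySpark job from this repository: it uses only the standard library as a stand-in to show how DynamoDB-typed items could be flattened into CSV rows. The function names `unmarshal` and `items_to_csv` and the attribute names are hypothetical.

```python
import csv
import io


def unmarshal(item):
    """Convert a DynamoDB-typed item such as {"id": {"S": "a1"}} into a plain dict."""
    plain = {}
    for name, typed_value in item.items():
        # Each attribute is a single-key dict of the form {type_code: value}.
        ((type_code, value),) = typed_value.items()
        if type_code == "N":
            # DynamoDB transmits numbers as strings; try int first, then float.
            try:
                plain[name] = int(value)
            except ValueError:
                plain[name] = float(value)
        elif type_code in ("S", "BOOL"):
            plain[name] = value
        elif type_code == "NULL":
            plain[name] = None
        else:
            # Nested and set types are left out of this sketch on purpose.
            raise ValueError(f"unsupported attribute type: {type_code}")
    return plain


def items_to_csv(items, columns):
    """Render DynamoDB-typed items as CSV text with the given column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for item in items:
        writer.writerow(unmarshal(item))
    return buffer.getvalue()
```

For example, `items_to_csv([{"id": {"S": "a1"}, "qty": {"N": "3"}}], ["id", "qty"])` yields a header line followed by one data row. In the actual Glue job, Spark handles the equivalent conversion and writes the CSV files to S3.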
The cdk.json file tells the CDK Toolkit how to execute your app.
Useful commands:
- npm run build: compile TypeScript to JavaScript
- npm run watch: watch for changes and compile
- npm run test: run the Jest unit tests
- cdk deploy: deploy this stack to your default AWS account/region
- cdk diff: compare the deployed stack with the current state
- cdk synth: emit the synthesized CloudFormation template
Prerequisites: you should have an AWS account (the free tier is enough), and the AWS CLI should already be configured (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)
- Clone the repository
- Bootstrap your AWS environment using cdk bootstrap
- Deploy the stack using cdk deploy
- Create dummy data in DynamoDB using the sample data
- Run the Glue job from the AWS console
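For the dummy-data step above, one possible approach is to generate request payloads for DynamoDB's BatchWriteItem operation. The sketch below is only an illustration: the table name `sample-table` and the attributes `id`, `name`, and `price` are hypothetical, so prefer the sample data shipped with the repository where it applies.

```python
MAX_BATCH_SIZE = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def make_dummy_items(count):
    """Create hypothetical DynamoDB-typed items for testing the pipeline."""
    return [
        {
            "id": {"S": f"item-{i}"},
            "name": {"S": f"Sample {i}"},
            "price": {"N": str(10 + i)},  # numbers are sent as strings
        }
        for i in range(count)
    ]


def build_batches(table_name, items):
    """Split items into BatchWriteItem request payloads of at most 25 puts each."""
    batches = []
    for start in range(0, len(items), MAX_BATCH_SIZE):
        chunk = items[start:start + MAX_BATCH_SIZE]
        batches.append(
            {table_name: [{"PutRequest": {"Item": item}} for item in chunk]}
        )
    return batches
```

Each payload can be saved as JSON (for example with `json.dump`) and loaded with `aws dynamodb batch-write-item --request-items file://request-items-0.json`, where the file name is whatever you chose when saving.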
The Glue job's configuration can be changed in the CDK stack.
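As a rough guide to what is configurable, the sketch below builds the shape of an `AWS::Glue::Job` resource as it might appear in the template emitted by cdk synth. It is written in Python purely for illustration and is not taken from this repository's stack: the function name, the default values, and the custom arguments `--source_table` and `--output_path` are all assumptions.

```python
def glue_job_resource(job_name, role_arn, script_s3_path, output_s3_path, table_name):
    """Return a dict shaped like a CloudFormation AWS::Glue::Job resource."""
    return {
        "Type": "AWS::Glue::Job",
        "Properties": {
            "Name": job_name,
            "Role": role_arn,
            "GlueVersion": "4.0",   # assumed version; adjust to match the stack
            "WorkerType": "G.1X",   # assumed worker size
            "NumberOfWorkers": 2,   # assumed worker count
            "Command": {
                "Name": "glueetl",  # Spark ETL job type
                "PythonVersion": "3",
                "ScriptLocation": script_s3_path,
            },
            "DefaultArguments": {
                "--job-language": "python",
                # Hypothetical job parameters the PySpark script could read.
                "--source_table": table_name,
                "--output_path": output_s3_path,
            },
        },
    }
```

Settings such as the worker type, the number of workers, and the default arguments are the kind of values you would typically adjust in the stack before running cdk deploy again.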