textract-statement-processor

A sample pipeline that takes bank statements as input, uses Textract to extract transaction information from the tables within the statements, and then stores and classifies each transaction.

1. PDF bank statements, either scanned or downloaded from an online banking application, are uploaded to the Landing bucket in S3.
2. The arrival of the file in the Landing bucket triggers a Lambda function.
3. The Lambda function starts the Step Functions execution.
4. The first step in the state machine calls a Lambda function to start a new Textract document analysis job.
5. Textract begins analysing the uploaded PDF in the new job.
6. The state machine periodically calls a Lambda function to get the job results.
7. The Lambda function uses the job identifier to check with Textract whether the analysis job is complete.
8. When the job is complete, the Lambda function takes its output, extracts the tabular data, processes the transaction records into a JSON file, and saves that file in the Processed bucket in S3.
9. An API Lambda function queries the JSON files stored in the Processed bucket in response to requests from API Gateway. At this point, an additional classification step assigns each transaction a type and sub-type based on user-configurable classification rules.
10. API Gateway serves a RESTful API that a web frontend consumes to visualise the transaction data.
11. Finally, multiple years' worth of classified transaction data are visualised in a Sankey diagram, letting users compare income and expenditure at a glance.

ML models can be trained and run against the historical transaction data.
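The trigger step, where a file landing in the Landing bucket invokes a Lambda function that starts the Step Functions execution, might look roughly like the minimal Python sketch below. It assumes the state machine ARN is supplied through a hypothetical `STATE_MACHINE_ARN` environment variable and that the execution input is a hypothetical `{"bucket": ..., "key": ...}` document; the repository's actual Lambda code may differ. `start_execution` is the standard boto3 Step Functions call, and the client is injectable so the handler can be exercised without AWS.

```python
import json
import os
import urllib.parse

# Hypothetical environment variable holding the state machine ARN.
STATE_MACHINE_ARN_ENV = "STATE_MACHINE_ARN"


def build_execution_input(event):
    """Pull the bucket and object key out of an S3 event notification.

    This sketch only handles the first record in the event.
    """
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Keys in S3 event notifications are URL-encoded (a space arrives as '+').
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    return {"bucket": bucket, "key": key}


def lambda_handler(event, context, sfn_client=None, state_machine_arn=None):
    """Start one Step Functions execution for the uploaded statement."""
    if sfn_client is None:
        import boto3  # lazy import so the module loads without the AWS SDK
        sfn_client = boto3.client("stepfunctions")
    if state_machine_arn is None:
        state_machine_arn = os.environ[STATE_MACHINE_ARN_ENV]
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(build_execution_input(event)),
    )
    return {"executionArn": response["executionArn"]}
```

Passing only the bucket and key keeps the execution input small; later states can fetch whatever else they need.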
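Starting the Textract job and checking on it could be sketched as follows, using the boto3 Textract calls `start_document_analysis` (with `FeatureTypes=["TABLES"]`) and `get_document_analysis`. The function names are hypothetical. Because the waiting is done by the state machine rather than inside a Lambda function, `get_job_result` only reports the current status; a Wait and Choice state pair would loop while it returns `IN_PROGRESS`. Job output can be split across several responses, so `NextToken` is followed until it is absent.

```python
def start_analysis(textract, bucket, key):
    """Start an asynchronous Textract job that detects tables in a PDF stored in S3."""
    response = textract.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES"],
    )
    return response["JobId"]


def get_job_result(textract, job_id):
    """Return (status, blocks); blocks is only populated once the job has succeeded.

    Statuses other than SUCCEEDED (IN_PROGRESS, FAILED, PARTIAL_SUCCESS) are
    returned with an empty block list so the state machine can decide what to do.
    """
    response = textract.get_document_analysis(JobId=job_id)
    status = response["JobStatus"]
    if status != "SUCCEEDED":
        return status, []
    blocks = list(response.get("Blocks", []))
    next_token = response.get("NextToken")
    # Large documents are paginated: keep fetching until no NextToken is returned.
    while next_token:
        response = textract.get_document_analysis(JobId=job_id, NextToken=next_token)
        blocks.extend(response.get("Blocks", []))
        next_token = response.get("NextToken")
    return status, blocks
```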
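Textract returns tables as a graph of blocks: a `TABLE` block has `CHILD` relationships to `CELL` blocks, each carrying `RowIndex` and `ColumnIndex`, and each cell links to the `WORD` blocks that hold its text. A minimal sketch of turning that graph into rows, and then into transaction records keyed by the header row, is shown here. The helper names, and the assumption that the first row of each table is a header such as `Date`, `Description`, `Amount`, are illustrative rather than taken from the repository.

```python
def _child_ids(block):
    """Yield the Ids of a block's CHILD relationships, if it has any."""
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            yield from rel["Ids"]


def extract_tables(blocks):
    """Rebuild every TABLE block as a list of rows of cell text."""
    by_id = {b["Id"]: b for b in blocks}
    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        n_rows = n_cols = 0
        for cell_id in _child_ids(block):
            cell = by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            # Join the words in the cell; other children (e.g. selection marks) are skipped.
            words = [
                by_id[i]["Text"]
                for i in _child_ids(cell)
                if by_id[i]["BlockType"] == "WORD"
            ]
            row, col = cell["RowIndex"], cell["ColumnIndex"]  # 1-based in Textract
            cells[(row, col)] = " ".join(words)
            n_rows, n_cols = max(n_rows, row), max(n_cols, col)
        tables.append(
            [[cells.get((r, c), "") for c in range(1, n_cols + 1)]
             for r in range(1, n_rows + 1)]
        )
    return tables


def rows_to_transactions(table):
    """Treat the first row as a header and turn each remaining row into a dict."""
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]
```

Real statements often need extra clean-up on top of this, such as parsing currency strings into numbers or skipping tables that are not transaction lists.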
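The repository includes a `classification.csv` file, but its columns are not described here, so the sketch below assumes a hypothetical `keyword,type,sub_type` layout and a simple rule: the first rule whose keyword appears (case-insensitively) in the transaction's `description` field wins, and anything unmatched is labelled `Unclassified`. This is one plausible way to implement user-configurable rules, not the project's actual logic.

```python
import csv
import io

UNCLASSIFIED = ("Unclassified", "Unclassified")


def load_rules(csv_text):
    """Parse rules from CSV text with a hypothetical keyword,type,sub_type header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def classify(description, rules):
    """Return (type, sub_type) for the first rule whose keyword is in the description."""
    text = description.lower()
    for rule in rules:
        if rule["keyword"].lower() in text:
            return rule["type"], rule["sub_type"]
    return UNCLASSIFIED


def classify_transactions(transactions, rules):
    """Return copies of the transactions with type and sub_type fields added."""
    classified = []
    for tx in transactions:
        tx_type, sub_type = classify(tx["description"], rules)
        classified.append({**tx, "type": tx_type, "sub_type": sub_type})
    return classified
```

Because the first match wins, rule order matters: more specific keywords should appear before general ones. Classifying at query time, rather than when the JSON is written, means edited rules apply to the whole history without reprocessing any statements.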
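The API Lambda function that reads the processed JSON files and answers API Gateway requests could be sketched as follows. It assumes each JSON object in the Processed bucket holds a list of transaction records, that the bucket name comes from a hypothetical `PROCESSED_BUCKET` environment variable, and that the API uses a Lambda proxy integration, which expects a response with `statusCode`, `headers` and a string `body`. `list_transactions` and `api_response` are hypothetical names; `get_paginator("list_objects_v2")` and `get_object` are standard boto3 S3 client calls.

```python
import json
import os


def list_transactions(s3, bucket, prefix=""):
    """Read every JSON object under prefix and concatenate the transaction lists."""
    records = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            records.extend(json.loads(body))
    return records


def api_response(payload, status=200):
    """Shape a response for an API Gateway Lambda proxy integration."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def lambda_handler(event, context, s3=None, bucket=None):
    if s3 is None:
        import boto3  # lazy import so the module loads without the AWS SDK
        s3 = boto3.client("s3")
    bucket = bucket or os.environ["PROCESSED_BUCKET"]
    records = list_transactions(s3, bucket)
    # The classification step would be applied to `records` here before returning.
    return api_response(records)
```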
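For the Sankey view, classified transactions have to be collapsed into weighted links between nodes. The sketch below assumes numeric amounts where positive values are income and negative values are expenditure, and routes money through a hypothetical central `Budget` node: income sub-types flow into it, and it flows out to expenditure types and on to their sub-types. The resulting `(source, target, value)` triples are the general shape Sankey charting libraries accept; the project's frontend may structure its data differently.

```python
from collections import defaultdict


def sankey_links(transactions, hub="Budget"):
    """Aggregate classified transactions into sorted (source, target, value) links."""
    totals = defaultdict(float)
    for tx in transactions:
        amount = float(tx["amount"])
        if amount > 0:
            # Income: each sub-type feeds the central hub node.
            totals[(tx["sub_type"], hub)] += amount
        elif amount < 0:
            # Expenditure: hub -> type -> sub-type, using the absolute amount.
            totals[(hub, tx["type"])] += -amount
            totals[(tx["type"], tx["sub_type"])] += -amount
    return sorted((src, dst, round(value, 2)) for (src, dst), value in totals.items())
```

For example, a salary of 2000 and grocery spending of 50.25 and 20.00 produce the links `("Budget", "Household", 70.25)`, `("Household", "Groceries", 70.25)` and `("Salary", "Budget", 2000.0)` when the groceries are typed as `Household`.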

aws-samples/textract-bank-statement-processor

Folders and files

Latest commit

History

Repository files navigation

textract-statement-processor

About

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages