- Flask
- Scikit-Learn
- Pandas
- NumPy
- Imbalanced-learn
- Kafka
- Cassandra (this part has not been uploaded yet)
- backend.py runs the actual inference and returns the prediction over a Kafka topic.
- flaskapi.py is the front end: it accepts API calls over POST and forwards them to the backend via Kafka topics.
- testapi.py mimics a demo API call using the requests library.
- This repo focuses on scalability, so it uses Kafka for streaming.
- A user sends a POST request to the flaskapi endpoint containing the transaction information and a unique id that identifies the request.
- flaskapi receives the input data, sends it to a Kafka topic (fraudsender), and waits for the backend's response on another Kafka topic (fraudreceiver).
- backend receives the data over the fraudsender topic and runs model inference on the transaction information. After inference, it sends the prediction and the id over the fraudreceiver topic.
- After receiving the prediction message over the fraudreceiver topic, flaskapi responds to the user with the prediction.
- The model uses the creditcard dataset from Kaggle.
- I used SMOTE because this dataset is highly imbalanced.
- I trained a random forest (with a StandardScaler in the pipeline) on the first 20,000 rows, with an accuracy of 99%.
- I haven't included the model training script; contact me over email if you need it.
- Containerizing the backend in Docker.
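
The request flow described above (flaskapi publishes to fraudsender, backend replies on fraudreceiver, and the id ties the two together) can be sketched roughly as follows. This is not the code from this repo: to keep it runnable without a broker, two in-memory queue.Queue objects stand in for the Kafka topics, and fake_predict, backend_worker and handle_request are hypothetical names made up for illustration.

```python
import queue
import threading

# In-memory stand-ins for the two Kafka topics (assumption: no real broker here).
fraudsender = queue.Queue()
fraudreceiver = queue.Queue()


def fake_predict(transaction):
    """Hypothetical placeholder for model inference, not the real model."""
    return 1 if transaction.get("amount", 0) > 1000 else 0


def backend_worker():
    """Role of backend.py: consume requests, run inference, publish results."""
    while True:
        msg = fraudsender.get()
        if msg is None:  # sentinel used to stop the worker in this sketch
            break
        prediction = fake_predict(msg["data"])
        fraudreceiver.put({"id": msg["id"], "prediction": prediction})


def handle_request(request_id, transaction, timeout=5.0):
    """Role of flaskapi.py: publish the request, then wait for the matching id."""
    fraudsender.put({"id": request_id, "data": transaction})
    while True:
        # Raises queue.Empty if the backend does not answer within `timeout`.
        reply = fraudreceiver.get(timeout=timeout)
        if reply["id"] == request_id:
            return reply["prediction"]
        # Replies for other ids are skipped here; a real service would need
        # to route them to the request that is waiting for them.


worker = threading.Thread(target=backend_worker, daemon=True)
worker.start()

result_small = handle_request("req-1", {"amount": 25.0})
result_large = handle_request("req-2", {"amount": 5000.0})

fraudsender.put(None)  # stop the worker
worker.join()
print(result_small, result_large)
```

With a real broker, the two queues would be replaced by a Kafka producer and consumer on each side, and the unique id would play the same role of matching a reply to its request.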
PS: I will write a tutorial blog post soon.