Data Science Case Study

How to run

Build a Docker image:

docker build -f Dockerfile -t mymodel .

Start a container from the image with an interactive shell:

docker run -ti mymodel /bin/bash

Build the dataset:

python make_dataset.py

Preprocess the data:

python preprocess.py

Train the model:

python train.py

Run the Flask server in the background:

python api.py &

Make predictions:

python request.py

Predictions are saved to /src/data/predicted.csv.
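The first three steps above (dataset, preprocessing, training) can be chained so that a failure stops the run early. The following is a minimal sketch, not part of the repository: `run_pipeline` and its `runner` parameter are hypothetical names, and it assumes the three scripts sit in the current working directory inside the container.

```python
import subprocess
import sys

# Script names taken from the steps above; their order matters.
PIPELINE = ["make_dataset.py", "preprocess.py", "train.py"]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order with the current Python interpreter.

    `runner` is injectable (an assumption made for testability); by default
    it is subprocess.run, and check=True makes it raise CalledProcessError
    as soon as a script exits with a non-zero status.
    """
    for script in scripts:
        runner([sys.executable, script], check=True)


# Usage inside the container: run_pipeline()
```

The server and request steps are left out on purpose, since `api.py` has to keep running in the background while `request.py` is executed.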

Acme Corporation has operations in several cities and countries, delivering thousands of orders every day. To deliver these orders on time, we depend on good estimates of how much time a shopper needs to complete an order.

You will be creating a machine learning model to make these estimates. As we build our machine learning solutions internally using Python, we ask you to do the same. However, you are free to use the libraries you are most comfortable with.

Data

In this repository, we have included data describing the orders, the shoppers and the store branches.

File description and data fields

order_products.csv:

order_id: ID of the order
product_id: ID of the product
quantity: The quantity ordered of this product
buy_unit: The unit of the product (KG/UN)

orders.csv:

order_id: ID of the order
lat: The latitude of the delivery location
lng: The longitude of the delivery location
promised_time: The delivery time promised to the user
on_demand: If true, the order was promised to be delivered in less than X minutes
shopper_id: ID of the shopper who completed the order
store_branch_id: ID of the store branch
total_minutes: The total minutes it took to complete the order (label)

shopper.csv:

shopper_id: ID of the shopper
seniority: The experience level of the shopper
found_rate: Historical percentage of products found by the shopper
picking_speed: Historical picking speed, in products per minute
accepted_rate: Percentage of orders historically accepted by the shopper
rating: Client rating of the shopper

storebranch.csv:

store_branch_id: ID of the store branch
store: ID representing the store
lat: Latitude of the branch location
lng: Longitude of the branch location

All the data has been anonymized.
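The four files are linked by the keys listed above: order_id, shopper_id and store_branch_id. The following sketch shows one way to join them into one feature row per order using only the standard library. It is an illustration, not the repository's preprocessing: `read_rows`, `haversine_km` and `build_features` are hypothetical helpers, and the chosen features (product count per order, store-to-customer distance) are assumptions about what might be useful.

```python
import csv
import io
import math
from collections import defaultdict


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def build_features(orders, order_products, shoppers, branches):
    """Join the four tables into one feature dict per order."""
    shopper_by_id = {s["shopper_id"]: s for s in shoppers}
    branch_by_id = {b["store_branch_id"]: b for b in branches}

    # Count the product lines of each order (one row per product).
    n_products = defaultdict(int)
    for line in order_products:
        n_products[line["order_id"]] += 1

    features = []
    for order in orders:
        shopper = shopper_by_id[order["shopper_id"]]
        branch = branch_by_id[order["store_branch_id"]]
        features.append({
            "order_id": order["order_id"],
            "n_products": n_products[order["order_id"]],
            "picking_speed": float(shopper["picking_speed"]),
            # Assumed feature: distance from the branch to the delivery point.
            "distance_km": haversine_km(
                float(branch["lat"]), float(branch["lng"]),
                float(order["lat"]), float(order["lng"]),
            ),
        })
    return features
```

In practice the rows would come from the CSV files in the data folder, for example `read_rows(open(path).read())` for each file.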

Objective

The objective is to predict total_minutes, the time an order takes to complete. Rows without a total_minutes value should be set aside and used for the submission file, which contains each order_id together with its predicted value.
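The split between training rows and submission rows can be sketched as follows. This is a minimal illustration under stated assumptions: a missing total_minutes is assumed to appear as an empty field, `split_by_label` and `write_submission` are hypothetical helpers, and the mean-based predictor in the usage note is only a placeholder baseline, not the model trained by train.py.

```python
import csv


def split_by_label(orders):
    """Separate orders with a total_minutes label from those without one.

    Assumption: a missing label is an empty (or whitespace-only) string.
    """
    labeled = [o for o in orders if o["total_minutes"].strip() != ""]
    unlabeled = [o for o in orders if o["total_minutes"].strip() == ""]
    return labeled, unlabeled


def write_submission(unlabeled, predict, out):
    """Write order_id and predicted total_minutes for each unlabeled order.

    `predict` is any callable mapping an order dict to a number;
    `out` is a writable text file object.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["order_id", "total_minutes"])
    for order in unlabeled:
        writer.writerow([order["order_id"], round(predict(order), 2)])
```

As a placeholder baseline, `predict` could return the mean total_minutes of the labeled rows for every order; a trained model would replace that callable without changing the submission logic.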

As we are interested in seeing how you approached the problem, we also ask you to include your code together with the submission file. The code needs to be well documented, explaining the decisions made. With these explanations, we will be looking at everything from how the data was processed and which features were used to the completed model and its predictions.

Good luck!
