Compact Prediction Tree - Sequence Prediction

An example of sequence prediction based on the CPT algorithm

Docs available here and here

The main part of the algorithm is based on the CPT implementation at https://github.com/NeerajSarwan/CPT, tweaked for my own needs.

DataSet:

The dataset consists of over 44k sequences of varying length (from 2 to 10 items), representing over 129k items in total.
There are 370 different CODES.
train.csv file description:
- ID: grouping id of the sequence
- CODE: sequence item
- LINE_NB: position of the item in the sequence
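
As a minimal sketch of how the three columns map to sequences, the snippet below groups rows by ID and orders them by LINE_NB. It assumes a comma-separated file with a header row; the inline sample data and the codes other than 'PX9' are made up for illustration, and `load_sequences` is a hypothetical helper, not a function from this repository.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-in for train.csv (assumed comma-separated with a header row).
SAMPLE_CSV = """ID,CODE,LINE_NB
1,PX9,1
1,AB2,2
2,KL4,2
2,PX9,1
2,AB2,3
"""

def load_sequences(handle):
    """Group rows by ID and order each group by LINE_NB."""
    rows_by_id = defaultdict(list)
    for row in csv.DictReader(handle):
        rows_by_id[row["ID"]].append((int(row["LINE_NB"]), row["CODE"]))
    # Sorting the (position, code) pairs restores the order inside each sequence.
    return {seq_id: [code for _, code in sorted(rows)]
            for seq_id, rows in rows_by_id.items()}

sequences = load_sequences(io.StringIO(SAMPLE_CSV))
print(sequences)
```

For the real file, the same function could be called with `open("train.csv", newline="")` in place of the in-memory sample.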

Target:

Predict the last item of a sequence.
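
In practice this means each sequence is split into a prefix (the model input) and its last item (the label). A small sketch, where `split_prefix_target` is a hypothetical helper name:

```python
def split_prefix_target(sequence):
    """Return (prefix, target), where target is the last item of the sequence."""
    if len(sequence) < 2:
        raise ValueError("need at least two items to have a prefix and a target")
    return sequence[:-1], sequence[-1]

prefix, target = split_prefix_target(["PX9", "KL4", "AB2"])
print(prefix, target)
```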

Training:

Training the model consists of building the Tree, the Inverted Index and the Lookup Table.

Tree: hierarchical tree modeling the sequences
Inverted Index: dictionary giving, for each code, the sequences in which it is used

For example, CODE 'PX9' is used in sequences 8, 1, 4 and 5.
Lookup Table: dictionary giving the node address of the last element of each sequence
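
The three structures can be sketched in a few lines. This is a simplified illustration and not the code in CPT.py: the `Node` class, `train` and `rebuild` are hypothetical names, and the sample sequences are invented.

```python
from collections import defaultdict

class Node:
    """One tree node: an item, a link to its parent, and its children."""
    def __init__(self, item=None, parent=None):
        self.item = item
        self.parent = parent
        self.children = {}

def train(sequences):
    """Build the tree, inverted index and lookup table from {seq_id: [items]}."""
    root = Node()
    inverted_index = defaultdict(set)   # item -> ids of sequences containing it
    lookup_table = {}                   # seq_id -> node of the sequence's last item
    for seq_id, items in sequences.items():
        node = root
        for item in items:
            # Sequences sharing a prefix share the corresponding branch.
            child = node.children.get(item)
            if child is None:
                child = Node(item, parent=node)
                node.children[item] = child
            node = child
            inverted_index[item].add(seq_id)
        lookup_table[seq_id] = node
    return root, inverted_index, lookup_table

def rebuild(lookup_table, seq_id):
    """Recover a sequence by walking parent links up from its last node."""
    items = []
    node = lookup_table[seq_id]
    while node.parent is not None:
        items.append(node.item)
        node = node.parent
    return items[::-1]

sequences = {1: ["PX9", "AB2"], 2: ["PX9", "KL4", "AB2"], 3: ["KL4", "PX9"]}
root, inverted_index, lookup_table = train(sequences)
print(sorted(inverted_index["PX9"]))
print(rebuild(lookup_table, 2))
```

Because the lookup table points at the last node of every sequence and each node knows its parent, any training sequence can be recovered from the tree without keeping the raw data.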

Predictions:

Concept. For a given sequence:

find all sequences containing any of its items using the Inverted Index
rebuild the original sequences using the Lookup Table (this avoids storing the original data)
then, for each similar original sequence:
- find the position corresponding to the last item of the sequence to predict
- calculate a score for each possible following item (see the docs for a full description)
return the n elements with the highest scores
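
The steps above can be sketched as follows. Two simplifications are assumed here: the `sequences` dict stands in for sequences already rebuilt from the Lookup Table, and the score is a plain occurrence count rather than the scoring described in the docs. The function name `predict` and the sample codes other than 'PX9' are illustrative only.

```python
from collections import Counter

def predict(target, inverted_index, sequences, n=1):
    """Return the n best-scoring candidate next items for `target`.

    `sequences` stands in for sequences rebuilt from the Lookup Table.
    The score is a simple count, used here only for illustration.
    """
    # Step 1: ids of sequences sharing at least one item with the target.
    similar_ids = set()
    for item in target:
        similar_ids |= inverted_index.get(item, set())

    scores = Counter()
    last_item = target[-1]
    for seq_id in similar_ids:
        seq = sequences[seq_id]          # Step 2: the rebuilt sequence
        if last_item not in seq:
            continue
        # Step 3a: position of the target's last item (its last occurrence).
        pos = len(seq) - 1 - seq[::-1].index(last_item)
        # Step 3b: every item after that position is a candidate.
        for candidate in seq[pos + 1:]:
            if candidate not in target:
                scores[candidate] += 1
    # Step 4: keep the n candidates with the highest score.
    return [item for item, _ in scores.most_common(n)]

sequences = {
    1: ["PX9", "AB2", "KL4"],
    2: ["PX9", "AB2", "ZZ1"],
    3: ["QQ7", "AB2", "KL4"],
}
inverted_index = {}
for seq_id, items in sequences.items():
    for item in items:
        inverted_index.setdefault(item, set()).add(seq_id)

print(predict(["PX9", "AB2"], inverted_index, sequences, n=2))
```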

API:

A simple API built with FastAPI; it can be run in Docker.

Next steps:

Due to the nature of the data, only the 2 or 3 preceding items appear to matter. I therefore decided to switch to a more classical approach using Decision Tree / Random Forest classifiers. This approach is also more flexible, since additional input features can be added to improve model performance.
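
One possible way to prepare data for such a classifier is to turn each sequence into a fixed-width row holding the items just before the last one. The sketch below is an assumption about how that could look, not code from this repository: `window_features`, the window width of 3 and the `<PAD>` marker are all hypothetical choices.

```python
def window_features(sequence, width=3, pad="<PAD>"):
    """Return (features, target): the `width` items preceding the last one."""
    *prefix, target = sequence
    # Left-pad short prefixes so every row has exactly `width` columns.
    padded = [pad] * max(0, width - len(prefix)) + prefix
    return padded[-width:], target

rows = [window_features(s) for s in (["PX9", "AB2"],
                                     ["A1", "B2", "C3", "D4", "E5"])]
for features, target in rows:
    print(features, "->", target)
```

The codes are categorical, so they would typically need to be encoded as numbers before being passed to a tree-based classifier.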

terman37/CompactPredictionTree_Sequence-Prediction
