Skip to content

Project using Compact Prediction Tree Algorithm. Based on the paper "Compact Prediction Tree : A lossless model for accurate sequence prediction"

Notifications You must be signed in to change notification settings

terman37/CompactPredictionTree_Sequence-Prediction

Repository files navigation

Compact Prediction Tree - Sequence Prediction

An example of sequence prediction based on the CPT algorithm

Docs available here and here

Main part of the algorithm is based on CPT implementation from here https://github.com/NeerajSarwan/CPT

Tweaked for my own needs.

DataSet:

  • Consisting of over 44k sequences of different lengths (from 2 items to 10) reprensenting over129k items
  • there exists 370 differents CODES
  • train.csv file desciption
    • ID: gouping id for sequence
    • CODE: sequence item
    • LINE_NB: position of item in sequence

Target:

Predict the last item in sequence.

Training:

Training the model consist of building the Tree, Inverted index and Lookup Table

  • Tree: hierachical tree modeling the sequences

    tree_sample
  • Inverted Index: dictionnary giving in which sequence each code is used

    here CODE: 'PX9' is used in sequences 8,1,4,5

    II
  • Lookup Table: dictionnary giving node adress of last element of a sequence:

    LT

Predictions:

Concept, For a given sequence:

  • find all sequences containing any its item using Inverted Index

  • Rebuild the original sequences using the Lookup Table (avoiding to save the original data)

  • then for each original similar sequence:

    • find position corresponding to the last item in the sequence to predict
    • calculate a score for each possible following item (check docs for global description)
  • return the n elements with biggest score

API:

Simple API using FastAPI and can be run Docker

postman

Next steps:

Due to nature of data, we can see that only the 2 or 3 preceding items are important. Thus, I decided to switch to a more classical approach using Decision tree / Random Forest calssifiers. it also permits more flexible approach allowing additional input feature to improve model performance.

About

Project using Compact Prediction Tree Algorithm. Based on the paper "Compact Prediction Tree : A lossless model for accurate sequence prediction"

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published