Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it "train" or "update"? #1

Open
matt2000 opened this issue Aug 29, 2018 · 5 comments
Open

Is it "train" or "update"? #1

matt2000 opened this issue Aug 29, 2018 · 5 comments
Assignees

Comments

@matt2000
Copy link
Contributor

GensimSimilarity implements both. StatefulModel requires train(). LocalTrainer tries to call model.update().

Background: (Back before transform and itransform we used train() for batches of records and update() for single records. Now we expect train to receive a batch and let OnlineLearningWrapper.transform() handle single records.)
So,

  1. Is there any difference any more?
  2. Do we need change LocalTrainer to call train instead?
  3. Do we need to add update() back into stateful model or do we need to clean-up GensimSimilarity?
@jmsteinw
Copy link

I didn't write this GensimSimilarity class (Misha did) so he could better answer this. My general thought is that update (in general) for online learning models is going to take a batch and not a single item. Since this is a more specific class meant to update a documents corpus, perhaps it is okay for it to receive an iterable.

@amnezzia
Copy link

GensimSimilarity has update method as an addition outside of a typical use, and it is not used with online learning wrapper.

Intended usage:

  1. Use train to create a new index of vectorized content available now. Train always creates new index from scratch.

  2. Use update some time later when new batch of content is needed to be added to an existing index. Alternative to that would be to take all previous and new content and pass together into train to create a new unified index.

So, this update is an exception to the rule, that was needed when I was building something with similarity index.

@matt2000
Copy link
Contributor Author

So it seems like LocalTrainer is wrong and should call model.train() and GensimSimularity.update() should be renamed to .train() and the current GensimSimularity.train should be renamed to something else, maybe initialize()? This would fix everything to match the interface of StatefulModel. Do you think this is the correct solution?

@amnezzia
Copy link

I don't see what does not match the current interface. Nothing is wrong, everything is right.

In general, train always trains from scratch, update assumes there is already an existing model, trained or just initialized, and it will just make an update to that model.

Having model training procedures in something called initialize I think is inappropriate and will not be understood.

LocalTrainer, or any other Trainer subclass in the online_learning file, was intended to be used only with online learning, and only as a component of OnlineLearningWrapper.

Not every model is suitable for online learning. In the current interface a person who implements the model need to make sure it would make sense to do updates, and implement how to do that inside model.update method.

For example the same GensimSimilarity is not suitable for online learning, it just happened to have a method that is called the same, I even probably wrote that before making the online learning functionality. If that causes too much confusion I can rename that into add_docs.

The other problem is that we don't actually have an online learning example anywhere and the online learning module was not used anywhere after I wrote trending recommender using that long time ago, which was then again promptly forgotten.

Also I would likely refactor the online learning a little differently now if I was working on it.

@matt2000
Copy link
Contributor Author

matt2000 commented Sep 7, 2018

We don't define any Model Class that has an update method. Maybe we should define an ABC like:

class UpdatableModel(StatefulModel, metaclass=ABCMeta):
    @abstractmethod
    def update(self, data=None):
        pass

and make this the required type for LocalTrainer at least?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants