Cleaner Tokenization Handling in tidybert::tidy_bert_output() #18

Open
jonthegeek opened this issue Nov 7, 2022 · 0 comments
The basic_usage.Rmd vignette still requires a manual tokenization step. That shouldn't be necessary.

Right now we only auto-tokenize during {luz} fit/predict (via the callback that tells it to do so). We need a clean way to tokenize when we use pretrained BERTs more directly, as we do here.

If that worked, we could create an (untokenized) dataset, then use it in the model, at which point it would be updated to match the model (or we'd call a helper, or whatever). Then tidy_bert_output() could accept a dataset_bert_pretrained as its second argument.
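The desired workflow might look something like the sketch below. This is purely illustrative: apart from `tidy_bert_output()` and `dataset_bert_pretrained`, which the issue names, every function and argument here (the dataset constructor signature, the `predict()` call, the output shape) is an assumption about an API that doesn't exist yet, not a description of tidybert's current behavior.

```r
# HYPOTHETICAL sketch of the proposed API -- names and signatures are
# assumptions, not the current tidybert interface.
library(tidybert)

# 1. Create the dataset WITHOUT tokenizing up front.
ds <- dataset_bert_pretrained(my_df)

# 2. Using it with a pretrained model would tokenize it to match that
#    model's vocabulary/tokenizer (or a helper would do so explicitly).
output <- predict(model, ds)

# 3. tidy_bert_output() could then take the dataset directly as its
#    second argument, instead of requiring a manual tokenize step.
tidy_bert_output(output, ds)
```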
