Parts of speech cleaning #60

Open
jpcompartir opened this issue Nov 20, 2023 · 2 comments

@jpcompartir
Owner

Add parts-of-speech cleaning to LimpiaR so that it forms part of pre-processing. This also plays nicer with the dependencies, as LimpiaR is still very lightweight.

@jpcompartir
Owner Author

This will rely heavily on {udpipe}.

First wave:

  • load model
  • annotate data frame in pipe-able way
  • re-create sentences/documents from tokens/lemma etc.
  • parallel processing via udpipe's built-in support
  • progress updates via udpipe's built-in support
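
The first-wave steps could be sketched roughly as below. This is not LimpiaR's implementation: `annotate_pos()` and `rebuild_documents()` are hypothetical names used only for illustration, and the sketch assumes {udpipe}'s `udpipe_download_model()`, `udpipe_load_model()` and `udpipe()`, with `parallel.cores` for parallel processing and `trace` (passed on to `udpipe_annotate()`) for progress updates.

```r
# Hypothetical helper: annotate a text column of a data frame in a pipe-able way.
# Assumes `model` was loaded with udpipe::udpipe_load_model().
# Returns one row per token (doc_id, token, lemma, upos, ...).
annotate_pos <- function(data, text_var, id_var, model,
                         cores = 1L, progress = FALSE) {
  input <- data.frame(
    doc_id = as.character(data[[id_var]]),
    text   = as.character(data[[text_var]]),
    stringsAsFactors = FALSE
  )
  udpipe::udpipe(input, object = model,
                 parallel.cores = cores, trace = progress)
}

# Hypothetical helper: re-create one string per document from a
# token-level column such as "token" or "lemma".
# Note: split() orders documents by sorted doc_id, not by input order.
rebuild_documents <- function(annotations, column = "lemma") {
  pieces <- split(annotations[[column]], annotations$doc_id)
  data.frame(
    doc_id = names(pieces),
    text   = unname(vapply(pieces, paste, character(1), collapse = " ")),
    stringsAsFactors = FALSE
  )
}

# Usage (downloads a model, so it needs network access):
# model_info <- udpipe::udpipe_download_model(language = "english")
# model      <- udpipe::udpipe_load_model(model_info$file_model)
# posts      <- data.frame(id = 1:2,
#                          message = c("The cats were running.",
#                                      "She bought two books."))
# posts |>
#   annotate_pos("message", "id", model, cores = 2L, progress = TRUE) |>
#   rebuild_documents("lemma")
```

Pasting tokens back together with a single space does not restore the original spacing around punctuation, so a real implementation would probably need to handle that separately.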

Second wave:

  • integration with other company packages
  • visualisations using the output (these will probably sit in a different package because of dependencies)
  • abstractions for the most common steps, e.g. swap only verbs for their lemma, repair the sentence/document, and join back to the original data
  • noun-phrase extraction + 'consolidation', e.g. similar to spacyr
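
As an illustration of the "swap only verbs for their lemma" abstraction, here is a minimal sketch. `lemmatise_pos()` is a hypothetical name, not an existing LimpiaR function, and the toy `annotations` table is hand-written to stand in for {udpipe} output (it assumes the `doc_id`, `token`, `lemma` and `upos` columns that udpipe returns).

```r
# Hypothetical helper: replace tokens with their lemma only for the chosen
# universal POS tags, rebuild each document, and join back to the original data.
lemmatise_pos <- function(data, annotations, id_var, pos = "VERB") {
  swap <- annotations$upos %in% pos & !is.na(annotations$lemma)
  annotations$clean_token <- ifelse(swap, annotations$lemma, annotations$token)

  # Repair the documents: one space-separated string per doc_id
  pieces <- split(annotations$clean_token, annotations$doc_id)
  rebuilt <- data.frame(
    doc_id     = names(pieces),
    clean_text = unname(vapply(pieces, paste, character(1), collapse = " ")),
    stringsAsFactors = FALSE
  )

  # Join back to the original data on the document id
  data$doc_id <- as.character(data[[id_var]])
  merge(data, rebuilt, by = "doc_id", all.x = TRUE, sort = FALSE)
}

# Toy data standing in for the original posts and their udpipe annotations
posts <- data.frame(
  id      = c(1, 2),
  message = c("cats were running", "she bought books"),
  stringsAsFactors = FALSE
)
annotations <- data.frame(
  doc_id = c("1", "1", "1", "2", "2", "2"),
  token  = c("cats", "were", "running", "she", "bought", "books"),
  lemma  = c("cat", "be", "run", "she", "buy", "book"),
  upos   = c("NOUN", "AUX", "VERB", "PRON", "VERB", "NOUN"),
  stringsAsFactors = FALSE
)

result <- lemmatise_pos(posts, annotations, id_var = "id")
```

Only tokens tagged `VERB` are swapped here, so the auxiliary "were" (tagged `AUX`) is left alone; passing `pos = c("VERB", "AUX")` would lemmatise both.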

@jpcompartir
Owner Author

Underway in #63
