Automatic tagging of text #17

Open
Basant1861 opened this issue Jun 5, 2018 · 1 comment

Basant1861 commented Jun 5, 2018

We want to tag text automatically using existing, well-known algorithms.

The full text of the PDF documents is available in the akn_ft/ collection; the metadata of the PDF documents is in the akn/ collection.

We need a service that accepts the text of a document (identified by its IRI) from the metadata and full-text collections and returns a weighted set of probable tags for that text.
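As a starting point, the contract for such a service could be pinned down with a few TypeScript types, which would suit the Node.js backend planned in the steps below. This is a minimal sketch: the field names (`iri`, `sources`, `kind`, `tags`, `weight`) and the three source kinds are assumptions for illustration, not an existing Gawati API, and `parseTagRequest` is a hypothetical helper that validates an untrusted JSON body before any tagging happens.

```typescript
// Hypothetical request/response shapes for the tagging service.
// All names here are illustrative assumptions, not an existing Gawati API.

/** Where a piece of text came from; this drives the source weighting. */
type SourceKind = "title" | "metadata" | "fulltext";

interface TextSource {
  kind: SourceKind;
  text: string;
}

/** Request: the text of one document, identified by its IRI. */
interface TagRequest {
  iri: string;
  sources: TextSource[];
}

/** One probable tag with a weight in (0, 1]. */
interface WeightedTag {
  tag: string;
  weight: number;
}

/** Response: probable tags for the same IRI, highest weight first. */
interface TagResponse {
  iri: string;
  tags: WeightedTag[];
}

const SOURCE_KINDS: string[] = ["title", "metadata", "fulltext"];

function isSourceKind(value: unknown): value is SourceKind {
  return typeof value === "string" && SOURCE_KINDS.indexOf(value) >= 0;
}

/** Validates an untrusted JSON body; returns null if it is not a TagRequest. */
function parseTagRequest(body: unknown): TagRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const { iri, sources } = body as { iri?: unknown; sources?: unknown };
  if (typeof iri !== "string" || iri.length === 0) return null;
  if (!Array.isArray(sources) || sources.length === 0) return null;

  const parsed: TextSource[] = [];
  for (const item of sources) {
    if (typeof item !== "object" || item === null) return null;
    const { kind, text } = item as { kind?: unknown; text?: unknown };
    // Reject unknown source kinds so the scorer never sees an unweighted source.
    if (!isSourceKind(kind) || typeof text !== "string") return null;
    parsed.push({ kind, text });
  }
  return { iri, sources: parsed };
}
```

Rejecting unknown source kinds at the boundary keeps the weighting logic simple, since it can assume every source has a known weight.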

There will need to be some weighting based on the source of the provided text, e.g. a tag that occurs in the document title should carry a higher weight than one appearing in the body of the document.
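The issue does not name a tagging algorithm, so the following is only a baseline sketch of how source weighting could work: a source-weighted term-frequency score. Each candidate term scores the sum, over all sources, of its occurrence count multiplied by that source's weight, and the scores are then normalised so the top tag has weight 1. The weights (title 3, metadata 2, full text 1), the stopword list, and the names `suggestTags` and `SOURCE_WEIGHTS` are hypothetical.

```typescript
// Baseline sketch (an assumption, not a specific published algorithm):
//   score(term) = sum over sources of (occurrences in source) * (source weight)
// then scores are divided by the best score so the top tag has weight 1.

type SourceKind = "title" | "metadata" | "fulltext";

interface TextSource {
  kind: SourceKind;
  text: string;
}

interface WeightedTag {
  tag: string;
  weight: number;
}

// Hypothetical weights: a title term counts three times as much as a body term.
const SOURCE_WEIGHTS: Record<SourceKind, number> = {
  title: 3,
  metadata: 2,
  fulltext: 1,
};

// Tiny illustrative stopword list; a real service needs a proper list per language.
const STOPWORD_LIST = ["the", "and", "for", "with", "that", "this", "from", "shall", "are", "was", "not", "any"];
const STOPWORDS: Record<string, boolean> = Object.create(null); // no prototype keys
STOPWORD_LIST.forEach((w) => {
  STOPWORDS[w] = true;
});

/** Lowercases, keeps ASCII letters, digits and accented Latin letters, drops short words and stopwords. */
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9\u00e0-\u024f]+/)
    .filter((t) => t.length >= 3 && !STOPWORDS[t]);
}

/** Returns up to topK tags, highest weight first, weights normalised to (0, 1]. */
function suggestTags(sources: TextSource[], topK: number): WeightedTag[] {
  const scores: Record<string, number> = Object.create(null);
  for (const source of sources) {
    const w = SOURCE_WEIGHTS[source.kind];
    for (const term of tokenize(source.text)) {
      scores[term] = (scores[term] || 0) + w;
    }
  }
  const terms = Object.keys(scores);
  const best = terms.reduce((m, t) => Math.max(m, scores[t]), 0);
  if (best === 0) return [];
  return terms
    .map((t) => ({ tag: t, weight: scores[t] / best }))
    // Sort by weight, then alphabetically so ties are deterministic.
    .sort((a, b) => b.weight - a.weight || (a.tag < b.tag ? -1 : a.tag > b.tag ? 1 : 0))
    .slice(0, topK);
}
```

With these weights a term that appears once in the title outranks a term that appears twice in the body. Established keyword-extraction techniques such as TF-IDF over the whole akn_ft/ collection, or RAKE, could replace the scoring function while keeping the same per-source weighting.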

These probable tags should then be saved back on the metadata document in the akn/ collection.

To implement the service:

  1. get the data first - XML-to-text of the document in /akn_ft, and XML-to-text of the document in /akn. A plain XML-to-text conversion of the document in /akn will not be effective because much of the data is in attributes, so a custom XQuery that selectively picks out text data should be written to send text to the service. To prototype and test the service, you only need /akn_ft, since the data there is already in text form. Once the idea is validated, the extractor for the /akn metadata can be added.

  2. implement the service - as a Node.js backend that accepts the text (or tagged text) and returns probable tags.

  3. service response acceptor - receives the generated tags.

  4. UI - needs to be implemented within a panel on gawati-editor-ui (this can be done later after the service is implemented)
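Step 3, together with the requirement to save the tags back on the akn/ metadata document, implies a merge step before writing. Here is a minimal sketch of what the acceptor might do, assuming tags arrive as `{ tag, weight }` pairs (a hypothetical shape); `mergeTags` and the `minWeight` threshold are illustrative names. New suggestions below the threshold are dropped, but tags already stored on the document are kept, on the assumption that they may have been curated by hand.

```typescript
// Hypothetical acceptor step: merge tags returned by the service into the
// tags already stored for a document, before writing them back to akn/.
// The threshold and the "higher weight wins" rule are assumptions.

interface WeightedTag {
  tag: string;
  weight: number;
}

/**
 * Merges incoming tags into existing ones:
 *  - tags are compared case-insensitively (stored lowercased and trimmed),
 *  - incoming tags below minWeight are dropped (existing tags are always kept),
 *  - when a tag is in both lists, the higher weight wins.
 */
function mergeTags(existing: WeightedTag[], incoming: WeightedTag[], minWeight: number): WeightedTag[] {
  const merged: Record<string, number> = Object.create(null);
  const add = (t: WeightedTag): void => {
    const key = t.tag.trim().toLowerCase();
    if (key.length === 0) return;
    const prev = merged[key];
    merged[key] = prev === undefined ? t.weight : Math.max(prev, t.weight);
  };
  existing.forEach(add);
  incoming.filter((t) => t.weight >= minWeight).forEach(add);
  return Object.keys(merged)
    .map((tag) => ({ tag, weight: merged[tag] }))
    // Highest weight first; alphabetical on ties for stable output.
    .sort((a, b) => b.weight - a.weight || (a.tag < b.tag ? -1 : a.tag > b.tag ? 1 : 0));
}
```

Writing the merged list back into the akn/ document (for example with an XQuery update in the database) is not shown, since the issue does not say where in the metadata the tags should be stored.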

Basant1861 self-assigned this Jun 5, 2018
kohsah self-assigned this Jun 11, 2018

kohsah commented Jun 11, 2018

@Basant1861, this needs to be implemented as a service in gawati-editor-fe.
