Automatic tagging of text #17

Basant1861 · 2018-06-05T06:56:26Z

We want to tag text automatically using existing, well-known algorithms.

The full text of each PDF is available in the akn_ft/ collection; the metadata of the PDF documents is in the akn/ collection.

We need a service that accepts the text of a document (identified by its IRI) from the metadata and full-text collections and returns a weighted set of probable tags for that text.

The source of the provided text needs to carry some weight, e.g. a tag that occurs in the document title should be weighted higher than one that appears only in the body of the document.
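One way to picture the source weighting described above is a weighted term count. The sketch below is only an illustration, not a chosen algorithm: the weight values, the stop-word list, and the names `SOURCE_WEIGHTS`, `tokenize` and `suggestTags` are all assumptions made for the example.

```javascript
// Hypothetical per-source weights: a term in the title counts three times
// as much as the same term in the body. The values are placeholders.
const SOURCE_WEIGHTS = { title: 3.0, body: 1.0 };

// Tiny illustrative stop-word list; a real service would need a fuller one.
const STOP_WORDS = new Set([
  'the', 'and', 'for', 'with', 'that', 'this', 'from', 'are', 'was',
]);

// Lower-case the text and split it into word tokens, dropping very short
// words and stop words.
function tokenize(text) {
  const words = text.toLowerCase().match(/\p{L}[\p{L}\p{N}-]*/gu) || [];
  return words.filter((w) => w.length > 2 && !STOP_WORDS.has(w));
}

// sources: an object such as { title: '...', body: '...' }.
// Returns up to `limit` candidate tags with weights normalised to sum to 1
// across all scored terms.
function suggestTags(sources, weights = SOURCE_WEIGHTS, limit = 5) {
  const scores = new Map();
  for (const [source, text] of Object.entries(sources)) {
    // Unknown sources fall back to a neutral weight of 1.
    const weight = weights[source] !== undefined ? weights[source] : 1.0;
    for (const term of tokenize(text)) {
      scores.set(term, (scores.get(term) || 0) + weight);
    }
  }
  const total = [...scores.values()].reduce((a, b) => a + b, 0) || 1;
  return [...scores.entries()]
    .map(([tag, score]) => ({ tag, weight: score / total }))
    // Highest weight first; ties are broken alphabetically for stable output.
    .sort((a, b) => b.weight - a.weight || a.tag.localeCompare(b.tag))
    .slice(0, limit);
}
```

With this sketch, `suggestTags({ title: 'taxation', body: 'customs customs' })` ranks `taxation` above `customs` even though `customs` occurs more often, because the single title occurrence scores 3 against 2 for the two body occurrences.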

These probable tags should then be saved back to the metadata document in the akn/ collection.

So to implement the service:

Get the data first: convert the XML of the document in /akn_ft to text, and likewise for the document in /akn. A plain XML-to-text conversion will not be effective for /akn because a lot of the data is in attributes, so a custom XQuery that selectively picks out the text data should be written to send text to the service. To prototype and test the service, /akn_ft alone is enough, since the data there is already in text form. Once the idea is validated, the extractor for the /akn metadata can be added.
Implement the service: a Node.js backend that accepts the text or tagged text and returns probable tags.
Service response acceptor: receives the generated tags.
UI: needs to be implemented within a panel in gawati-editor-ui (this can be done later, after the service is implemented).
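The service step above could be prototyped roughly as follows. This is a minimal sketch using only Node's built-in `http` module. The `/tags` route, the `{ iri, text }` request shape, and the names `naiveTagger`, `handleTagRequest` and `createTaggingServer` are assumptions for illustration, and the stand-in tagger only counts words so that the example is self-contained.

```javascript
const http = require('http');

// Stand-in tagger (hypothetical): ranks words of four or more letters by
// raw frequency. A real tagging algorithm would be plugged in here.
function naiveTagger(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/\p{L}{4,}/gu) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, 5)
    .map(([tag, count]) => ({ tag, weight: count }));
}

// Pure request logic, kept separate from the HTTP plumbing so it can be
// tested without opening a socket. Assumed body: {"iri": "...", "text": "..."}.
function handleTagRequest(rawBody, tagger = naiveTagger) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch (err) {
    return { status: 400, body: { error: 'Request body must be valid JSON' } };
  }
  if (!payload || typeof payload.iri !== 'string' || typeof payload.text !== 'string') {
    return { status: 400, body: { error: 'Expected string fields "iri" and "text"' } };
  }
  // Echo the IRI back so the caller knows which metadata document to update.
  return { status: 200, body: { iri: payload.iri, tags: tagger(payload.text) } };
}

// Builds the server but does not start listening; the caller picks the port.
function createTaggingServer(tagger = naiveTagger) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/tags') {
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Not found' }));
      return;
    }
    let raw = '';
    req.setEncoding('utf8');
    req.on('data', (chunk) => { raw += chunk; });
    req.on('end', () => {
      const result = handleTagRequest(raw, tagger);
      res.writeHead(result.status, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(result.body));
    });
  });
}
```

To try it locally, call `createTaggingServer().listen(3000)` (the port is arbitrary) and POST a JSON body to `/tags`. The response carries the IRI alongside the tags, so the response acceptor can save them back to the matching metadata document.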

kohsah · 2018-06-11T08:28:09Z

@Basant1861 this needs to be implemented as a service in gawati-editor-fe

Basant1861 self-assigned this Jun 5, 2018

kohsah self-assigned this Jun 11, 2018
