We want to tag text automatically using known available algorithms.
The full text of each PDF is available in the akn_ft/ collection; the metadata of the PDF documents is in the akn/ collection.
We need to have a service that accepts document text (for an IRI) from the metadata and fulltext collections and returns a weighted set of probable tags for the text.
The source of the provided text will need to carry some weight, e.g. a tag that occurs in the document title should be weighted higher than one that appears only in the body of the document.
These probable tags should then be saved back on the metadata document in the akn/ collection.
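To make the weighting idea concrete, here is a minimal sketch in plain Node.js. It is not a proposal for the final algorithm: it scores candidate tags by simple word frequency, and the weight values, the stop-word list and the names SOURCE_WEIGHTS, tokenize and weightedTags are all assumptions made up for illustration.

```javascript
// Hypothetical per-source weights: a hit in the title counts three times
// as much as a hit in the body. The actual values would need tuning.
const SOURCE_WEIGHTS = { title: 3.0, body: 1.0 };

// Tiny illustrative stop-word list; a real service would need a fuller,
// language-aware one.
const STOP_WORDS = new Set(['the', 'and', 'for', 'with', 'that', 'this']);

// Split text into lowercase word tokens (any Unicode letters), dropping
// stop words and tokens of one or two characters.
function tokenize(text) {
  return (text.toLowerCase().match(/\p{L}+/gu) || [])
    .filter((word) => word.length > 2 && !STOP_WORDS.has(word));
}

// sections maps a source name ("title", "body", ...) to its text.
// Returns up to `limit` { tag, weight } pairs, highest weight first,
// with ties broken alphabetically so the output is deterministic.
function weightedTags(sections, limit = 5) {
  const scores = new Map();
  for (const [source, text] of Object.entries(sections)) {
    // Unknown sources fall back to the body weight.
    const weight = SOURCE_WEIGHTS[source] ?? 1.0;
    for (const token of tokenize(text || '')) {
      scores.set(token, (scores.get(token) || 0) + weight);
    }
  }
  return [...scores.entries()]
    .map(([tag, weight]) => ({ tag, weight }))
    .sort((a, b) => b.weight - a.weight || a.tag.localeCompare(b.tag))
    .slice(0, limit);
}

const sample = weightedTags({
  title: 'Finance Act',
  body: 'An act about taxation. Taxation of income and finance.',
});
console.log(sample);
```

In this sample "finance" and "act" each appear once in the title and once in the body, so they end up ahead of "taxation", which appears twice but only in the body.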
So to implement the service:
get the data first - XML to text of the document in /akn_ft, and XML to text of the document in /akn. A plain XML-to-text conversion of the document in /akn will not be effective because a lot of the data is in attributes, so a custom XQuery that selectively picks text data should be written to send text to the service. To prototype / test the service you just need to use /akn_ft, since the data is already in text form there. Once the idea is validated, the extractor for /akn metadata can be added.
implement the service - as a Node.js backend which accepts the text or tagged text and returns probable tags.
service response acceptor - receives the generated tags so that they can be saved back on the metadata document in the akn/ collection.
UI - needs to be implemented within a panel on gawati-editor-ui (this can be done later after the service is implemented)
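As a rough illustration of the service step above, the following sketch shows one possible shape for the Node.js backend, using only the built-in http module. The /tags path, the { iri, text } request body, the { iri, tags } response and the function names are all assumptions, and suggestTags is a placeholder word counter standing in for whichever tagging algorithm is eventually chosen.

```javascript
const http = require('http');

// Placeholder tagger: counts words of four or more letters. It only
// stands in for the real algorithm, which is still to be selected.
function suggestTags(text, limit = 5) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/\p{L}{4,}/gu) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([tag, weight]) => ({ tag, weight }))
    .sort((a, b) => b.weight - a.weight || a.tag.localeCompare(b.tag))
    .slice(0, limit);
}

// Pure request handler, kept separate from the HTTP plumbing so it can
// be tested without opening a socket. Takes the raw request body and
// returns the status code and JSON body to send back.
function handleTagRequest(rawBody) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch (err) {
    return { status: 400, body: { error: 'Request body must be JSON' } };
  }
  if (payload === null || typeof payload.iri !== 'string' || typeof payload.text !== 'string') {
    return { status: 400, body: { error: 'Expected { iri, text } as strings' } };
  }
  return { status: 200, body: { iri: payload.iri, tags: suggestTags(payload.text) } };
}

// Builds the server but does not start listening; the caller picks the port.
function createTagServer() {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/tags') {
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Not found' }));
      return;
    }
    let raw = '';
    req.setEncoding('utf8');
    req.on('data', (chunk) => { raw += chunk; });
    req.on('end', () => {
      const { status, body } = handleTagRequest(raw);
      res.writeHead(status, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(body));
    });
  });
}

// To try it locally (the port is arbitrary): createTagServer().listen(3000);
```

Echoing the IRI back in the response is meant to let the response acceptor match the generated tags to the right metadata document; the step that writes them to the akn/ collection is deliberately left out of this sketch.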