WN Package
In addition to RDF, the OpenWordnets (OWNs) for Portuguese and English are published in LMF, a format compatible with the WN package. WN is a Python package that provides an interface to wordnet data, with features such as queries, path finding, and similarity measures.
The OWNs for Portuguese and English can be downloaded in LMF format from the releases page: https://github.com/own-pt/openWordnet-PT/releases. Three packages are available: own-pt (Portuguese only), own-en (English only), and own (both). For instance, after downloading the English package:
>>> import wn
>>> wn.add("own-en.tar.gz")
Added own-en:1.0.0 (OpenWordnet-EN)
Alternatively, download the index file index.toml and use it to fetch a lexicon from the command line:
$ python -m wn download --index index.toml "own-en"
Download [##############################] (11766130/11766130 bytes) Complete
Added own-en:1.0.0 (OpenWordnet-EN)
In the future, the data may be listed in the default index that ships with the WN library.
The following examples assume that both lexicons have been added and that WN has been imported:
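To confirm which lexicons are installed after either method above, they can be listed from Python. This is a minimal sketch, assuming the `wn.lexicons()` function and the `id`, `version`, and `label` attributes of the objects it returns, as described in the WN documentation. The helper names `describe_lexicons` and `installed_lexicons` are hypothetical and not part of WN.

```python
def describe_lexicons(lexicons):
    """Return one 'id:version (label)' string per lexicon object."""
    return [f"{lex.id}:{lex.version} ({lex.label})" for lex in lexicons]


def installed_lexicons():
    """Describe the lexicons currently stored in the local WN database."""
    import wn  # imported lazily so describe_lexicons works without WN installed

    return describe_lexicons(wn.lexicons())
```

If own-en was added successfully, the list returned by `installed_lexicons()` should contain an entry whose identifier is `own-en`.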
>>> import wn
WN provides functions for retrieving words, senses, and synsets:
>>> wn.words("dog")
[Word('own-en-word-dog-n'), Word('own-en-word-dog-v')]
>>> wn.senses("dog")
[Sense('own-en-wordsense-02084071-n-2'), Sense('own-en-wordsense-02710044-n-2'), Sense('own-en-wordsense-03901548-n-3'), ...]
>>> wn.synsets("dog")
[Synset('own-en-synset-02084071-n'), Synset('own-en-synset-02710044-n'), Synset('own-en-synset-03901548-n'), ...]
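The objects returned by these queries can be inspected further. The sketch below collects a few properties of each synset found for a word form. It assumes the `Synset.lemmas()`, `Synset.definition()`, and `Synset.hypernyms()` methods from the WN documentation, and that the OWN data includes definitions and hypernym relations. The helper names `summarize_synset` and `summarize_form` are hypothetical.

```python
def summarize_synset(synset):
    """Collect the ID, lemmas, definition, and hypernym IDs of one synset."""
    return {
        "id": synset.id,
        "lemmas": list(synset.lemmas()),
        "definition": synset.definition(),  # may be None if no definition exists
        "hypernyms": [hyper.id for hyper in synset.hypernyms()],
    }


def summarize_form(form, lexicon="own-en"):
    """Summarize every synset found for a word form in a single lexicon."""
    import wn  # lazy import: requires WN and the lexicon to be installed

    return [summarize_synset(ss) for ss in wn.synsets(form, lexicon=lexicon)]
```

For example, `summarize_form("dog")` would return one dictionary per synset in the list shown above.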
By default, when more than one lexicon is installed, queries run over all of them:
>>> wn.words("Costume") # returns results for both English and Portuguese
[Word('own-en-word-costume-n'), Word('own-en-word-costume-v'), Word('own-pt-word-costume-n')]
In such cases, the query can be restricted by lexicon, language, or part of speech:
>>> wn.words("Costume", lang="en", lexicon="own-en", pos="n")
[Word('own-en-word-costume-n')]
If no lemma argument is passed, the library returns every entry that matches the remaining filters:
>>> words_en = wn.words(lexicon="own-en")
>>> len(words_en)
156584
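A full listing like this one can be broken down further, for instance by part of speech. This is a minimal sketch, assuming the `pos` attribute of `Word` objects from the WN documentation. The helper names `count_by_pos` and `pos_counts` are hypothetical.

```python
from collections import Counter


def count_by_pos(words):
    """Tally word entries by their part-of-speech tag (e.g. 'n', 'v')."""
    return Counter(word.pos for word in words)


def pos_counts(lexicon="own-en"):
    """Count all words of one installed lexicon, grouped by part of speech."""
    import wn  # lazy import: requires WN and the lexicon to be installed

    return count_by_pos(wn.words(lexicon=lexicon))
```

The counts returned by `pos_counts("own-en")` should add up to the total number of words reported by `len(wn.words(lexicon="own-en"))`.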
Users already familiar with NLTK's WordNet interface can consult the migration guide: https://wn.readthedocs.io/en/latest/guides/nltk-migration.html