Fredson Aguiar edited this page Oct 7, 2021 · 2 revisions

OpenWordnet and the WN package

Besides RDF, the OpenWordnets for Portuguese and English are published in LMF (WN-LMF), the XML format that the WN package loads. WN is a Python package that provides an interface to wordnet data, with features such as queries, path finding and similarity measures.

Getting OWN

The OWNs for Portuguese and English can be downloaded in LMF format from the releases page, https://github.com/own-pt/openWordnet-PT/releases. Three packages are available: own-pt (Portuguese only), own-en (English only) and own (both). For instance, after downloading the English package, add it with:

>>> import wn
>>> wn.add("own-en.tar.gz")
Added own-en:1.0.0 (OpenWordnet-EN)

Alternatively, you can download the index file, index.toml, and use it to download a lexicon from the command line:

$ python -m wn download --index index.toml "own-en"
Download [##############################] (11766130/11766130 bytes) Complete
Added own-en:1.0.0 (OpenWordnet-EN)

In the future, the data may be included in the WN package's built-in index.

Usage Examples

The following examples assume that both lexicons (own-en and own-pt) have been added and that WN has been imported:

>>> import wn

Simple Queries

WN provides functions for retrieving words, word senses and synsets:

>>> wn.words("dog")
[Word('own-en-word-dog-n'), Word('own-en-word-dog-v')]
>>> wn.senses("dog")
[Sense('own-en-wordsense-02084071-n-2'), Sense('own-en-wordsense-02710044-n-2'), Sense('own-en-wordsense-03901548-n-3'), ...]
>>> wn.synsets("dog")
[Synset('own-en-synset-02084071-n'), Synset('own-en-synset-02710044-n'), Synset('own-en-synset-03901548-n'), ...]

By default, when more than one lexicon is installed, queries run over all of them:

>>> wn.words("Costume")  # matches both English and Portuguese entries
[Word('own-en-word-costume-n'), Word('own-en-word-costume-v'), Word('own-pt-word-costume-n')]

In such cases, you can restrict the query by lexicon, language or part of speech:

>>> wn.words("Costume", lang="en", lexicon="own-en", pos="n")
[Word('own-en-word-costume-n')]

If no word form is passed, the function returns every word that matches the remaining filters, here all words in own-en:

>>> words_en = wn.words(lexicon="own-en")
>>> len(words_en)
156584
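The filtering behavior shown above can be pictured with a small in-memory model: every keyword left as None matches anything, and the filters that are given are combined with AND. This is a hypothetical toy for illustration only, reusing the word IDs from the examples above; WN itself answers these queries from its local database, and the toy ignores versions and other details of the real lexicon argument.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Word:
    id: str
    lexicon: str
    lang: str
    lemma: str
    pos: str


# Hypothetical toy entries, mirroring IDs shown in the examples above.
WORDS: List[Word] = [
    Word("own-en-word-costume-n", "own-en", "en", "costume", "n"),
    Word("own-en-word-costume-v", "own-en", "en", "costume", "v"),
    Word("own-pt-word-costume-n", "own-pt", "pt", "costume", "n"),
    Word("own-en-word-dog-n", "own-en", "en", "dog", "n"),
    Word("own-en-word-dog-v", "own-en", "en", "dog", "v"),
]


def words(form: Optional[str] = None, pos: Optional[str] = None, *,
          lexicon: Optional[str] = None, lang: Optional[str] = None) -> List[Word]:
    """Toy model: a filter left as None matches everything; the rest are ANDed."""
    return [
        w for w in WORDS
        if (form is None or w.lemma == form.casefold())  # case-insensitive match
        and (pos is None or w.pos == pos)
        and (lexicon is None or w.lexicon == lexicon)
        and (lang is None or w.lang == lang)
    ]


print([w.id for w in words("Costume")])  # all three entries, across both lexicons
print([w.id for w in words("Costume", lang="en", lexicon="own-en", pos="n")])
```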

Word Similarity
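One of the measures WN offers is path similarity, which, as in NLTK, is computed as 1 / (d + 1), where d is the number of edges on the shortest path between two synsets through a common hypernym. The sketch below illustrates that formula in plain Python on a small, hypothetical toy hierarchy rather than on the OWN data; the synset names are illustrative only.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical toy hierarchy: each synset maps to its hypernyms (parents).
HYPERNYMS: Dict[str, List[str]] = {
    "dog": ["canine"],
    "canine": ["carnivore"],
    "cat": ["feline"],
    "feline": ["carnivore"],
    "carnivore": ["mammal"],
    "whale": ["cetacean"],
    "cetacean": ["mammal"],
}


def hypernym_distances(synset: str) -> Dict[str, int]:
    """Map the synset and each ancestor to its distance in edges (BFS upwards)."""
    dist = {synset: 0}
    queue = deque([synset])
    while queue:
        current = queue.popleft()
        for parent in HYPERNYMS.get(current, []):
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def shortest_path_length(a: str, b: str) -> Optional[int]:
    """Shortest path between a and b through a common hypernym, or None."""
    up_a, up_b = hypernym_distances(a), hypernym_distances(b)
    common = up_a.keys() & up_b.keys()
    if not common:
        return None
    return min(up_a[c] + up_b[c] for c in common)


def path_similarity(a: str, b: str) -> Optional[float]:
    """1 / (d + 1); this toy returns None when no common hypernym exists."""
    d = shortest_path_length(a, b)
    return None if d is None else 1 / (d + 1)


print(path_similarity("dog", "cat"))  # 0.2 (dog-canine-carnivore-feline-cat: 4 edges)
print(path_similarity("dog", "whale"))  # 1/6, via mammal (5 edges)
```

With real data, WN's own similarity module (for example wn.similarity.path(synset1, synset2)) should be preferred; check its documentation for how synsets without a common root are handled.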

Word Sense Disambiguation
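One classic baseline for word sense disambiguation is the simplified Lesk algorithm: choose the sense whose gloss shares the most words with the surrounding context. The sketch below is plain Python and not part of WN; the sense IDs, glosses and stop-word list are hypothetical toy data. With WN, the glosses could instead come from Synset.definition() for each synset returned by wn.synsets(word).

```python
import re
from typing import Dict, Optional, Set

# Small illustrative stop-word list (an assumption, not exhaustive).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "in", "on", "or",
             "for", "with", "that", "is", "by", "from"}

# Hypothetical senses of "bank" with toy glosses.
GLOSSES: Dict[str, str] = {
    "bank-1": "sloping land beside a body of water such as a river",
    "bank-2": "a financial institution that accepts deposits and lends money",
}


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens, minus stop words."""
    return {t for t in re.findall(r"\w+", text.casefold()) if t not in STOPWORDS}


def simplified_lesk(context: str, glosses: Dict[str, str]) -> Optional[str]:
    """Return the sense whose gloss overlaps most with the context (first wins ties)."""
    context_tokens = tokens(context)
    best_sense, best_overlap = None, -1
    for sense_id, gloss in glosses.items():
        overlap = len(context_tokens & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


print(simplified_lesk("I deposited money at the bank", GLOSSES))  # bank-2
print(simplified_lesk("we fished from the river bank", GLOSSES))  # bank-1
```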

Lemmatization and Normalization
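The wn.words("Costume") example above suggests that WN compares word forms after some normalization (at least case folding). Recent WN versions also let you supply your own normalizer and lemmatizer when creating a wn.Wordnet object; check the documentation of your installed version. The plain-Python sketch below shows one common normalization (case folding plus diacritic stripping, useful for Portuguese forms such as coração) and a toy rule-based noun lemmatizer that strips suffixes and keeps only candidates found in a lexicon. Both are assumptions for illustration, not WN's own implementations, and the suffix rules cover English only.

```python
import unicodedata
from typing import Iterable, Set


def normalize(form: str) -> str:
    """Case-fold and strip diacritics, e.g. 'Coração' -> 'coracao'."""
    decomposed = unicodedata.normalize("NFKD", form.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Toy English noun rules (suffix, replacement), in the spirit of Morphy's detachment rules.
NOUN_RULES = [("ses", "s"), ("xes", "x"), ("ies", "y"), ("s", "")]


def lemmatize(form: str, known_lemmas: Iterable[str]) -> Set[str]:
    """Return normalized candidate lemmas for `form` that exist in `known_lemmas`."""
    known = {normalize(lemma) for lemma in known_lemmas}
    form = normalize(form)
    if form in known:
        return {form}
    candidates = {form[: -len(suffix)] + repl
                  for suffix, repl in NOUN_RULES if form.endswith(suffix)}
    return candidates & known


lexicon = ["dog", "puppy", "box"]
print(normalize("Coração"))           # coracao
print(lemmatize("Dogs", lexicon))     # {'dog'}
print(lemmatize("puppies", lexicon))  # {'puppy'}
```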

WN from NLTK

If you are already familiar with NLTK's WordNet interface, see the migration guide at https://wn.readthedocs.io/en/latest/guides/nltk-migration.html