Update README.md
rsgoncalves committed Jun 5, 2024
1 parent 38503b3 commit 0300f2b
Showing 1 changed file with 9 additions and 9 deletions.
18 changes: 9 additions & 9 deletions README.md
@@ -156,9 +156,9 @@ The function returns a pandas `DataFrame` containing the generated ontology mapp
- If a term is tagged with "Ignore", text2term will not map it
- Unmapped terms can still be included in the output if `incl_unmapped` is True

`target_ontology`—Path, URL or name of 'target' ontology to map the source terms to. Ontology names can be given as values to `target_ontology` e.g. "EFO" or "CL"--text2term uses [bioregistry](https://bioregistry.io) to get URLs for such names. Similarly, when the target ontology has been cached, enter the name used upon caching.
`target_ontology`—Path, URL or name of 'target' ontology to map the source terms to. Ontology names can be given as values to `target_ontology` e.g. "EFO" or "CL"—text2term uses [bioregistry](https://bioregistry.io) to get URLs for such names. Similarly, when the target ontology has been cached, enter the name used upon caching.

When using BioPortal or Zooma, this should be a comma-separated list of ontology acronyms (eg 'EFO,HPO') or **'all'** to search all ontologies.
When using the BioPortal or Zooma interfaces, the value for `target_ontology` should be a comma-separated list of ontology acronyms (eg 'EFO,HPO') or **'all'** to search all ontologies.

`base_iris`—Map only to ontology terms whose IRIs start with one of the strings given in this tuple
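
The prefix filter described for `base_iris` can be illustrated with a minimal sketch. This is not text2term's implementation: the helper name `filter_by_base_iris`, the example IRIs, and the choice to keep every term when the tuple is empty are all assumptions made for illustration.

```python
def filter_by_base_iris(term_iris, base_iris=()):
    """Keep only the IRIs that start with one of the given base IRIs.

    Hypothetical helper, not part of text2term. An empty tuple is assumed
    to mean "no filtering".
    """
    if not base_iris:
        return list(term_iris)
    # str.startswith accepts a tuple and matches if any prefix matches
    return [iri for iri in term_iris if iri.startswith(tuple(base_iris))]


# Illustrative IRIs only
iris = [
    "http://www.ebi.ac.uk/efo/EFO_0000270",
    "http://purl.obolibrary.org/obo/MONDO_0004979",
]
efo_only = filter_by_base_iris(iris, ("http://www.ebi.ac.uk/efo/",))
```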

@@ -303,21 +303,21 @@ To display a help message with descriptions of tool arguments do:

## Supported Mappers

The mapping score of each mapping is indicative of how similar an input term is to an ontology term (via its labels or synonyms). The mapping/similarity scores generated by text2term are the result of applying one of the following "mappers":
The mapping score of each mapping indicates how similar an input term is to an ontology term (via its labels or synonyms). The mapping scores generated by text2term are the result of applying one of the following _mappers_:

**TF-IDF-based mapper**—[TF-IDF](https://en.wikipedia.org/wiki/Tf–idf) is a statistical measure often used in information retrieval that measures how important a word is to a document in a corpus of documents. We first generate TF-IDF-based vectors of the source terms and of labels and synonyms of ontology terms. Then we compute the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) between vectors to determine how similar a source term is to a target term (label or synonym).
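
The idea behind TF-IDF vectors and cosine similarity can be sketched in plain Python. This is an illustration under stated assumptions, not text2term's code: the word-level tokenizer, the smoothed IDF formula, and the function names (`tfidf_vectors`, `cosine_similarity`, `best_match`) are all chosen here for the example, and the real mapper may tokenize and weight terms differently.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase, whitespace-separated words
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of token -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each token
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (one common variant)
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
           for tok, df in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({tok: (count / len(toks)) * idf[tok]
                        for tok, count in counts.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(source_term, labels):
    """Return the label most similar to source_term, with its score."""
    vectors = tfidf_vectors([source_term] + list(labels))
    source_vec, label_vecs = vectors[0], vectors[1:]
    scores = [cosine_similarity(source_vec, vec) for vec in label_vecs]
    best = max(range(len(labels)), key=scores.__getitem__)
    return labels[best], scores[best]
```

For example, `best_match("heart attack", ["lung cancer", "heart attack", "asthma"])` picks the identical label, because identical token sets produce identical vectors.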

**BioPortal Web API-based mapper**—uses an interface to the [BioPortal Annotator](https://bioportal.bioontology.org/annotator) that we built to allow mapping terms to ontologies in the [BioPortal](https://bioportal.bioontology.org) repository.
**Syntactic distance-based mappers**—text2term provides support for commonly used and popular syntactic (edit) distance metrics: Levenshtein, Jaro, Jaro-Winkler, Jaccard, and Indel. We use the [nltk](https://pypi.org/project/nltk/) package to compute Jaccard distances and [rapidfuzz](https://pypi.org/project/rapidfuzz/) to compute all others.
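
As a rough illustration of an edit-distance mapper, the sketch below implements Levenshtein distance in plain Python and compares every source term against every label. It is not how text2term computes scores (the README says it relies on rapidfuzz and nltk); the normalization by the longer string's length and the names `normalized_similarity` and `map_terms_pairwise` are assumptions for this example.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_similarity(a, b):
    """Map the distance to a 0..1 score (assumed normalization: longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def map_terms_pairwise(source_terms, labels):
    """Pick the closest label for each source term by exhaustive comparison."""
    return {
        term: max(labels,
                  key=lambda label: normalized_similarity(term.lower(), label.lower()))
        for term in source_terms
    }
```

The exhaustive loop in `map_terms_pairwise` performs one comparison per (source term, label) pair, so the work grows with the product of the two list sizes.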

**BioPortal Web API-based mapper**—uses an interface to the [BioPortal Annotator](https://bioportal.bioontology.org/annotator) that we built to allow mapping terms in bulk to ontologies in the [BioPortal](https://bioportal.bioontology.org) repository.

> [!WARNING]
> There are no scores associated with BioPortal annotations, so the score of all mappings is always 1
**Zooma Web API-based mapper**—uses a [Zooma](https://www.ebi.ac.uk/spot/zooma/) interface that we built to allow mapping terms to ontologies in the [Ontology Lookup Service (OLS)](https://www.ebi.ac.uk/ols4) repository.
**Zooma Web API-based mapper**—uses a [Zooma](https://www.ebi.ac.uk/spot/zooma/) interface that we built to allow mapping terms in bulk to ontologies in the [Ontology Lookup Service (OLS)](https://www.ebi.ac.uk/ols4) repository.

> [!IMPORTANT]
> When using the BioPortal or Zooma interfaces, make sure to specify the target ontology name(s) as they appear in BioPortal or OLS, respectively
**Syntactic distance-based mappers**—text2term provides support for commonly used and popular syntactic (edit) distance metrics: Levenshtein, Jaro, Jaro-Winkler, Jaccard, and Indel. We use the [nltk](https://pypi.org/project/nltk/) package to compute Jaccard distances and [rapidfuzz](https://pypi.org/project/rapidfuzz/) to compute all others.
> When using the BioPortal or Zooma interfaces, make sure to specify the target ontology name(s) as they appear in BioPortal or OLS, respectively
> [!NOTE]
> Syntactic distance-based mappers and Web API-based mappers perform slowly (much slower than the TF-IDF mapper). The former because they do pairwise comparisons between each input string and each ontology term label/synonym. In the Web API-based approaches there are networking and API load overheads.
> Syntactic distance-based mappers and Web API-based mappers perform slowly (much slower than the TF-IDF mapper). The former because they do pairwise comparisons between each input string and each ontology term label/synonym. In the Web API-based approaches there are networking and API load overheads
