This repository contains the code for the Docker container for OntoClue.
OntoClue is a project that explores various embedding approaches to assess document-to-document similarity using the RELISH Corpus. RELISH is an expert-curated database designed for benchmarking document similarity in biomedical literature; it consists of PubMed IDs (PMIDs) together with expert-curated document-to-document relevance assessments with respect to other PMIDs. Relevance is categorized as "relevant", "partial", or "irrelevant".
This project consists of several approaches, each with detailed explanations and documentation. These approaches can be executed individually by following the instructions provided in their respective repositories.
- Word2doc2vec
- Doc2vec
- fastText
- WMD-Word2vec
- BERT
- Hybrid-pre-word2doc2vec
- Hybrid-pre-doc2vec
- Hybrid-pre-fasttext
- Hybrid-pre-wmd-word2vec
- Hybrid-post-word2doc2vec
- Hybrid-post-fasttext
- Hybrid-post-wmd-word2vec
- Hybrid-postreduction-word2doc2vec
- Hybrid-postreduction-fasttext
- Hybrid-postreduction-wmd-word2vec
The Docker container for this project ensures reproducibility of the runs, allowing consistent training and evaluation of the different neural network models for document-to-document similarity within the RELISH Corpus. The entire pipeline can be executed with a few commands. This pipeline includes:
- Cloning the corresponding repository for the selected approach, which contains all the necessary code.
- Downloading the appropriate pre-processed datasets based on the chosen approach (normal text vs annotated text).
- Running tests to verify dataset integrity and reproducibility of runs.
- Evaluating the models.
NOTE: The downloaded datasets are already preprocessed using the preprocessing pipeline explained in the relish-preprocessing repository, and include annotated datasets as well. Documentation on the datasets, data preprocessing, and annotation is also available.
In order to get started with the pipeline, it is essential that you have Docker installed. Please follow the instructions below to install Docker.
For Debian-based Linux distributions such as Ubuntu and Debian, execute the following (other distributions, e.g. CentOS, use a different package manager; refer to the official Docker documentation):
- Update your existing list of packages:
sudo apt update
- Install a few prerequisite packages which let apt use packages over HTTPS:
sudo apt install apt-transport-https ca-certificates curl software-properties-common
- Add the GPG key for the official Docker repository:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
- Add the Docker repository to APT sources:
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
- Update the package database with the Docker packages:
sudo apt update
- Install Docker:
sudo apt install docker-ce
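To confirm that Docker was installed correctly, you can optionally run Docker's hello-world test image (an extra sanity check, not required for the rest of the pipeline):
# Pulls a tiny test image and prints a confirmation message if the Docker daemon is working
sudo docker run hello-world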
First, clone the repository to your local machine. Via HTTPS:
git clone https://github.com/zbmed-semtec/private-ontoclue-project.git
Alternatively, via SSH (make sure you have set up SSH keys in your GitHub account):
git clone git@github.com:zbmed-semtec/private-ontoclue-project.git
You will also need to set up your SSH agent for it to work with the Docker container.
eval $(ssh-agent) > /dev/null
ssh-add -k /path/to/your/key
Example with the default key name:
eval $(ssh-agent) > /dev/null
ssh-add -k /home/user/.ssh/id_ed25519
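As an optional check that the key was actually added, you can list the identities currently held by the agent:
# Shows the fingerprints of all keys loaded into the running SSH agent
ssh-add -l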
Change into the private-ontoclue-project directory and execute the following commands to build and run the container:
sudo docker build -t ontoclue .
sudo docker run -it ontoclue
If you are using the SSH agent, you will need to pass the SSH agent socket to the container:
docker run --mount type=bind,source=$SSH_AUTH_SOCK,target=/ssh-agent --env SSH_AUTH_SOCK=/ssh-agent -it ontoclue
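As a sketch, if your system requires sudo for Docker (as in the build and run commands above), the same command would be prefixed with sudo; the $SSH_AUTH_SOCK variable is set by the eval $(ssh-agent) step earlier:
# Same run command, mounting the host's SSH agent socket and pointing SSH_AUTH_SOCK at it inside the container
sudo docker run --mount type=bind,source=$SSH_AUTH_SOCK,target=/ssh-agent --env SSH_AUTH_SOCK=/ssh-agent -it ontoclue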
After running the container, you will be prompted to select an embedding approach.
Upon selecting an approach, the corresponding repository will be cloned from GitHub, and the appropriate datasets will be downloaded based on the chosen approach.
Once the datasets are downloaded, you will have the option to run tests. This is an optional step. These tests verify:
- If the data was downloaded to the correct directory.
- If the correct data corresponding to the selected approach was downloaded.
- Quick reproducibility checks between runs.
Depending on your preference, you can select y (yes) or n (no) at the prompt.
After the tests are completed, you will be prompted to select the class distribution: enter 3 for a three-class distribution or 2 for a two-class distribution.
Following this, you will see a message indicating that the pipeline has been initiated. This process will take a while, as it runs through 100 iterations.
The progress of the run is logged into files named Optuna_trials_{class_distribution}.log. Follow the steps below to view these log files and copy output files from the Docker container to your local system.
First, list all running and stopped containers to find the one you need to access:
sudo docker ps -a
Access the running Docker container using its container ID:
sudo docker exec -it <container_id> /bin/bash
Once inside the container, view the log file associated with your specific run:
cat <name_of_the_approach>/output_{3/2}/Optuna_trials_{3/2}.log
Here, replace {3/2} with 3 for a three-class distribution or 2 for a two-class distribution.
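If you want to follow the log while the run is still in progress, tail -f can be used instead of cat (shown here for the three-class case; the path follows the same pattern as above):
# Streams new log lines as trials complete; stop with Ctrl+C
tail -f <name_of_the_approach>/output_3/Optuna_trials_3.log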
The output files are stored in:
- <name_of_the_approach>/output_3 for a three-class distribution.
- <name_of_the_approach>/output_2 for a two-class distribution.
To copy files from the Docker container to your local system, use the following command:
sudo docker cp <container_ID>:/<name_of_the_approach>/output_[3/2] <path_to_local_dir>
Replace the first path with the appropriate path inside the container and the second path with the destination directory on your local machine.
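For example, assuming the Word2doc2vec approach, a three-class run, and a placeholder container ID, the command might look like this (adjust all three values to your own setup):
# Copies the entire output_3 directory from the container into ~/ontoclue-results on the host
sudo docker cp a1b2c3d4e5f6:/Word2doc2vec/output_3 ~/ontoclue-results/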