GDD-BERT

GDD-BERT is an AI model designed to identify drug-gene-disease interactions in support of drug repositioning. It uses BERTwalk, a novel BERT-based embedding technique, to generate vector representations of the nodes in biological networks by analyzing complex relationships through random walks. The node embeddings are then combined into drug-gene-disease triplets, which are passed to a classifier that predicts potential interactions.
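As a rough illustration of the two ideas above, the following minimal sketch generates uniform random walks over a toy network and joins three node vectors into one triplet input. It is not the BERTwalk implementation: the function names, the toy node names, the uniform walk strategy, the "walk as sentence" text format, and the use of plain concatenation to combine embeddings are all assumptions made for this example.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks; each walk is a list of node names."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency.get(walk[-1], [])
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def triplet_vector(drug_vec, gene_vec, disease_vec):
    """Concatenate three node embeddings into one classifier input.

    Concatenation is an assumption; the repository may combine them differently.
    """
    return list(drug_vec) + list(gene_vec) + list(disease_vec)


# Toy undirected network with hypothetical node names.
adjacency = {
    "drug_A": ["gene_1"],
    "gene_1": ["drug_A", "gene_2", "disease_X"],
    "gene_2": ["gene_1"],
    "disease_X": ["gene_1"],
}

walks = random_walks(adjacency, walk_length=5, walks_per_node=2)

# Each walk can be written as a space-separated "sentence" of node names,
# one plausible input format for a BERT-style language model.
sentences = [" ".join(walk) for walk in walks]
```

Treating each walk as a sentence lets a language model learn node representations from co-occurrence along paths, in the same way it learns word representations from text.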

This repository contains all the necessary files and code to reproduce the analyses conducted with GDD-BERT.

Data

The full networks were retrieved from Harmonizome (accessed 04 April 2024). Gene-chemical and gene-disease networks are available at Harmonizome CTD; gene-gene networks are available at Harmonizome Pathway Commons.

The data directory includes both the biological networks and their corresponding embeddings for two-edge and ten-edge configurations. Additionally, the utils folder contains the following key files:

drug_list.txt: A list of the unique drugs retrieved from the CTD Gene-Drug network.
disease_list.txt: A list of the unique diseases retrieved from the CTD Gene-Disease network.
only_train_genes.txt: A list of high-similarity genes restricted to the training set.
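One way a train-only gene list could be applied when splitting triplets is sketched below. This is an illustration under stated assumptions, not the procedure used in create_dfs.ipynb: the function name `split_triplets`, the `val_fraction` parameter, and the toy triplets are hypothetical, and the gene set is written inline rather than read from only_train_genes.txt because the file format is not described here.

```python
import random


def split_triplets(triplets, only_train_genes, val_fraction=0.2, seed=0):
    """Split (drug, gene, disease) triplets into train and validation lists.

    Triplets whose gene is in only_train_genes are always placed in training,
    so those genes never appear in the validation set.
    """
    rng = random.Random(seed)
    forced_train = [t for t in triplets if t[1] in only_train_genes]
    free = [t for t in triplets if t[1] not in only_train_genes]
    rng.shuffle(free)
    # Validation size is a fraction of all triplets, capped by what is available.
    n_val = min(int(len(triplets) * val_fraction), len(free))
    validation = free[:n_val]
    train = forced_train + free[n_val:]
    return train, validation


# Hypothetical toy data: ten triplets, two genes restricted to training.
triplets = [(f"drug_{i % 3}", f"gene_{i}", f"disease_{i % 2}") for i in range(10)]
only_train_genes = {"gene_0", "gene_1"}

train, validation = split_triplets(triplets, only_train_genes)
```

Keeping such genes out of validation avoids scoring the model on examples that closely resemble ones it has already seen.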

Notebooks

The repository provides Jupyter notebooks to process data and train classifiers:

create_dfs.ipynb: Prepares training and validation datasets by combining input networks and their corresponding embeddings.
MLP_classifier.ipynb: Trains a Multi-Layer Perceptron (MLP) classifier on the training set and evaluates its performance on the validation set.
ML_classifiers.ipynb: Trains and evaluates various machine learning classifiers, including Logistic Regression (LR), Support Vector Machines (SVM), and Random Forests (RF).
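The train-then-validate pattern the classifier notebooks describe can be sketched with scikit-learn as follows. This is a minimal example on synthetic data, not the notebooks' code: the feature matrix stands in for triplet embeddings, the hyperparameters are arbitrary, and scikit-learn's `MLPClassifier` is used only as a stand-in, since the framework used in MLP_classifier.ipynb is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for triplet feature vectors and interaction labels.
X, y = make_classification(
    n_samples=400, n_features=30, n_informative=10, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hyperparameters here are illustrative, not those used in the repository.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
}

# Fit each model on the training split and score it on the validation split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_val, model.predict(X_val))

for name, score in scores.items():
    print(f"{name}: validation accuracy = {score:.3f}")
```

Evaluating every model on the same held-out split keeps the comparison between classifiers consistent.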

Usage

Prepare the datasets:
- Run create_dfs.ipynb to process the input networks and embeddings, generating the training and validation datasets.
Train classifiers:
- Use MLP_classifier.ipynb to train and evaluate an MLP model.
- Alternatively, run ML_classifiers.ipynb to train and evaluate other classifiers (LR, SVM, RF).
