Improvements in documentation

kalininalab · Sep 3, 2024 · 76c633d
1 parent f41c122

Showing 6 changed files with 16,261 additions and 9 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -11,6 +11,11 @@
 - [ ] Replace GraKel with something "modern" and fully "conda-installable" to make DataSAIL fully conda-installable
 - [ ] Include [MashMap3](https://github.com/marbl/MashMap)
 - [ ] Include MASH for amino acid sequences
+- [ ] Custom clustering methods ([Issue #25](https://github.com/kalininalab/DataSAIL/issues/25))
+
+## v1.0.1 (2024-05-08) till v1.0.7 (2024-06-27)
+
+- Bug fixes in stratification
 
 ## v1.0.0 (2024-04-04)
 

diff --git a/README.md b/README.md
@@ -44,7 +44,7 @@ pip install grakel
 to install DataSAIL in an already existing environment. Alternatively, one can install DataSAIL-lite from conda. 
 DataSAIL-lite is a version of DataSAIL that does not install all clustering algorithms as the standard DataSAIL.
 
-DataSAIL is available from Python 3.8 and newer.
+DataSAIL is available for Python 3.8 and newer.
 
 ## Usage
 
@@ -55,7 +55,7 @@ datasail --e-type P --e-data <path_to_fasta> --e-sim mmseqs --output <path_to_ou
 ````
 
 to split a set of proteins that have been clustered using mmseqs. For a full list of arguments, run `datasail -h` and 
-checkout [ReadTheDocs](https://datasail.readthedocs.io/en/latest/index.html).
+check out [ReadTheDocs](https://datasail.readthedocs.io/en/latest/index.html). There, you can find a more detailed explanation of the arguments as well as example notebooks.
 
 ## When to use DataSAIL and when not to use
 
@@ -73,7 +73,7 @@ different from your training data but not if the data in the application is more
 
 If you used DataSAIL to split your data, please cite DataSAIL in your publication.
 ````
-@article{joeres2022datasail,
+@article{joeres2023datasail,
   title={DataSAIL: Data Splitting Against Information Leakage},
   author={Joeres, Roman and Blumenthal, David B. and Kalinina, Olga V},
   journal={bioRxiv},

diff --git a/experiments/DTI/split.py b/experiments/DTI/split.py
@@ -146,9 +146,9 @@ def split_w_graphpart(base_path: Path) -> None:
 
 def main(path):
     split_w_datasail(path, TECHNIQUES["datasail"])
-    # split_w_deepchem(path, TECHNIQUES["deepchem"])
-    # split_w_lohi(path)
-    # split_w_graphpart(path)
+    split_w_deepchem(path, TECHNIQUES["deepchem"])
+    split_w_lohi(path)
+    split_w_graphpart(path)
 
 
 if __name__ == '__main__':

diff --git a/experiments/DTI/visualize.py b/experiments/DTI/visualize.py
@@ -552,6 +552,5 @@ def plot(full_path: Path):
 
 
 if __name__ == '__main__':
-    # plot(Path(sys.argv[1]))
     comp_il()
-
+    plot(Path(sys.argv[1]))
diff --git a/experiments/README.md b/experiments/README.md
@@ -2,4 +2,35 @@
 
 -------------
 
-blub
+For the publication, we have conducted several experiments:
+
+ 1. Splitting of drug-target interaction data,
+ 2. Splitting of data for molecular property prediction,
+ 3. Splitting of data whose samples belong to one of two classes, using stratified splits,
+
+and some ablation studies based on the data above. The experiments cover all possible applications of DataSAIL. Each 
+experiment folder is structured in the same way:
+
+ 1. `split.py`: Contains the code used for splitting with DataSAIL or the baseline tools.
+ 2. `train.py`: Contains the code to train the different models on the split data.
+ 3. `visualize.py`: Contains the code to visualize the results of the training.
+
+All scripts are executed in the same way:
+
+```shell
+python -m experiments.<experiment>.<script> <path/to/storage-folder>
+```
+
+where `<experiment>` is the name of the experiment type (`DTI`, `MPP`, or `Strat`) and `<script>` is the name of the 
+script (`split`, `train`, or `visualize`). Lastly, `<path/to/storage-folder>` is the path to a folder where the results 
+of the previous step are read from and where new results are stored. Because each script relies on the results of the 
+previous step, the scripts must be run in order. For example, to run the entire DTI experiment pipeline, you need 
+to run:
+
+```shell
+python -m experiments.DTI.split scratch/DataSAIL_results/DTI
+python -m experiments.DTI.train scratch/DataSAIL_results/DTI
+python -m experiments.DTI.visualize scratch/DataSAIL_results/DTI
+```
+
+where the path can be replaced by any other folder.
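Because each step reads the output of the previous one, the three commands can also be chained from a small driver script. The following is a minimal sketch, assuming it is launched from the repository root so that `python -m experiments.<experiment>.<script>` resolves; the helpers `pipeline_commands` and `run_pipeline` are hypothetical and not part of the DataSAIL repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Order matters: train reads the output of split, visualize reads the output of train.
STEPS = ("split", "train", "visualize")
EXPERIMENTS = ("DTI", "MPP", "Strat")


def pipeline_commands(experiment: str, storage: Path) -> List[List[str]]:
    """Build the `python -m experiments.<experiment>.<script> <path>` calls in order (hypothetical helper)."""
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment {experiment!r}, expected one of {EXPERIMENTS}")
    return [
        [sys.executable, "-m", f"experiments.{experiment}.{step}", str(storage)]
        for step in STEPS
    ]


def run_pipeline(experiment: str, storage: Path, dry_run: bool = False) -> None:
    """Run split, train, and visualize in sequence, stopping at the first failing step."""
    for cmd in pipeline_commands(experiment, storage):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so later steps never run on incomplete results
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands for the DTI pipeline instead of executing them.
    run_pipeline("DTI", Path("scratch/DataSAIL_results/DTI"), dry_run=True)
```

Using the current interpreter (`sys.executable`) keeps all three steps in the same environment, and `check=True` enforces the required ordering by aborting as soon as one step fails.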