Accelerate yacht train #125

Merged
merged 26 commits into from
Nov 5, 2024
Changes from 24 commits
Commits
515287f
add a temp patch script for LONG project
chunyuma Oct 15, 2024
f81eb26
update the patch script
chunyuma Oct 17, 2024
5d6712c
use matrix operations to reduce RAM
chunyuma Oct 18, 2024
b4724fd
add a patch script
chunyuma Oct 18, 2024
ee1b4b1
revert the deleted functions
chunyuma Oct 18, 2024
703a201
add Cpp scripts and organize Python and Cpp scripts into separate fol…
chunyuma Nov 2, 2024
be17ac9
remove sourmash_plugin_branchwater from conda environment
chunyuma Nov 2, 2024
402c2b3
update build package files
chunyuma Nov 2, 2024
26377fd
remove temporary patch scripts
chunyuma Nov 2, 2024
b429ea2
fix a bug in cpp script
chunyuma Nov 4, 2024
f562c13
update a bug in package version
chunyuma Nov 4, 2024
56e10b9
update a bug in package version again
chunyuma Nov 4, 2024
c441b59
update code to include cpp
chunyuma Nov 4, 2024
677b470
update tests code to fit the changes and fix a bug in tests
chunyuma Nov 4, 2024
f9b4f9e
update README.md
chunyuma Nov 4, 2024
cf8c8a5
update conda_recipe/meta.yaml
chunyuma Nov 4, 2024
32619ad
update .github/workflows/runTest.yml for the recent changes
chunyuma Nov 4, 2024
9ca9de0
update Makefile to fix an error in CI/CD
chunyuma Nov 4, 2024
6fbc6d6
update .github/workflows/runTest.yml to debug
chunyuma Nov 4, 2024
bb28535
update CI/CD script for debug
chunyuma Nov 4, 2024
e677408
update code for debug in CI/CD
chunyuma Nov 4, 2024
8f60fb1
debug CI/CD again
chunyuma Nov 4, 2024
b58422f
debug CI/CD error
chunyuma Nov 4, 2024
3723156
update code to make conda install work
chunyuma Nov 4, 2024
b4e96a2
Add author info and Credit to Mahmudur's work
chunyuma Nov 5, 2024
4f09e53
update conda_recipe/meta.yaml
chunyuma Nov 5, 2024
15 changes: 13 additions & 2 deletions .github/workflows/runTest.yml
@@ -8,24 +8,35 @@ jobs:
         shell: bash -el {0}
     steps:
     - uses: actions/checkout@v4
+
     - name: Python Linter
       uses: chartboost/ruff-action@v1
       with:
-        src: "yacht"
+        src: "src"
         # args: --select ALL
     - uses: conda-incubator/setup-miniconda@v3
       with:
         miniconda-version: "latest"
         activate-environment: yacht_env
         environment-file: env/yacht_env.yml
+
     - name: install YACHT locally
-      run: pip install -e .
+      run: pip install .
+
+    - name: List contents of 'yacht' in site-packages
+      run: |
+        SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])")
+        ls -R $SITE_PACKAGES/yacht
+
     - name: make training data
       run: yacht train --ref_file './tests/testdata/20_genomes_sketches.zip' --ksize 31 --prefix 'gtdb_ani_thresh_0.95' --ani_thresh 0.95 --outdir ./ --force
+
     - name: run YACHT
       run: yacht run --json ./gtdb_ani_thresh_0.95_config.json --sample_file './tests/testdata/sample.sig.zip' --significance 0.99 --min_coverage_list 1 0.6 0.2 0.1
+
     - name: unit-tests
       run: pytest tests/ --cov-report term-missing --cov-report xml:tests.xml --cov=yacht
+
     - name: Upload coverage reports to Codecov
       uses: codecov/codecov-action@v4
       with:
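The new "List contents of 'yacht' in site-packages" step uses `ls -R` to confirm what a non-editable `pip install .` actually placed in the environment. A minimal Python sketch of the same check follows. It assumes the compiled helper keeps the name `run_yacht_train_core` used in the Makefile; `find_packaged_file` is a hypothetical helper and is not part of YACHT.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_packaged_file(package: str, filename: str) -> Optional[Path]:
    """Return the path of `filename` inside an installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # package missing, or a plain module rather than a package
    for location in spec.submodule_search_locations:
        candidate = Path(location) / filename
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Assumption: the binary is installed next to the Python modules of `yacht`.
    print(find_packaged_file("yacht", "run_yacht_train_core"))
```

A `None` result in CI would suggest that the binary was compiled but never copied into the built package.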
38 changes: 38 additions & 0 deletions Makefile
@@ -0,0 +1,38 @@
+# Compiler and flags
+CXX = g++
+CXXFLAGS = -std=c++17 -Wall -w -O3 -Wsign-compare
+
+# Directories
+SRC_DIR = src/cpp
+BIN_DIR = src/yacht
+
+# Source files
+SRC_FILES = $(SRC_DIR)/main.cpp
+
+# Object files
+OBJ_FILES = $(SRC_FILES:.cpp=.o)
+
+# Target executable
+TARGET = $(BIN_DIR)/run_yacht_train_core
+
+# Default target
+all: $(TARGET)
+
+# Create the bin directory if it doesn't exist
+$(BIN_DIR):
+	echo "Creating directory: $(BIN_DIR)"
+	mkdir -p $(BIN_DIR)
+
+# build the object files
+$(OBJ_FILES): %.o: %.cpp
+	echo "Compiling: $<"
+	$(CXX) $(CXXFLAGS) -c $< -o $@ -lz
+
+# build the target executable
+$(TARGET): $(OBJ_FILES) | $(BIN_DIR)
+	echo "Linking to create executable: $(TARGET)"
+	$(CXX) $(CXXFLAGS) $(OBJ_FILES) -o $(TARGET) -lz -lpthread
+
+# clean up
+clean:
+	rm -f $(OBJ_FILES) $(TARGET)
1 change: 0 additions & 1 deletion README.md
@@ -302,7 +302,6 @@ The most important parameter of this script is `--ani_thresh`: this is average n
 | ------------------------------------- | ------------------------------------------------------------ |
 | _config.json | A JSON file stores the required information needed to run the next YACHT algorithm |
 | _manifest.tsv | A TSV file contains organisms and their relevant info after removing the similar ones |
-| _removed_orgs_to_corr_orgas_mapping.tsv | A TSV file with two columns: removed organism names ('removed_org') and their similar genomes ('corr_orgs')|

 #### Some pre-trained reference databases available on Zenodo

16 changes: 16 additions & 0 deletions build.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+
+# Determine the platform
+OS_NAME="$(uname -s)"
+
+echo "Running build on Unix-based system: $OS_NAME"
+if [[ "$OS_NAME" == "Linux" || "$OS_NAME" == "Darwin" ]]; then
+    # Unix-based systems (Linux or macOS)
+    bash build_unix.sh
+elif [[ "$OS_NAME" == "MINGW"* || "$OS_NAME" == "CYGWIN"* || "$OS_NAME" == "MSYS"* ]]; then
+    # Windows-like environment detected, use batch file
+    cmd.exe /c build_windows.bat
+else
+    echo "Unsupported platform: $OS_NAME"
+    exit 1
+fi
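The dispatch in `build.sh` can also be expressed in Python, which is roughly what `setup.py` in this PR does with `sys.platform`. The sketch below is illustrative only: `select_build_command` is a hypothetical name, and accepting the plain `"Windows"` value returned by `platform.system()` is an assumption added here, not something `build.sh` checks.

```python
import platform
from typing import List


def select_build_command(os_name: str) -> List[str]:
    """Map a `uname -s`-style name to the build command used in build.sh."""
    if os_name in ("Linux", "Darwin"):
        return ["bash", "build_unix.sh"]
    # Assumption: "Windows" is included because platform.system() reports it
    # on native Windows; build.sh itself only matches MINGW/CYGWIN/MSYS.
    if os_name.startswith(("MINGW", "CYGWIN", "MSYS", "Windows")):
        return ["cmd.exe", "/c", "build_windows.bat"]
    raise RuntimeError(f"Unsupported platform: {os_name}")


if __name__ == "__main__":
    print(select_build_command(platform.system()))
```

Returning the command as a list keeps it ready for `subprocess.check_call` without shell quoting.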
4 changes: 4 additions & 0 deletions build_unix.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+
+# Run the Makefile to compile the C++ code
+make
21 changes: 21 additions & 0 deletions build_windows.bat
@@ -0,0 +1,21 @@
+@echo off
+
+REM Set up paths for directories
+set SRC_DIR=src\cpp
+set BIN_DIR=src\yacht
+
+REM Create bin directory if it doesn't exist
+if not exist %BIN_DIR% (
+    mkdir %BIN_DIR%
+)
+
+REM Compile the main.cpp file using g++ from MinGW or another suitable compiler
+g++ -std=c++17 -Wsign-compare -Wall -O3 -o %BIN_DIR%\run_yacht_train_core.exe %SRC_DIR%\main.cpp
+
+REM Check if compilation succeeded
+if %errorlevel% neq 0 (
+    echo Compilation failed!
+    exit /b %errorlevel%
+)
+
+echo Compilation successful. Executable created at %BIN_DIR%\run_yacht_train_core.exe
78 changes: 78 additions & 0 deletions conda_recipe/meta.yaml
@@ -0,0 +1,78 @@
+{% set version = "1.3.0" %}
+
+package:
+  name: yacht
+  version: {{ version }}
+
+source:
+  url: https://github.com/KoslickiLab/YACHT/releases/download/v{{ version }}/yacht-{{ version }}.tar.gz
+  sha256: f5ea89cb4f4347bfd806b211f87ed402823ea1f603e755c05532c79ec75bb422
+
+build:
+  number: 0
+  script: "{{ PYTHON }} -m pip install . --no-deps --no-build-isolation --no-cache-dir -vvv"
+
+requirements:
+  build:
+    - {{ compiler('cxx') }}  # Adds platform-specific C++ compiler (g++, clang, MSVC)
+    - make  # Ensures that Make is available (for Unix)
+    - python >3.6,<3.12  # Python version
+    - pip
+    - setuptools
+
+  host:
+    - python >3.6,<3.12
+    - pip
+    - pandas
+    - pytaxonkit
+    - scipy
+    - sourmash
+    - loguru
+    - tqdm
+    - biom-format
+    - numpy >=1.22.4
+    - setuptools
+    - requests
+
+  run:
+    - python >3.6,<3.12
+    - sourmash >=4.8.3,<5
+    - scipy
+    - requests
+    - numpy >=1.22.4
+    - pandas
+    - scikit-learn
+    - codecov
+    - pytest
+    - pytest-cov
+    - loguru
+    - maturin >=1,<2
+    - tqdm
+    - biom-format
+    - pytaxonkit
+    - openpyxl
+    - ruff
+    - sourmash_plugin_branchwater
+
+
+test:
+  commands:
+    - yacht --help
+
+about:
+  home: https://github.com/KoslickiLab/YACHT
+  summary: 'YACHT is a mathematically rigorous hypothesis test for the presence or absence of organisms in a metagenomic sample, based on average nucleotide identity (ANI).'
+  license: MIT
+  license_family: MIT
+  license_file: LICENSE.txt
+  dev_url: https://github.com/KoslickiLab/YACHT
+  doc_url: https://github.com/KoslickiLab/YACHT/wiki
+
+extra:
+  skip-lints:
+    - should_use_compilers
+  identifiers:
+    - doi:10.1093/bioinformatics/btae047
+  recipe-maintainers:
+    - chunyuma
+    - dkoslicki
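The `sha256` field in the recipe has to match the release tarball byte for byte, so it is usually worth recomputing it locally before updating `meta.yaml`. A minimal sketch using only the standard library follows; `sha256_of_file` is a hypothetical helper, and the tarball path in the usage note is an assumption about where the release archive was downloaded.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of_file(Path("yacht-1.3.0.tar.gz"))` could be compared against the value in the `source` section above.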
6 changes: 3 additions & 3 deletions env/yacht_env.yml
@@ -4,11 +4,10 @@ channels:
   - bioconda
   - defaults
 dependencies:
-  - python>3.6
+  - python>3.6,<3.12
   - sourmash>=4.8.3,<5
-  - rust
   - scipy
-  - numpy
+  - numpy>=1.22.4
   - pandas
   - scikit-learn
   - codecov
@@ -19,6 +18,7 @@ dependencies:
   - tqdm
   - biom-format
   - pytaxonkit
+  - requests
   - pip
   - sourmash_plugin_branchwater
   - pip:
58 changes: 54 additions & 4 deletions setup.py
@@ -1,22 +1,72 @@
 from setuptools import setup, find_packages
+from setuptools.command.build_ext import build_ext
+from setuptools.command.install import install
+import os
+import sys
+import subprocess
+import shutil

 # Import the version number
-from yacht import __version__
+from src.yacht import __version__

+# Custom build class to run the C++ compilation step
+class CustomBuildExt(build_ext):
+    def run(self):
+        # Run the custom build process for C++ code
+        if sys.platform.startswith('win'):
+            # Use the Windows batch file to compile C++ code
+            print("Running Windows build script...")
+            try:
+                subprocess.check_call(['cmd.exe', '/c', 'build_windows.bat'])
+            except subprocess.CalledProcessError as e:
+                print(f"Error during Windows compilation: {e.output}")
+                raise e
+        else:
+            # Use the Unix-based shell script to compile C++ code
+            print("Running Unix-based build script...")
+            try:
+                subprocess.check_call(['bash', 'build_unix.sh'], stderr=subprocess.STDOUT)
+            except subprocess.CalledProcessError as e:
+                print(f"Error during Unix compilation: {e}")
+                raise e
+
+        # Move the compiled binary to the correct location for packaging
+        compiled_binary = os.path.join('src', 'yacht', 'run_yacht_train_core')
+        if os.path.exists(compiled_binary):
+            destination = os.path.join(self.build_lib, 'yacht')
+            os.makedirs(destination, exist_ok=True)
+            shutil.move(compiled_binary, destination)
+        else:
+            print("Compiled binary not found after build step.")
+            raise FileNotFoundError("The executable 'run_yacht_train_core' was not generated successfully.")
+
+        # Run the usual build_ext logic (necessary to continue with setuptools)
+        super().run()
+
+class CustomInstall(install):
+    def run(self):
+        self.run_command('build_ext')
+        super().run()
+
 setup(
     name='yacht',
     version=__version__,
     include_package_data=True,
-    packages=find_packages(),
+    packages=find_packages(where='src'),
+    package_dir={'': 'src'},
+    cmdclass={
+        'build_ext': CustomBuildExt,
+        'install': CustomInstall
+    },
     entry_points={
         'console_scripts': [
             'yacht = yacht:main',
         ],
     },
-    python_requires='>=3.6',
+    python_requires='>3.6,<3.12',
     # Add other package metadata here
     author='Koslicki, D., White, S., Ma, C., & Novikov, A.',
     description='YACHT is a mathematically rigorous hypothesis test for the presence or absence of organisms in a metagenomic sample, based on average nucleotide identity (ANI).',
     license='MIT',
     url='https://github.com/KoslickiLab/YACHT'
-)
+)
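One detail that may be worth checking in review: `build_windows.bat` writes `run_yacht_train_core.exe`, while `CustomBuildExt` looks for `run_yacht_train_core` without a suffix, so the existence check could fail on Windows. A minimal sketch of resolving the name per platform follows; `compiled_binary_path` is a hypothetical helper, not code from this PR.

```python
import os
import sys


def compiled_binary_path(platform_name: str = sys.platform) -> str:
    """Relative path of the helper binary, with `.exe` appended on Windows."""
    name = "run_yacht_train_core"
    if platform_name.startswith("win"):
        name += ".exe"  # matches the output name in build_windows.bat
    return os.path.join("src", "yacht", name)


if __name__ == "__main__":
    print(compiled_binary_path())
```

Under that assumption, the check in `CustomBuildExt.run` could call `compiled_binary_path()` instead of hard-coding the Unix name.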