feat: Add Test Data classes to download from github releases #194

Merged
merged 23 commits into from
Jan 22, 2025
a92b906
feat: Add GitHubReleaseDataset class for fetching and downloading dat…
jjjermiah Jan 21, 2025
b7e176f
feat: Introduce GitHubReleaseAsset and GitHubRelease classes for enha…
jjjermiah Jan 21, 2025
cef129b
feat: Enhance GitHubReleaseManager with latest release caching and im…
jjjermiah Jan 21, 2025
f1fbdec
feat: Update MedImageTestData to filter extracted files and add struc…
jjjermiah Jan 22, 2025
aa26b87
feat: Update dependencies in pyproject.toml and add import error hand…
jjjermiah Jan 22, 2025
e09479d
feat: Update pixi.toml to include extras for med-imagetools in dev an…
jjjermiah Jan 22, 2025
fd7a21d
feat: Enhance GitHub release management with asynchronous asset downl…
jjjermiah Jan 22, 2025
2be30bc
feat: Remove structureset module and its associated imports
jjjermiah Jan 22, 2025
18bb461
feat: Update test_extract to download specific assets from the latest…
jjjermiah Jan 22, 2025
f768eda
feat: Add progress bar for asynchronous asset downloads in MedImageTe…
jjjermiah Jan 22, 2025
0e79040
feat: Implement progress bar for dataset downloads in MedImageTestData
jjjermiah Jan 22, 2025
167900b
feat: Add CLI command to download test data from the latest GitHub re…
jjjermiah Jan 22, 2025
04f6e29
feat: Update test_extract to filter assets based on specific strings …
jjjermiah Jan 22, 2025
49e3e3a
feat: Add assertion to check minimum release version in test_extract
jjjermiah Jan 22, 2025
1018c71
feat: Temporarily comment out pytest-xdist dependency in pixi.toml
jjjermiah Jan 22, 2025
f0bc308
feat: Enhance CLI with test data command and update workflows for ver…
jjjermiah Jan 22, 2025
5753d1e
feat: Update GitHubReleaseManager to use environment variable for GIT…
jjjermiah Jan 22, 2025
5ad5982
feat: Set GITHUB_TOKEN environment variable in GitHub Actions workflo…
jjjermiah Jan 22, 2025
74b77c6
feat: Enhance GitHubReleaseManager to support configurable request pa…
jjjermiah Jan 22, 2025
a795528
feat: Increase timeout for GitHub API requests and simplify token han…
jjjermiah Jan 22, 2025
19d19f2
feat: Add Windows support to CI workflow and update package platforms
jjjermiah Jan 22, 2025
cec5365
refactor: Update test cases for file handling to improve readability …
jjjermiah Jan 22, 2025
fdbd4c5
feat: Add debug optional dependency for pyvis in pyproject.toml
jjjermiah Jan 22, 2025
4 changes: 4 additions & 0 deletions README.md
@@ -1,3 +1,5 @@
<div align="center">

# Med-Imagetools: Transparent and Reproducible Medical Image Processing Pipelines in Python
<!--intro-start-->

@@ -15,6 +17,8 @@
[![PyPI - Format](https://img.shields.io/pypi/format/med-imagetools)](https://pypi.org/project/med-imagetools/)
[![Downloads](https://static.pepy.tech/badge/med-imagetools)](https://pepy.tech/project/med-imagetools)

</div>

## Med-ImageTools core features

* AutoPipeline CLI
5 changes: 3 additions & 2 deletions pixi.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 6 additions & 4 deletions pixi.toml
@@ -12,7 +12,7 @@ pip = ">=24.3.1,<25"


[pypi-dependencies]
med-imagetools = { path = ".", editable = true }
med-imagetools = { path = ".", editable = true}

[environments]
dev = ["test", "quality", "build", "dev", "docs"]
@@ -23,6 +23,9 @@ py311 = { features = ["py311", "test"], no-default-feature = true }
py312 = { features = ["py312", "test"], no-default-feature = true }

############################################## DEV ###############################################
[feature.dev.pypi-dependencies]
med-imagetools = { path = ".", editable = true, extras = ["all"]}

Comment on lines +26 to +28

💡 Codebase verification

The "all" extras group is missing in pyproject.toml

The extras = ["all"] specified in pixi.toml cannot be resolved as the "all" extras group is not defined in pyproject.toml. Currently, only "torch" and "test" extras groups are available.

  • Add an "all" extras group in pyproject.toml that includes all optional dependencies, or
  • Use one of the existing extras groups: "torch" or "test"
🔗 Analysis chain

Verify the "all" extras group exists in pyproject.toml.

The addition of extras = ["all"] looks good, but we should verify that this extras group is properly defined.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the "all" extras group is defined in pyproject.toml
rg -A 5 '\[project\.optional-dependencies\]|\[tool\.poetry\.extras\]' pyproject.toml

Length of output: 199

[feature.dev.dependencies]
ipython = "*"
ipykernel = "*"
@@ -47,8 +50,9 @@ python = "3.11.*"
[feature.py312.dependencies]
python = "3.12.*"


############################################## TEST ################################################
[feature.test.pypi-dependencies]
med-imagetools = { path = ".", editable = true, extras = ["test"]}

[feature.test.dependencies]
pytest = "*"
@@ -57,8 +61,6 @@ pytest-xdist = "*"
pytest-mock = ">=3.14.0,<4"
sqlalchemy-stubs = ">=0.4,<0.5"

[feature.test.pypi-dependencies]
med-imagetools = { path = ".", editable = true }

[feature.test.tasks.test]
cmd = "pytest -c config/pytest.ini --rootdir ."
11 changes: 10 additions & 1 deletion pyproject.toml
@@ -54,7 +54,16 @@ classifiers = [
# Optional dependencies (extras)
[project.optional-dependencies]
torch = ["torch", "torchio"]
debug = ["pyvis"]
test = [
    "pygithub>=2.5.0",
]
all = [
    "pygithub>=2.5.0",
    # add these back later
    # "torch",
    # "torchio",
]
# debug = ["pyvis"]

# Entry points for CLI commands
[project.scripts]
Empty file.
216 changes: 216 additions & 0 deletions src/imgtools/datasets/github_helper.py
@@ -0,0 +1,216 @@
from __future__ import annotations

import tarfile
import zipfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List

import requests
from rich import print

try:
    from github import Github  # type: ignore # noqa
except ImportError as e:
    raise ImportError(
        "PyGithub is required for the test data feature of med-imagetools. "
        "Install it using 'pip install med-imagetools[test]'."
    ) from e


@dataclass
class GitHubReleaseAsset:
    """
    Represents an asset in a GitHub release.

    Attributes
    ----------
    name : str
        Name of the asset (e.g., 'dataset.zip').
    url : str
        Direct download URL for the asset.
    content_type : str
        MIME type of the asset (e.g., 'application/zip').
    size : int
        Size of the asset in bytes.
    download_count : int
        Number of times the asset has been downloaded.
    """

    name: str
    url: str
    content_type: str
    size: int
    download_count: int


@dataclass
class GitHubRelease:
    """
    Represents a GitHub release.

    Attributes
    ----------
    tag_name : str
        The Git tag associated with the release.
    name : str
        The name of the release.
    body : str
        Release notes or description.
    html_url : str
        URL to view the release on GitHub.
    created_at : str
        ISO 8601 timestamp of release creation.
    published_at : str
        ISO 8601 timestamp of release publication.
    assets : List[GitHubReleaseAsset]
        List of assets in the release.
    """

    tag_name: str
    name: str
    body: str
    html_url: str
    created_at: str
    published_at: str
    assets: List[GitHubReleaseAsset]


@dataclass
class GitHubReleaseManager:
    """
    Class to fetch and interact with datasets from the latest GitHub release.

    Parameters
    ----------
    repo_name : str
        The full name of the GitHub repository (e.g., 'user/repo').
    token : str | None
        Optional GitHub token for authenticated requests (higher rate limits).
    """

    repo_name: str
    github: Github
    repo: Github.Repository
    latest_release: GitHubRelease | None = None

    def __init__(self, repo_name: str, token: str | None = None):
        self.repo_name = repo_name
        self.github = Github(token) if token else Github()
        self.repo = self.github.get_repo(repo_name)

    def get_latest_release(self) -> GitHubRelease:
        """Fetches the latest release details from the repository."""

        release = self.repo.get_latest_release()

        assets = [
            GitHubReleaseAsset(
                name=asset.name,
                url=asset.browser_download_url,
                content_type=asset.content_type,
                size=asset.size,
                download_count=asset.download_count,
            )
            for asset in release.get_assets()
        ]

        self.latest_release = GitHubRelease(
            tag_name=release.tag_name,
            name=release.title,
            body=release.body or "",
            html_url=release.html_url,
            created_at=release.created_at.isoformat(),
            published_at=release.published_at.isoformat(),
            assets=assets,
        )
        return self.latest_release

    def download_asset(self, asset: GitHubReleaseAsset, dest: Path) -> Path:
        """
        Downloads a release asset to a specified directory.

        Parameters
        ----------
        asset : GitHubReleaseAsset
            The asset to download.
        dest : Path
            Destination directory where the file will be saved.

        Returns
        -------
        Path
            Path to the downloaded file.
        """
        dest.mkdir(parents=True, exist_ok=True)
        filepath = dest / asset.name

        # Check for an existing file before issuing the request, so a
        # skipped download does not open a network connection.
        if filepath.exists():
            print(f"File {asset.name} already exists. Skipping download.")
            return filepath

        response = requests.get(asset.url, stream=True)
        response.raise_for_status()

        with open(filepath, "wb") as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)

        return filepath


@dataclass
class MedImageTestData(GitHubReleaseManager):
    """
    Manager for downloading and extracting med-image test data from GitHub releases.
    """

    downloaded_paths: List[Path] = field(default_factory=list, init=False)

    def __init__(self):
        super().__init__("bhklab/med-image_test-data")
        self.downloaded_paths = []

    def download_release_data(self, dest: Path) -> MedImageTestData:
        """Download all assets of the latest release to the specified directory."""
        latest_release = self.get_latest_release()
        for asset in latest_release.assets:
            print(f"Downloading {asset.name}...")
            downloaded_path = self.download_asset(asset, dest)
            self.downloaded_paths.append(downloaded_path)
        return self

    def extract(self, dest: Path) -> List[Path]:
        """Extract downloaded archives to the specified directory."""
        if not self.downloaded_paths:
            raise ValueError(
                "No archives have been downloaded yet. Call `download_release_data` first."
            )

        extracted_paths = []
        for path in self.downloaded_paths:
            print(f"Extracting {path.name}...")
            if tarfile.is_tarfile(path):
                with tarfile.open(path, "r:*") as archive:
                    archive.extractall(dest, filter="data")
                    extracted_paths.extend([dest / member.name for member in archive.getmembers()])
            elif zipfile.is_zipfile(path):
                with zipfile.ZipFile(path, "r") as archive:
                    archive.extractall(dest)
                    extracted_paths.extend([dest / name for name in archive.namelist()])
            else:
                print(f"Unsupported archive format: {path.name}")
        return extracted_paths


# Usage example
if __name__ == "__main__":
    manager = MedImageTestData()

    print(manager)

    manager.get_latest_release()

    print(manager)

    download_dir = Path("./data/med-image_test-data")
    manager.download_release_data(download_dir).extract(download_dir)
21 changes: 21 additions & 0 deletions src/imgtools/modules/structureset/__init__.py
@@ -0,0 +1,21 @@
from .helpers import (
    extract_roi_names,
    extract_rtstruct_metadata,
    load_rtstruct_data,
    rtstruct_reference_seriesuid,
)
from .custom_types import (
    ROI,
    ContourSlice,
    RTSTRUCTMetadata,
)

__all__ = [
    "ROI",
    "ContourSlice",
    "RTSTRUCTMetadata",
    "extract_roi_names",
    "extract_rtstruct_metadata",
    "load_rtstruct_data",
    "rtstruct_reference_seriesuid",
]