feat: Add Test Data classes to download from github releases #194
Merged
Changes from 6 commits
Commits (23):
a92b906 feat: Add GitHubReleaseDataset class for fetching and downloading dat… (jjjermiah)
b7e176f feat: Introduce GitHubReleaseAsset and GitHubRelease classes for enha… (jjjermiah)
cef129b feat: Enhance GitHubReleaseManager with latest release caching and im… (jjjermiah)
f1fbdec feat: Update MedImageTestData to filter extracted files and add struc… (jjjermiah)
aa26b87 feat: Update dependencies in pyproject.toml and add import error hand… (jjjermiah)
e09479d feat: Update pixi.toml to include extras for med-imagetools in dev an… (jjjermiah)
fd7a21d feat: Enhance GitHub release management with asynchronous asset downl… (jjjermiah)
2be30bc feat: Remove structureset module and its associated imports (jjjermiah)
18bb461 feat: Update test_extract to download specific assets from the latest… (jjjermiah)
f768eda feat: Add progress bar for asynchronous asset downloads in MedImageTe… (jjjermiah)
0e79040 feat: Implement progress bar for dataset downloads in MedImageTestData (jjjermiah)
167900b feat: Add CLI command to download test data from the latest GitHub re… (jjjermiah)
04f6e29 feat: Update test_extract to filter assets based on specific strings … (jjjermiah)
49e3e3a feat: Add assertion to check minimum release version in test_extract (jjjermiah)
1018c71 feat: Temporarily comment out pytest-xdist dependency in pixi.toml (jjjermiah)
f0bc308 feat: Enhance CLI with test data command and update workflows for ver… (jjjermiah)
5753d1e feat: Update GitHubReleaseManager to use environment variable for GIT… (jjjermiah)
5ad5982 feat: Set GITHUB_TOKEN environment variable in GitHub Actions workflo… (jjjermiah)
74b77c6 feat: Enhance GitHubReleaseManager to support configurable request pa… (jjjermiah)
a795528 feat: Increase timeout for GitHub API requests and simplify token han… (jjjermiah)
19d19f2 feat: Add Windows support to CI workflow and update package platforms (jjjermiah)
cec5365 refactor: Update test cases for file handling to improve readability … (jjjermiah)
fdbd4c5 feat: Add debug optional dependency for pyvis in pyproject.toml (jjjermiah)
@@ -0,0 +1,216 @@
from __future__ import annotations

import tarfile
import zipfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List

import requests
from rich import print

try:
    from github import Github  # type: ignore # noqa
except ImportError as e:
    raise ImportError(
        "PyGithub is required for the test data feature of med-imagetools. "
        "Install it using 'pip install med-imagetools[test]'."
    ) from e


@dataclass
class GitHubReleaseAsset:
    """
    Represents an asset in a GitHub release.

    Attributes
    ----------
    name : str
        Name of the asset (e.g., 'dataset.zip').
    url : str
        Direct download URL for the asset.
    content_type : str
        MIME type of the asset (e.g., 'application/zip').
    size : int
        Size of the asset in bytes.
    download_count : int
        Number of times the asset has been downloaded.
    """

    name: str
    url: str
    content_type: str
    size: int
    download_count: int


@dataclass
class GitHubRelease:
    """
    Represents a GitHub release.

    Attributes
    ----------
    tag_name : str
        The Git tag associated with the release.
    name : str
        The name of the release.
    body : str
        Release notes or description.
    html_url : str
        URL to view the release on GitHub.
    created_at : str
        ISO 8601 timestamp of release creation.
    published_at : str
        ISO 8601 timestamp of release publication.
    assets : List[GitHubReleaseAsset]
        List of assets in the release.
    """

    tag_name: str
    name: str
    body: str
    html_url: str
    created_at: str
    published_at: str
    assets: List[GitHubReleaseAsset]


@dataclass
class GitHubReleaseManager:
    """
    Class to fetch and interact with datasets from the latest GitHub release.

    Attributes
    ----------
    repo_name : str
        The full name of the GitHub repository (e.g., 'user/repo').
    token : str | None
        Optional GitHub token for authenticated requests (higher rate limits).
    """

    repo_name: str
    github: Github
    repo: Github.Repository

    latest_release: GitHubRelease | None = None

    def __init__(self, repo_name: str, token: str | None = None):
        self.repo_name = repo_name
        self.github = Github(token) if token else Github()
        self.repo = self.github.get_repo(repo_name)

    def get_latest_release(self) -> GitHubRelease:
        """Fetches the latest release details from the repository."""

        release = self.repo.get_latest_release()

        assets = [
            GitHubReleaseAsset(
                name=asset.name,
                url=asset.browser_download_url,
                content_type=asset.content_type,
                size=asset.size,
                download_count=asset.download_count,
            )
            for asset in release.get_assets()
        ]

        self.latest_release = GitHubRelease(
            tag_name=release.tag_name,
            name=release.title,
            body=release.body or "",
            html_url=release.html_url,
            created_at=release.created_at.isoformat(),
            published_at=release.published_at.isoformat(),
            assets=assets,
        )
        return self.latest_release

    def download_asset(self, asset: GitHubReleaseAsset, dest: Path) -> Path:
        """
        Downloads a release asset to a specified directory.

        Parameters
        ----------
        asset : GitHubReleaseAsset
            The asset to download.
        dest : Path
            Destination directory where the file will be saved.

        Returns
        -------
        Path
            Path to the downloaded file.
        """
        dest.mkdir(parents=True, exist_ok=True)
        filepath = dest / asset.name

        # Check for an existing file before issuing any network request.
        if filepath.exists():
            print(f"File {asset.name} already exists. Skipping download.")
            return filepath

        response = requests.get(asset.url, stream=True)
        response.raise_for_status()
        with open(filepath, "wb") as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)

        return filepath


@dataclass
class MedImageTestData(GitHubReleaseManager):
    """
    Manager for downloading and extracting med-image test data from GitHub releases.
    """

    downloaded_paths: List[Path] = field(default_factory=list, init=False)

    def __init__(self):
        super().__init__("bhklab/med-image_test-data")
        self.downloaded_paths = []

    def download_release_data(self, dest: Path) -> MedImageTestData:
        """Download all assets of the latest release to the specified directory."""
        latest_release = self.get_latest_release()
        for asset in latest_release.assets:
            print(f"Downloading {asset.name}...")
            downloaded_path = self.download_asset(asset, dest)
            self.downloaded_paths.append(downloaded_path)
        return self

    def extract(self, dest: Path) -> List[Path]:
        """Extract downloaded archives to the specified directory."""
        if not self.downloaded_paths:
            raise ValueError(
                "No archives have been downloaded yet. Call `download_release_data` first."
            )

        extracted_paths = []
        for path in self.downloaded_paths:
            print(f"Extracting {path.name}...")
            if tarfile.is_tarfile(path):
                with tarfile.open(path, "r:*") as archive:
                    archive.extractall(dest, filter="data")
                    extracted_paths.extend([dest / member.name for member in archive.getmembers()])
            elif zipfile.is_zipfile(path):
                with zipfile.ZipFile(path, "r") as archive:
                    archive.extractall(dest)
                    extracted_paths.extend([dest / name for name in archive.namelist()])
            else:
                print(f"Unsupported archive format: {path.name}")
        return extracted_paths


# Usage example
if __name__ == "__main__":
    manager = MedImageTestData()

    print(manager)

    manager.get_latest_release()

    print(manager)

    download_dir = Path("./data/med-image_test-data")
    manager.download_release_data(download_dir).extract(download_dir)
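
The `extract` method in this file dispatches on archive type (tar or zip) inside a single loop. The same dispatch can be sketched as a standalone function that needs no network access. `extract_archive` is a hypothetical helper name and is not part of this PR. The sketch departs from the PR code in two places, both assumptions made for illustration: it raises `ValueError` on an unsupported format instead of printing a message, and it passes `filter="data"` only when the running interpreter exposes `tarfile.data_filter`, which is the check the standard library documentation suggests for detecting extraction-filter support.

```python
import tarfile
import zipfile
from pathlib import Path
from typing import List


def extract_archive(archive_path: Path, dest: Path) -> List[Path]:
    """Extract a tar or zip archive into dest and return the member paths under dest.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    dest.mkdir(parents=True, exist_ok=True)

    if tarfile.is_tarfile(archive_path):
        # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
        with tarfile.open(archive_path, "r:*") as archive:
            names = archive.getnames()
            if hasattr(tarfile, "data_filter"):
                # Extraction filters are available, so reject unsafe members.
                archive.extractall(dest, filter="data")
            else:
                # No filter support on this interpreter: only use this
                # branch with archives from a trusted source.
                archive.extractall(dest)
        return [dest / name for name in names]

    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path, "r") as archive:
            archive.extractall(dest)
            return [dest / name for name in archive.namelist()]

    # Assumption: failing loudly is preferable here to silently skipping.
    raise ValueError(f"Unsupported archive format: {archive_path.name}")
```

Calling `extract_archive(Path("data.zip"), Path("out"))` on a local archive returns the list of extracted member paths. Raising on unknown formats makes a corrupted or mislabeled download fail at once in tests, whereas the print-and-continue behavior in the PR keeps a batch extraction going.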
@@ -0,0 +1,21 @@
from .helpers import (
    extract_roi_names,
    extract_rtstruct_metadata,
    load_rtstruct_data,
    rtstruct_reference_seriesuid,
)
from .custom_types import (
    ROI,
    ContourSlice,
    RTSTRUCTMetadata,
)

__all__ = [
    "ROI",
    "ContourSlice",
    "RTSTRUCTMetadata",
    "extract_roi_names",
    "extract_rtstruct_metadata",
    "load_rtstruct_data",
    "rtstruct_reference_seriesuid",
]
💡 Codebase verification
The "all" extras group is missing in pyproject.toml
The `extras = ["all"]` specified in pixi.toml cannot be resolved because the "all" extras group is not defined in pyproject.toml. Currently, only the "torch" and "test" extras groups are available.
🔗 Analysis chain
Verify the "all" extras group exists in pyproject.toml.
The addition of `extras = ["all"]` looks good, but we should verify that this extras group is properly defined.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
Length of output: 199