feat: Add Test Data classes to download from github releases #194

jjjermiah · 2025-01-22T14:32:44Z

❯ pixi run -e default imgtools --help
Usage: imgtools [OPTIONS] COMMAND [ARGS]...

  A collection of tools for working with medical imaging data.

Options:
  -q, --quiet    Suppress all logging except errors, overrides verbosity
                 options.
  -v, --verbose  Increase verbosity of logging, overrides environment
                 variable. (0-3: ERROR, WARNING, INFO, DEBUG).
  --version      Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  dicomsort    Sorts DICOM files into directories based on their tags.
  find-dicoms  A tool to find DICOM files.

versus using med-imagetools[test]

❯ pixi run -e dev imgtools --help
Usage: imgtools [OPTIONS] COMMAND [ARGS]...

  A collection of tools for working with medical imaging data.

Options:
  -q, --quiet    Suppress all logging except errors, overrides verbosity
                 options.
  -v, --verbose  Increase verbosity of logging, overrides environment
                 variable. (0-3: ERROR, WARNING, INFO, DEBUG).
  --version      Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  dicomsort    Sorts DICOM files into directories based on their tags.
  find-dicoms  A tool to find DICOM files.
  testdata     Download test data from the latest GitHub release.

Summary by CodeRabbit

New Features
- Added GitHub release management functionality for medical imaging test data.
- Introduced new classes for handling GitHub releases and assets.
- Implemented methods for downloading and extracting test datasets.
- Enhanced command-line interface with a new command for downloading test datasets based on availability.
- Expanded platform support in project configuration.
Documentation
- Updated README with improved layout and badge presentation.
- Enhanced project configuration in pyproject.toml and pixi.toml.
Tests
- Added comprehensive test suite for GitHub release data management.
- Created fixtures and test methods to validate release retrieval and extraction.
- Enhanced test coverage with additional scenarios for error handling.

…asets from GitHub releases

…nced release management

…proved initialization

…tureset module

…ling for PyGithub

…d test environments

coderabbitai · 2025-01-22T14:32:52Z

Walkthrough

This pull request introduces enhancements to the project's dataset management capabilities, particularly for handling GitHub releases related to medical imaging test data. Key changes include the addition of a new module for managing GitHub release data, updates to configuration files for improved dependency management, and the introduction of tests to ensure functionality. The README file has been updated for better presentation, and a new command-line interface function has been added to facilitate downloading test datasets.

Changes

File	Change Summary
README.md	Added centered `<div>` with project badges
pixi.toml	Added `osx-64` and `win-64` platform entries; reorganized `med-imagetools` dependencies under new feature flags
pyproject.toml	Introduced optional dependency groups `test` and `all`; commented out `debug` group
src/imgtools/datasets/github_helper.py	Added classes for managing GitHub releases and downloading medical imaging test data
src/imgtools/datasets/init.py	Imported new classes from `github_helper` and updated `__all__` list
tests/datasets/test_github_helper.py	Created test suite for `MedImageTestData` class functionality
src/imgtools/cli/main.py	Added conditional import for `testdata` command based on test data availability
src/imgtools/cli/testdatasets.py	Introduced `testdata` CLI function for downloading test datasets
.github/workflows/main.yml	Updated CI workflow to include `windows-latest` for testing and documentation jobs

Sequence Diagram

sequenceDiagram
    participant User
    participant MedImageTestData
    participant GitHubReleaseManager
    participant GitHub

    User->>MedImageTestData: Initialize
    MedImageTestData->>GitHubReleaseManager: Get Latest Release
    GitHubReleaseManager->>GitHub: Fetch Release Data
    GitHub-->>GitHubReleaseManager: Return Release Info
    GitHubReleaseManager-->>MedImageTestData: Return Release Details
    User->>MedImageTestData: Download Assets
    MedImageTestData->>GitHub: Download Asset Files
    GitHub-->>MedImageTestData: Provide Asset Files
    User->>MedImageTestData: Extract Assets
    MedImageTestData-->>User: Extracted Files

Possibly related PRs

feat: Improve structureset and crawl modules, Refactor tests and improve dataset handling #165: The changes in this PR involve enhancements to the structureset module, which may relate to the overall structure and organization of the code, similar to the changes made in the main PR to improve the README layout.
feat: refactor utility modules, add patternresolver, formt #170: This PR introduces a new PatternResolver class and updates to the CI/CD workflow, which may not directly relate to the README changes but indicates ongoing improvements in project structure.
feat: refactor io module, introduce new abstract base writer #181: The refactor of the io module and the introduction of a new abstract base writer may not directly connect to the README changes but reflect a broader effort to enhance code organization.

Poem

🐰 A Rabbit's GitHub Release Dance 🐰
With badges bright and data's grace,
Our code now fetches at light's pace.
GitHub releases, test data's might,
Downloading dreams with pure delight!
A scientific rabbit's code so neat 🔬

✨ Finishing Touches

📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (4)

src/imgtools/datasets/github_helper.py (2)
150-152: Use the logging module instead of print statements

For better logging practices and configurability, replace print statements with the logging module. This allows for adjustable log levels and more structured output.

Apply this diff to update the logging mechanism:
+import logging

+logger = logging.getLogger(__name__)

 ...

     if filepath.exists():
-        print(f"File {asset.name} already exists. Skipping download.")
+        logger.info(f"File {asset.name} already exists. Skipping download.")
         return filepath

 ...

         for asset in latest_release.assets:
-            print(f"Downloading {asset.name}...")
+            logger.info(f"Downloading {asset.name}...")
             downloaded_path = self.download_asset(asset, dest)
             self.downloaded_paths.append(downloaded_path)

 ...

             for path in self.downloaded_paths:
-                print(f"Extracting {path.name}...")
+                logger.info(f"Extracting {path.name}...")
                 if tarfile.is_tarfile(path):

 ...

             else:
-                print(f"Unsupported archive format: {path.name}")
+                logger.warning(f"Unsupported archive format: {path.name}")
Also applies to: 177-178, 191-192, 201-201

105-105: Add exception handling for fetching the latest release

Network errors or API issues may occur when fetching the latest release. Adding exception handling will improve robustness and provide clearer error messages to users.

Consider modifying the method as follows:
def get_latest_release(self) -> GitHubRelease:
    """Fetches the latest release details from the repository."""

    try:
        release = self.repo.get_latest_release()
    except github.GithubException as e:
        print(f"Error fetching the latest release: {e}")
        raise e

    # Continue processing the release if no exception occurred
    # ...
pyproject.toml (1)
60-65: Clarify the contents of the all optional dependency group

The all group is intended to include all optional dependencies, but currently, torch and torchio are commented out. If these packages are meant to be part of the all group, consider uncommenting them once they are ready to be included.
 all = [
     "pygithub>=2.5.0",
     # add these back later
-    # "torch",
-    # "torchio",
+    "torch",
+    "torchio",
 ]
pixi.toml (1)

15-15: Consolidate med-imagetools dependency declarations

The med-imagetools package is specified multiple times: globally and under both feature.dev and feature.test. This could cause redundancy and potential conflicts. Consider consolidating these declarations or clarifying their distinct purposes.

Suggestions:

If the global declaration at line 15 is necessary for all environments, keep it and adjust the feature-specific dependencies accordingly.

If extras are only needed in specific features, ensure they are specified only where required.

Also applies to: 26-28, 54-55

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ccc0c0 and e09479d.

⛔ Files ignored due to path filters (1)

pixi.lock is excluded by !**/*.lock

📒 Files selected for processing (5)

README.md (2 hunks)
pixi.toml (3 hunks)
pyproject.toml (1 hunks)
src/imgtools/datasets/github_helper.py (1 hunks)
src/imgtools/modules/structureset/__init__.py (1 hunks)

✅ Files skipped from review due to trivial changes (2)

README.md
src/imgtools/modules/structureset/init.py

⏰ Context from checks skipped due to timeout of 90000ms (3)

GitHub Check: Unit-Tests (macos-14, py310)
GitHub Check: Unit-Tests (macos-latest, py311)
GitHub Check: Unit-Tests (macos-latest, py310)

src/imgtools/datasets/github_helper.py

…oads and add MedImageTestData class for test data handling

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (2)

src/imgtools/datasets/github_helper.py (2)

95-96: ⚠️ Potential issue

Correct the type annotation for repo

The type annotation for repo is incorrect. Repository should be imported directly from the github module. Update the type annotations accordingly.

Apply this diff to fix the type annotations:

- from github import Github  # type: ignore # noqa
+ from github import Github, Repository  # type: ignore # noqa

  ...

  github: Github
- repo: Github.Repository
+ repo: Repository

284-286: ⚠️ Potential issue

Fix extractall usage and implement safe extraction

The extractall method of tarfile does not accept a filter parameter. Additionally, implement a safe extraction to prevent path traversal vulnerabilities.

Apply this diff to correct the issue and enhance security:

+     import os

      ...

          if tarfile.is_tarfile(path):
              with tarfile.open(path, "r:*") as archive:
-                 archive.extractall(dest, filter="data")
+                 self.safe_extract(archive, dest)
                  extracted_paths.extend(
                      [dest / member.name for member in archive.getmembers()]
                  )
              elif zipfile.is_zipfile(path):

+     def is_within_directory(self, directory: Path, target: Path) -> bool:
+         abs_directory = directory.resolve()
+         abs_target = target.resolve()
+         return abs_directory in abs_target.parents or abs_directory == abs_target
+
+     def safe_extract(self, tar: tarfile.TarFile, path: Path) -> None:
+         for member in tar.getmembers():
+             member_path = path / member.name
+             if not self.is_within_directory(path, member_path):
+                 raise Exception("Attempted Path Traversal in Tar File")
+         tar.extractall(path=path)

🧹 Nitpick comments (2)

src/imgtools/datasets/github_helper.py (1)
11-11: Remove unused import requests

The requests module is imported but not used anywhere in the code. Removing it will clean up the imports.

Apply this diff to remove the unused import:
- import requests
🧰 Tools

🪛 Ruff (0.8.2)

11-11: requests imported but unused

Remove unused import: requests

(F401)
tests/datasets/test_github_helper.py (1)
1-1: Remove unused import Path

The Path class from pathlib is imported but not used in this test module. Removing it will tidy up the imports.

Apply this diff to remove the unused import:
- from pathlib import Path
🧰 Tools

🪛 Ruff (0.8.2)

1-1: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e09479d and fd7a21d.

⛔ Files ignored due to path filters (1)

pixi.lock is excluded by !**/*.lock

📒 Files selected for processing (4)

pyproject.toml (1 hunks)
src/imgtools/datasets/__init__.py (1 hunks)
src/imgtools/datasets/github_helper.py (1 hunks)
tests/datasets/test_github_helper.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

pyproject.toml

🧰 Additional context used

🪛 Ruff (0.8.2)

tests/datasets/test_github_helper.py

1-1: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

src/imgtools/datasets/github_helper.py

11-11: requests imported but unused

Remove unused import: requests

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Unit-Tests (macos-latest, py312)

🔇 Additional comments (1)

src/imgtools/datasets/__init__.py (1)

1-13: LGTM!

The import statements and the __all__ definition are correctly set up to expose the intended classes. Good job on maintaining a clean and explicit public API.

src/imgtools/datasets/github_helper.py

tests/datasets/test_github_helper.py

… release

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

tests/datasets/test_github_helper.py (4)

8-13: Add error handling to the fixture.

The fixture should verify that the latest release was successfully fetched to provide clearer error messages if GitHub API calls fail.

 @pytest.fixture
 def med_image_test_data():
     manager = MedImageTestData()
-    _ = manager.get_latest_release()
+    latest_release = manager.get_latest_release()
+    if latest_release is None:
+        pytest.skip("Failed to fetch latest release - skipping tests")
     return manager

15-22: Enhance test coverage for release attributes.

Consider adding assertions for additional release attributes and edge cases:

Verify that tag_name and name are not empty strings
Check for presence of assets list
Validate release URL format

 def test_get_latest_release(med_image_test_data):
     latest_release = med_image_test_data.get_latest_release()
     assert latest_release is not None
     assert latest_release.tag_name is not None
     assert latest_release.name is not None
     assert isinstance(latest_release.tag_name, str)
     assert isinstance(latest_release.name, str)
+    assert len(latest_release.tag_name) > 0, "tag_name should not be empty"
+    assert len(latest_release.name) > 0, "name should not be empty"
+    assert hasattr(latest_release, 'assets'), "release should have assets"
+    assert isinstance(latest_release.assets, list), "assets should be a list"
+    if latest_release.html_url:  # if available
+        assert latest_release.html_url.startswith('https://github.com/'), "Invalid release URL"

30-32: Document the asset selection strategy.

The selection of the first 2 assets seems arbitrary. Consider:

Documenting why only 2 assets are chosen
Using a more specific selection criteria
Testing with different numbers of assets

-    chosen_assets = release.assets[:2]
+    # Test with a subset of assets to balance coverage and execution time
+    max_test_assets = 2
+    chosen_assets = release.assets[:max_test_assets]

39-43: Enhance extracted file validation.

The test only verifies file existence and type but doesn't validate the content or structure of extracted files.

Consider adding:

File size checks
Basic content validation
Directory structure verification

     extracted_paths = med_image_test_data.extract(extract_dir)
     assert len(extracted_paths) > 0
     for path in extracted_paths:
         assert path.exists()
         assert path.is_file() or path.is_dir()
+        if path.is_file():
+            # Ensure files are not empty
+            assert path.stat().st_size > 0, f"File is empty: {path}"
+        if path.is_dir():
+            # Ensure directories contain expected structure
+            assert any(path.iterdir()), f"Directory is empty: {path}"

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd7a21d and 18bb461.

📒 Files selected for processing (1)

tests/datasets/test_github_helper.py (1 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

tests/datasets/test_github_helper.py

1-1: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (3)

GitHub Check: Unit-Tests (ubuntu-latest, py312)
GitHub Check: Unit-Tests (ubuntu-latest, py311)
GitHub Check: Unit-Tests (ubuntu-latest, py310)

🔇 Additional comments (1)

tests/datasets/test_github_helper.py (1)

44-51: Fix file type checking for multi-extension files.

The current implementation doesn't handle multi-extension files correctly.

…stData

…lease

…of interest

…bose testing

…HUB_TOKEN and restore release body

coderabbitai

Actionable comments posted: 2

🔭 Outside diff range comments (1)

.github/workflows/main.yml (1)
Windows testing removal needs reconsideration

The codebase contains significant Windows-specific functionality that should be tested:

Path length and character handling in src/imgtools/dicom/sort/path_limits.py

Windows-specific dependencies in pixi.lock

File system compatibility code in src/imgtools/io/writers/old_writers.py

Removing Windows from the test matrix while maintaining Windows-specific code could lead to undetected compatibility issues. Consider either:

Keeping Windows in the test matrix to ensure continued compatibility, or

Documenting the decision to drop Windows support and gradually removing Windows-specific code

🔗 Analysis chain

Line range hint 19-19: Verify Windows support removal.

The comment indicates windows-latest was removed from the test matrix. Please verify:

Is this a temporary or permanent removal?

Are there any Windows-specific tests or configurations that should be cleaned up?

Should this be documented in the project's compatibility documentation?

Let's check for any Windows-specific configurations or tests:
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Search for Windows-specific code or configurations
rg -i "windows|win32|win64|winapi" --type yaml --type python

# Look for OS-specific path handling
ast-grep --pattern 'os.path.join($_,$_)'
Length of output: 484

Script:
#!/bin/bash
# Search for Windows-specific code with correct file types
rg -i "windows|win32|win64|winapi"

# Check workflow file history
git log -p -- .github/workflows/main.yml | head -n 50

# Look for test configurations
rg -i "test.*config|config.*test" --type yaml
Length of output: 5527

🧹 Nitpick comments (4)

src/imgtools/cli/testdatasets.py (1)

20-41: Consider adding validation for the destination directory.

While the Click path type handles basic validation, consider adding checks for:

Available disk space

Write permissions for parent directories

Existing content in the destination directory
tests/datasets/test_github_helper.py (2)
1-1: Remove unused import.

The Path import from pathlib is not used in the code.
-from pathlib import Path
🧰 Tools

🪛 Ruff (0.8.2)

1-1: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

32-42: Consider parameterizing the test with different asset combinations.

Currently, the test uses a hardcoded list of assets. Consider using @pytest.mark.parametrize to test different combinations.
@pytest.mark.parametrize("assets", [
    ["NSCLC-Radiomics"],
    ["NSCLC_Radiogenomics", "Vestibular-Schwannoma-SEG"],
    ["invalid-asset-name"],  # Test error handling
])
def test_extract_with_different_assets(med_image_test_data, tmp_path, assets):
    # Test implementation
.github/workflows/main.yml (1)
36-36: Reconsider test output configuration in CI pipeline.

The addition of -s -vv flags to pytest might impact CI performance:

-s disables output capture, which could flood CI logs with print statements when downloading/extracting test data

-vv doubles verbosity, potentially creating very large log files

Consider these alternatives:

Remove -s flag and use pytest.mark.filterwarnings for specific tests that need output capture disabled

Use environment variables to control verbosity levels:
-          pixi run -e ${{ matrix.env }} test -s -vv
+          pixi run -e ${{ matrix.env }} test -v
+        env:
+          PYTEST_ADDOPTS: "--capture=no" # Only if absolutely necessary

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18bb461 and f0bc308.

⛔ Files ignored due to path filters (1)

pixi.lock is excluded by !**/*.lock

📒 Files selected for processing (6)

.github/workflows/main.yml (1 hunks)
pixi.toml (4 hunks)
src/imgtools/cli/__main__.py (2 hunks)
src/imgtools/cli/testdatasets.py (1 hunks)
src/imgtools/datasets/github_helper.py (1 hunks)
tests/datasets/test_github_helper.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/imgtools/datasets/github_helper.py

🧰 Additional context used

🪛 Ruff (0.8.2)

tests/datasets/test_github_helper.py

1-1: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (8)

GitHub Check: Unit-Tests (macos-14, py311)
GitHub Check: Unit-Tests (macos-14, py310)
GitHub Check: Unit-Tests (macos-latest, py312)
GitHub Check: Unit-Tests (macos-latest, py311)
GitHub Check: Unit-Tests (macos-latest, py310)
GitHub Check: Unit-Tests (ubuntu-latest, py312)
GitHub Check: Unit-Tests (ubuntu-latest, py311)
GitHub Check: Unit-Tests (ubuntu-latest, py310)

🔇 Additional comments (4)

src/imgtools/cli/__main__.py (1)

8-8: LGTM! Well-structured conditional CLI command registration.

The implementation gracefully handles the optional GitHub dependency by conditionally registering the testdata command only when the required module is available.

Also applies to: 33-36

src/imgtools/cli/testdatasets.py (1)

7-13: LGTM! Clean implementation of module availability check.

The is_testdata_available() function properly handles the optional GitHub dependency.

tests/datasets/test_github_helper.py (1)

56-63: Fix file type checking for multi-extension files.

Using path.suffix only captures the last extension (e.g., .gz in .nii.gz).

pixi.toml (1)

6-6: LGTM! Well-structured dependency management.

The changes:

Add proper platform support for osx-64

Configure appropriate extras for development and testing

Comment out pytest-xdist for future consideration

Also applies to: 26-27, 54-55, 60-60

src/imgtools/cli/testdatasets.py

tests/datasets/test_github_helper.py

…w for pytest

…rameters and improved token handling

…dling

codecov · 2025-01-22T17:22:01Z

Codecov Report

Attention: Patch coverage is 79.25926% with 28 lines in your changes missing coverage. Please review.

Project coverage is 50.69%. Comparing base (8ccc0c0) to head (fdbd4c5).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
src/imgtools/datasets/github_helper.py	78.94%	28 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #194      +/-   ##
==========================================
+ Coverage   49.55%   50.69%   +1.13%     
==========================================
  Files          45       47       +2     
  Lines        3408     3543     +135     
==========================================
+ Hits         1689     1796     +107     
- Misses       1719     1747      +28

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

JoshuaSiraj · 2025-01-22T17:41:15Z

@jjjermiah noticed that medimgtools doesn't support windows??

jjjermiah · 2025-01-22T17:43:30Z

@jjjermiah noticed that medimgtools doesn't support windows??

trueee

jjjermiah · 2025-01-22T17:44:38Z

@jjjermiah noticed that medimgtools doesn't support windows??

trueee

it was just the pixi configuration, which is only for developers. that being said, we should support it so we can immediately tell if pixi cant solve an environment for it.

jjjermiah · 2025-01-22T17:45:29Z

@jjjermiah noticed that medimgtools doesn't support windows??

added to pixi and the ci/cd now:

https://github.com/bhklab/med-imagetools/actions/runs/12913974856?pr=194

lets see what happens lol

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

src/imgtools/datasets/github_helper.py (1)

368-370: ⚠️ Potential issue

Fix the usage of extractall and implement safe extraction.

The current implementation is vulnerable to path traversal attacks and uses an incorrect parameter.

Apply this diff to correct the issue and enhance security:

+import os

 ...

     def extract(self, dest: Path) -> List[Path]:
         """Extract downloaded archives to the specified directory."""
         if not self.downloaded_paths:
             raise ValueError(
                 "No archives have been downloaded yet. Call `download_release_data` first."
             )

         extracted_paths = []
         for path in self.downloaded_paths:
             print(f"Extracting {path.name}...")
             if tarfile.is_tarfile(path):
                 with tarfile.open(path, "r:*") as archive:
-                    archive.extractall(dest, filter="data")
+                    self.safe_extract(archive, dest)
                     extracted_paths.extend(
                         [dest / member.name for member in archive.getmembers()]
                     )
             elif zipfile.is_zipfile(path):
                 with zipfile.ZipFile(path, "r") as archive:
                     archive.extractall(dest)
                     extracted_paths.extend([dest / name for name in archive.namelist()])
             else:
                 print(f"Unsupported archive format: {path.name}")
         return extracted_paths

+    def is_within_directory(self, directory: Path, target: Path) -> bool:
+        abs_directory = directory.resolve()
+        abs_target = target.resolve()
+        return abs_directory in abs_target.parents or abs_directory == abs_target
+
+    def safe_extract(self, tar: tarfile.TarFile, path: Path) -> None:
+        for member in tar.getmembers():
+            member_path = path / member.name
+            if not self.is_within_directory(path, member_path):
+                raise Exception("Attempted Path Traversal in Tar File")
+        tar.extractall(path=path)

🧹 Nitpick comments (3)

pixi.toml (1)
60-60: Document the reason for postponing pytest-xdist.

Consider adding a brief explanation in the comment about why pytest-xdist is being postponed and any specific conditions or timeline for its reintroduction.
-# pytest-xdist = "*" re-introduce later. 
+# pytest-xdist = "*" # TODO: Re-introduce when <specific condition> is met. See issue #<number> for details.
.github/workflows/main.yml (1)
20-20: Fix trailing whitespace.

Remove the trailing whitespace at the end of the line.
-        os: [ubuntu-latest, macos-latest, macos-14, windows-latest] 
+        os: [ubuntu-latest, macos-latest, macos-14, windows-latest]
🧰 Tools

🪛 yamllint (1.35.1)

[error] 20-20: trailing spaces

(trailing-spaces)
src/imgtools/datasets/github_helper.py (1)
31-32: Enhance the error message with installation command.

The error message could be more helpful by including the pip command with the exact package name.
-        "PyGithub is required for the test data feature of med-imagetools. "
-        "Install it using 'pip install med-imagetools[test]'."
+        "PyGithub is required for the test data feature of med-imagetools. "
+        "Install it using 'pip install PyGithub' or 'pip install med-imagetools[test]'."

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0bc308 and 19d19f2.

📒 Files selected for processing (3)

.github/workflows/main.yml (2 hunks)
pixi.toml (4 hunks)
src/imgtools/datasets/github_helper.py (1 hunks)

🧰 Additional context used

🪛 yamllint (1.35.1)

.github/workflows/main.yml

[error] 20-20: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (11)

GitHub Check: Unit-Tests (windows-latest, py312)
GitHub Check: Unit-Tests (windows-latest, py311)
GitHub Check: Unit-Tests (windows-latest, py310)
GitHub Check: Unit-Tests (macos-14, py312)
GitHub Check: Unit-Tests (macos-14, py311)
GitHub Check: Unit-Tests (macos-14, py310)
GitHub Check: Unit-Tests (macos-latest, py311)
GitHub Check: Unit-Tests (macos-latest, py310)
GitHub Check: Unit-Tests (ubuntu-latest, py312)
GitHub Check: Unit-Tests (ubuntu-latest, py311)
GitHub Check: Unit-Tests (ubuntu-latest, py310)

🔇 Additional comments (6)

pixi.toml (2)

6-6: LGTM! Good enhancement of cross-platform support.

The addition of osx-64 and win-64 platforms broadens the project's accessibility to more users.

54-55: Verify the "test" extras group exists in pyproject.toml.

The addition of extras = ["test"] looks good, but we should verify that this extras group is properly defined.

✅ Verification successful

"test" extras group is properly defined

The "test" extras group exists in pyproject.toml with the required dependencies. The configuration in pixi.toml is correct.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Check if the "test" extras group is defined in pyproject.toml
rg -A 5 '\[project\.optional-dependencies\]|\[tool\.poetry\.extras\]' pyproject.toml
Length of output: 199
.github/workflows/main.yml (1)

35-38: LGTM! Good addition of GitHub token for test data access.

The addition of GITHUB_TOKEN environment variable is necessary for the new test data download functionality, and the modified test command with -s -vv flags improves test output visibility.
src/imgtools/datasets/github_helper.py (3)

36-92: LGTM! Well-structured and documented data classes.

The GitHubReleaseAsset and GitHubRelease classes are well-designed with comprehensive docstrings and appropriate type hints.

108-109: Correct the type annotations for github and repo attributes.

The type annotations for github and repo are incorrect. The Repository class is not an attribute of the Github class.

Apply this diff to fix the type annotations:
-from github import Github  # type: ignore # noqa
+from github import Github, Repository  # type: ignore # noqa

...

 github: Github
-repo: Github.Repository
+repo: Repository
361-361: Update the method name in the error message.

The error message references download_release_data, which does not exist.

Apply this diff to correct the error message:
      raise ValueError(
-         "No archives have been downloaded yet. Call `download_release_data` first."
+         "No archives have been downloaded yet. Call `download_all` or `download` first."
      )

coderabbitai · 2025-01-22T17:46:05Z

pixi.toml

+[feature.dev.pypi-dependencies]
+med-imagetools = { path = ".", editable = true, extras = ["all"]}
+


💡 Codebase verification

The "all" extras group is missing in pyproject.toml

The extras = ["all"] specified in pixi.toml cannot be resolved as the "all" extras group is not defined in pyproject.toml. Currently, only "torch" and "test" extras groups are available.

Add an "all" extras group in pyproject.toml that includes all optional dependencies, or

Use one of the existing extras groups: "torch" or "test"

🔗 Analysis chain

Verify the "all" extras group exists in pyproject.toml.

The addition of extras = ["all"] looks good, but we should verify that this extras group is properly defined.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Check if the "all" extras group is defined in pyproject.toml rg -A 5 '\[project\.optional-dependencies\]|\[tool\.poetry\.extras\]' pyproject.toml

Length of output: 199

JoshuaSiraj · 2025-01-22T17:46:32Z

@jjjermiah noticed that medimgtools doesn't support windows??

trueee

it was just the pixi configuration, which is only for developers. that being said, we should support it so we can immediately tell if pixi cant solve an environment for it.

kk

…and consistency

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

tests/dicom/sort/test_sorter_base.py (1)
63-64: Consider removing unnecessary blank lines.

The two consecutive blank lines between class definition and method don't follow PEP 8 style guide, which recommends using a single blank line.
-

-
+
tests/dicom/sort/test_sort_method.py (2)
100-110: Consider consolidating permission restoration.

The permission restoration could be moved to a fixture to ensure cleanup even if the test fails.
+@pytest.fixture
+def temp_dir_with_permissions():
+    with TemporaryDirectory() as tmpdir:
+        path = Path(tmpdir)
+        yield path
+        os.chmod(path, 0o700)  # Always restore permissions

-def test_parent_directory_creation_error(temp_dir) -> None:
+def test_parent_directory_creation_error(temp_dir_with_permissions) -> None:
     source = temp_dir / "source.txt"
     destination = temp_dir / "no_permission_dir/destination.txt"
     source.write_text("Test content")

     os.chmod(temp_dir, 0o400)  # Read-only permission

     with pytest.raises(PermissionError):
         handle_file(source, destination, FileAction.COPY)

-    os.chmod(temp_dir, 0o700)  # Restore permissions
116-128: Consider using a more specific error message pattern.

The error message pattern could be more specific to distinguish between parent directory creation failure and write permission issues.
     with pytest.raises(
-        PermissionError, match="Failed to create parent directory or no write permission"
+        PermissionError, match="^(Failed to create parent directory|No write permission for directory)"
     ):
         handle_file(source, destination, FileAction.MOVE)
pyproject.toml (1)
62-69: Consider version alignment in 'all' group.

The commented-out dependencies should include version constraints when they're added back to maintain consistency with other dependencies in the project.
  # add these back later
- # "torch",
- # "torchio",
- # "pyvis",
+ # "torch>=2.0.0",
+ # "torchio>=0.19.0",
+ # "pyvis>=0.3.0",

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19d19f2 and fdbd4c5.

⛔ Files ignored due to path filters (1)

pixi.lock is excluded by !**/*.lock

📒 Files selected for processing (3)

pyproject.toml (1 hunks)
tests/dicom/sort/test_sort_method.py (5 hunks)
tests/dicom/sort/test_sorter_base.py (1 hunks)

🧰 Additional context used

📓 Learnings (1)

tests/dicom/sort/test_sort_method.py (1)

Learnt from: jjjermiah
PR: bhklab/med-imagetools#137
File: tests/dicom/sort/test_sort_method.py:71-82
Timestamp: 2024-11-21T21:03:34.864Z
Learning: In the `tests/dicom/sort/test_sort_method.py` file, Unix-specific octal permissions are acceptable for permission testing; platform-independent permission testing is not required.

⏰ Context from checks skipped due to timeout of 90000ms (11)

GitHub Check: Unit-Tests (windows-latest, py312)
GitHub Check: Unit-Tests (windows-latest, py311)
GitHub Check: Unit-Tests (windows-latest, py310)
GitHub Check: Unit-Tests (macos-14, py312)
GitHub Check: Unit-Tests (macos-14, py311)
GitHub Check: Unit-Tests (macos-14, py310)
GitHub Check: Unit-Tests (macos-latest, py312)
GitHub Check: Unit-Tests (macos-latest, py310)
GitHub Check: Unit-Tests (ubuntu-latest, py312)
GitHub Check: Unit-Tests (ubuntu-latest, py311)
GitHub Check: Unit-Tests (ubuntu-latest, py310)

🔇 Additional comments (3)

tests/dicom/sort/test_sorter_base.py (1)

70-78: LGTM! Robust path comparison implementation.

The use of Path.resolve() for path comparisons is a good practice as it handles symbolic links and normalizes paths correctly.

tests/dicom/sort/test_sort_method.py (1)

81-83: LGTM! Proper platform-specific test handling.

The use of @pytest.mark.skipif for Windows is appropriate as it clearly documents the platform limitation.

Also applies to: 97-99, 113-115

pyproject.toml (1)

58-61: LGTM! Well-structured test dependencies.

The addition of pygithub and aiohttp aligns with the PR objective of downloading test data from GitHub releases.

jjjermiah added 6 commits January 21, 2025 20:26

feat: Add GitHubReleaseDataset class for fetching and downloading dat…

a92b906

…asets from GitHub releases

feat: Introduce GitHubReleaseAsset and GitHubRelease classes for enha…

b7e176f

…nced release management

feat: Enhance GitHubReleaseManager with latest release caching and im…

cef129b

…proved initialization

feat: Update MedImageTestData to filter extracted files and add struc…

f1fbdec

…tureset module

feat: Update dependencies in pyproject.toml and add import error hand…

aa26b87

…ling for PyGithub

feat: Update pixi.toml to include extras for med-imagetools in dev an…

e09479d

…d test environments

jjjermiah linked an issue Jan 22, 2025 that may be closed by this pull request

Download Data from Test Data Repo #193

Closed

coderabbitai bot reviewed Jan 22, 2025

View reviewed changes

src/imgtools/datasets/github_helper.py Show resolved Hide resolved

src/imgtools/datasets/github_helper.py Show resolved Hide resolved

jjjermiah added 2 commits January 22, 2025 15:57

feat: Enhance GitHub release management with asynchronous asset downl…

fd7a21d

…oads and add MedImageTestData class for test data handling

feat: Remove structureset module and its associated imports

2be30bc

coderabbitai bot reviewed Jan 22, 2025

View reviewed changes

src/imgtools/datasets/github_helper.py Show resolved Hide resolved

tests/datasets/test_github_helper.py Outdated Show resolved Hide resolved

tests/datasets/test_github_helper.py Show resolved Hide resolved

feat: Update test_extract to download specific assets from the latest…

18bb461

… release

coderabbitai bot reviewed Jan 22, 2025

View reviewed changes

jjjermiah added 8 commits January 22, 2025 16:05

feat: Add progress bar for asynchronous asset downloads in MedImageTe…

f768eda

…stData

feat: Implement progress bar for dataset downloads in MedImageTestData

0e79040

feat: Add CLI command to download test data from the latest GitHub re…

167900b

…lease

feat: Update test_extract to filter assets based on specific strings …

04f6e29

…of interest

feat: Add assertion to check minimum release version in test_extract

49e3e3a

feat: Temporarily comment out pytest-xdist dependency in pixi.toml

1018c71

feat: Enhance CLI with test data command and update workflows for ver…

f0bc308

…bose testing

feat: Update GitHubReleaseManager to use environment variable for GIT…

5753d1e

…HUB_TOKEN and restore release body

coderabbitai bot reviewed Jan 22, 2025

View reviewed changes

src/imgtools/cli/testdatasets.py Show resolved Hide resolved

tests/datasets/test_github_helper.py Show resolved Hide resolved

jjjermiah added 3 commits January 22, 2025 17:06

feat: Set GITHUB_TOKEN environment variable in GitHub Actions workflo…

5ad5982

…w for pytest

feat: Enhance GitHubReleaseManager to support configurable request pa…

74b77c6

…rameters and improved token handling

feat: Increase timeout for GitHub API requests and simplify token han…

a795528

…dling

jjjermiah requested a review from JoshuaSiraj January 22, 2025 17:19

JoshuaSiraj approved these changes Jan 22, 2025

View reviewed changes

feat: Add Windows support to CI workflow and update package platforms

19d19f2

coderabbitai bot reviewed Jan 22, 2025

View reviewed changes

jjjermiah added 2 commits January 22, 2025 17:58

refactor: Update test cases for file handling to improve readability …

cec5365

…and consistency

feat: Add debug optional dependency for pyvis in pyproject.toml

fdbd4c5

coderabbitai bot reviewed Jan 22, 2025

View reviewed changes

jjjermiah merged commit 3385682 into main Jan 22, 2025
20 checks passed

jjjermiah deleted the 193-download-data-from-test-data-repo branch January 22, 2025 18:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add Test Data classes to download from github releases #194

feat: Add Test Data classes to download from github releases #194

jjjermiah commented Jan 22, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 22, 2025 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

codecov bot commented Jan 22, 2025 •

edited

Loading

JoshuaSiraj commented Jan 22, 2025

jjjermiah commented Jan 22, 2025

jjjermiah commented Jan 22, 2025

jjjermiah commented Jan 22, 2025

coderabbitai bot left a comment

coderabbitai bot Jan 22, 2025

JoshuaSiraj commented Jan 22, 2025

coderabbitai bot left a comment

		[feature.dev.pypi-dependencies]
		med-imagetools = { path = ".", editable = true, extras = ["all"]}

feat: Add Test Data classes to download from github releases #194

feat: Add Test Data classes to download from github releases #194

Conversation

jjjermiah commented Jan 22, 2025 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Jan 22, 2025 • edited Loading

Walkthrough

Changes

Sequence Diagram

Possibly related PRs

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

codecov bot commented Jan 22, 2025 • edited Loading

Codecov Report

JoshuaSiraj commented Jan 22, 2025

jjjermiah commented Jan 22, 2025

jjjermiah commented Jan 22, 2025

jjjermiah commented Jan 22, 2025

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Jan 22, 2025

Choose a reason for hiding this comment

JoshuaSiraj commented Jan 22, 2025

coderabbitai bot left a comment

Choose a reason for hiding this comment

jjjermiah commented Jan 22, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 22, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

codecov bot commented Jan 22, 2025 •

edited

Loading