Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add Test Data classes to download from github releases #194

Merged
merged 23 commits into from
Jan 22, 2025

Conversation

jjjermiah
Copy link
Contributor

@jjjermiah jjjermiah commented Jan 22, 2025

❯ pixi run -e default imgtools --help
Usage: imgtools [OPTIONS] COMMAND [ARGS]...

  A collection of tools for working with medical imaging data.

Options:
  -q, --quiet    Suppress all logging except errors, overrides verbosity
                 options.
  -v, --verbose  Increase verbosity of logging, overrides environment
                 variable. (0-3: ERROR, WARNING, INFO, DEBUG).
  --version      Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  dicomsort    Sorts DICOM files into directories based on their tags.
  find-dicoms  A tool to find DICOM files.

versus using med-imagetools[test]

❯ pixi run -e dev imgtools --help
Usage: imgtools [OPTIONS] COMMAND [ARGS]...

  A collection of tools for working with medical imaging data.

Options:
  -q, --quiet    Suppress all logging except errors, overrides verbosity
                 options.
  -v, --verbose  Increase verbosity of logging, overrides environment
                 variable. (0-3: ERROR, WARNING, INFO, DEBUG).
  --version      Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  dicomsort    Sorts DICOM files into directories based on their tags.
  find-dicoms  A tool to find DICOM files.
  testdata     Download test data from the latest GitHub release.

Summary by CodeRabbit

  • New Features

    • Added GitHub release management functionality for medical imaging test data.
    • Introduced new classes for handling GitHub releases and assets.
    • Implemented methods for downloading and extracting test datasets.
    • Enhanced command-line interface with a new command for downloading test datasets based on availability.
    • Expanded platform support in project configuration.
  • Documentation

    • Updated README with improved layout and badge presentation.
    • Enhanced project configuration in pyproject.toml and pixi.toml.
  • Tests

    • Added comprehensive test suite for GitHub release data management.
    • Created fixtures and test methods to validate release retrieval and extraction.
    • Enhanced test coverage with additional scenarios for error handling.

@jjjermiah jjjermiah linked an issue Jan 22, 2025 that may be closed by this pull request
Copy link
Contributor

coderabbitai bot commented Jan 22, 2025

Walkthrough

This pull request introduces enhancements to the project's dataset management capabilities, particularly for handling GitHub releases related to medical imaging test data. Key changes include the addition of a new module for managing GitHub release data, updates to configuration files for improved dependency management, and the introduction of tests to ensure functionality. The README file has been updated for better presentation, and a new command-line interface function has been added to facilitate downloading test datasets.

Changes

File Change Summary
README.md Added centered <div> with project badges
pixi.toml Added osx-64 and win-64 platform entries; reorganized med-imagetools dependencies under new feature flags
pyproject.toml Introduced optional dependency groups test and all; commented out debug group
src/imgtools/datasets/github_helper.py Added classes for managing GitHub releases and downloading medical imaging test data
src/imgtools/datasets/init.py Imported new classes from github_helper and updated __all__ list
tests/datasets/test_github_helper.py Created test suite for MedImageTestData class functionality
src/imgtools/cli/main.py Added conditional import for testdata command based on test data availability
src/imgtools/cli/testdatasets.py Introduced testdata CLI function for downloading test datasets
.github/workflows/main.yml Updated CI workflow to include windows-latest for testing and documentation jobs

Sequence Diagram

sequenceDiagram
    participant User
    participant MedImageTestData
    participant GitHubReleaseManager
    participant GitHub

    User->>MedImageTestData: Initialize
    MedImageTestData->>GitHubReleaseManager: Get Latest Release
    GitHubReleaseManager->>GitHub: Fetch Release Data
    GitHub-->>GitHubReleaseManager: Return Release Info
    GitHubReleaseManager-->>MedImageTestData: Return Release Details
    User->>MedImageTestData: Download Assets
    MedImageTestData->>GitHub: Download Asset Files
    GitHub-->>MedImageTestData: Provide Asset Files
    User->>MedImageTestData: Extract Assets
    MedImageTestData-->>User: Extracted Files
Loading

Possibly related PRs

Poem

🐰 A Rabbit's GitHub Release Dance 🐰
With badges bright and data's grace,
Our code now fetches at light's pace.
GitHub releases, test data's might,
Downloading dreams with pure delight!
A scientific rabbit's code so neat 🔬

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
src/imgtools/datasets/github_helper.py (2)

150-152: Use the logging module instead of print statements

For better logging practices and configurability, replace print statements with the logging module. This allows for adjustable log levels and more structured output.

Apply this diff to update the logging mechanism:

+import logging

+logger = logging.getLogger(__name__)

 ...

     if filepath.exists():
-        print(f"File {asset.name} already exists. Skipping download.")
+        logger.info(f"File {asset.name} already exists. Skipping download.")
         return filepath

 ...

         for asset in latest_release.assets:
-            print(f"Downloading {asset.name}...")
+            logger.info(f"Downloading {asset.name}...")
             downloaded_path = self.download_asset(asset, dest)
             self.downloaded_paths.append(downloaded_path)

 ...

             for path in self.downloaded_paths:
-                print(f"Extracting {path.name}...")
+                logger.info(f"Extracting {path.name}...")
                 if tarfile.is_tarfile(path):

 ...

             else:
-                print(f"Unsupported archive format: {path.name}")
+                logger.warning(f"Unsupported archive format: {path.name}")

Also applies to: 177-178, 191-192, 201-201


105-105: Add exception handling for fetching the latest release

Network errors or API issues may occur when fetching the latest release. Adding exception handling will improve robustness and provide clearer error messages to users.

Consider modifying the method as follows:

def get_latest_release(self) -> GitHubRelease:
    """Fetches the latest release details from the repository."""

    try:
        release = self.repo.get_latest_release()
    except github.GithubException as e:
        print(f"Error fetching the latest release: {e}")
        raise e

    # Continue processing the release if no exception occurred
    # ...
pyproject.toml (1)

60-65: Clarify the contents of the all optional dependency group

The all group is intended to include all optional dependencies, but currently, torch and torchio are commented out. If these packages are meant to be part of the all group, consider uncommenting them once they are ready to be included.

 all = [
     "pygithub>=2.5.0",
     # add these back later
-    # "torch",
-    # "torchio",
+    "torch",
+    "torchio",
 ]
pixi.toml (1)

15-15: Consolidate med-imagetools dependency declarations

The med-imagetools package is specified multiple times: globally and under both feature.dev and feature.test. This could cause redundancy and potential conflicts. Consider consolidating these declarations or clarifying their distinct purposes.

Suggestions:

  • If the global declaration at line 15 is necessary for all environments, keep it and adjust the feature-specific dependencies accordingly.
  • If extras are only needed in specific features, ensure they are specified only where required.

Also applies to: 26-28, 54-55

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ccc0c0 and e09479d.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • README.md (2 hunks)
  • pixi.toml (3 hunks)
  • pyproject.toml (1 hunks)
  • src/imgtools/datasets/github_helper.py (1 hunks)
  • src/imgtools/modules/structureset/__init__.py (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • README.md
  • src/imgtools/modules/structureset/init.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Unit-Tests (macos-14, py310)
  • GitHub Check: Unit-Tests (macos-latest, py311)
  • GitHub Check: Unit-Tests (macos-latest, py310)

src/imgtools/datasets/github_helper.py Show resolved Hide resolved
src/imgtools/datasets/github_helper.py Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
src/imgtools/datasets/github_helper.py (2)

95-96: ⚠️ Potential issue

Correct the type annotation for repo

The type annotation for repo is incorrect. Repository should be imported directly from the github module. Update the type annotations accordingly.

Apply this diff to fix the type annotations:

- from github import Github  # type: ignore # noqa
+ from github import Github, Repository  # type: ignore # noqa

  ...

  github: Github
- repo: Github.Repository
+ repo: Repository

284-286: ⚠️ Potential issue

Fix extractall usage and implement safe extraction

The extractall method of tarfile does not accept a filter parameter. Additionally, implement a safe extraction to prevent path traversal vulnerabilities.

Apply this diff to correct the issue and enhance security:

+     import os

      ...

          if tarfile.is_tarfile(path):
              with tarfile.open(path, "r:*") as archive:
-                 archive.extractall(dest, filter="data")
+                 self.safe_extract(archive, dest)
                  extracted_paths.extend(
                      [dest / member.name for member in archive.getmembers()]
                  )
              elif zipfile.is_zipfile(path):

+     def is_within_directory(self, directory: Path, target: Path) -> bool:
+         abs_directory = directory.resolve()
+         abs_target = target.resolve()
+         return abs_directory in abs_target.parents or abs_directory == abs_target
+
+     def safe_extract(self, tar: tarfile.TarFile, path: Path) -> None:
+         for member in tar.getmembers():
+             member_path = path / member.name
+             if not self.is_within_directory(path, member_path):
+                 raise Exception("Attempted Path Traversal in Tar File")
+         tar.extractall(path=path)
🧹 Nitpick comments (2)
src/imgtools/datasets/github_helper.py (1)

11-11: Remove unused import requests

The requests module is imported but not used anywhere in the code. Removing it will clean up the imports.

Apply this diff to remove the unused import:

- import requests
🧰 Tools
🪛 Ruff (0.8.2)

11-11: requests imported but unused

Remove unused import: requests

(F401)

tests/datasets/test_github_helper.py (1)

1-1: Remove unused import Path

The Path class from pathlib is imported but not used in this test module. Removing it will tidy up the imports.

Apply this diff to remove the unused import:

- from pathlib import Path
🧰 Tools
🪛 Ruff (0.8.2)

1-1: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e09479d and fd7a21d.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • pyproject.toml (1 hunks)
  • src/imgtools/datasets/__init__.py (1 hunks)
  • src/imgtools/datasets/github_helper.py (1 hunks)
  • tests/datasets/test_github_helper.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/datasets/test_github_helper.py

1-1: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

src/imgtools/datasets/github_helper.py

11-11: requests imported but unused

Remove unused import: requests

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Unit-Tests (macos-latest, py312)
🔇 Additional comments (1)
src/imgtools/datasets/__init__.py (1)

1-13: LGTM!

The import statements and the __all__ definition are correctly set up to expose the intended classes. Good job on maintaining a clean and explicit public API.

src/imgtools/datasets/github_helper.py Show resolved Hide resolved
tests/datasets/test_github_helper.py Outdated Show resolved Hide resolved
tests/datasets/test_github_helper.py Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
tests/datasets/test_github_helper.py (4)

8-13: Add error handling to the fixture.

The fixture should verify that the latest release was successfully fetched to provide clearer error messages if GitHub API calls fail.

 @pytest.fixture
 def med_image_test_data():
     manager = MedImageTestData()
-    _ = manager.get_latest_release()
+    latest_release = manager.get_latest_release()
+    if latest_release is None:
+        pytest.skip("Failed to fetch latest release - skipping tests")
     return manager

15-22: Enhance test coverage for release attributes.

Consider adding assertions for additional release attributes and edge cases:

  • Verify that tag_name and name are not empty strings
  • Check for presence of assets list
  • Validate release URL format
 def test_get_latest_release(med_image_test_data):
     latest_release = med_image_test_data.get_latest_release()
     assert latest_release is not None
     assert latest_release.tag_name is not None
     assert latest_release.name is not None
     assert isinstance(latest_release.tag_name, str)
     assert isinstance(latest_release.name, str)
+    assert len(latest_release.tag_name) > 0, "tag_name should not be empty"
+    assert len(latest_release.name) > 0, "name should not be empty"
+    assert hasattr(latest_release, 'assets'), "release should have assets"
+    assert isinstance(latest_release.assets, list), "assets should be a list"
+    if latest_release.html_url:  # if available
+        assert latest_release.html_url.startswith('https://github.com/'), "Invalid release URL"

30-32: Document the asset selection strategy.

The selection of the first 2 assets seems arbitrary. Consider:

  1. Documenting why only 2 assets are chosen
  2. Using a more specific selection criteria
  3. Testing with different numbers of assets
-    chosen_assets = release.assets[:2]
+    # Test with a subset of assets to balance coverage and execution time
+    max_test_assets = 2
+    chosen_assets = release.assets[:max_test_assets]

39-43: Enhance extracted file validation.

The test only verifies file existence and type but doesn't validate the content or structure of extracted files.

Consider adding:

  1. File size checks
  2. Basic content validation
  3. Directory structure verification
     extracted_paths = med_image_test_data.extract(extract_dir)
     assert len(extracted_paths) > 0
     for path in extracted_paths:
         assert path.exists()
         assert path.is_file() or path.is_dir()
+        if path.is_file():
+            # Ensure files are not empty
+            assert path.stat().st_size > 0, f"File is empty: {path}"
+        if path.is_dir():
+            # Ensure directories contain expected structure
+            assert any(path.iterdir()), f"Directory is empty: {path}"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd7a21d and 18bb461.

📒 Files selected for processing (1)
  • tests/datasets/test_github_helper.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/datasets/test_github_helper.py

1-1: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Unit-Tests (ubuntu-latest, py312)
  • GitHub Check: Unit-Tests (ubuntu-latest, py311)
  • GitHub Check: Unit-Tests (ubuntu-latest, py310)
🔇 Additional comments (1)
tests/datasets/test_github_helper.py (1)

44-51: Fix file type checking for multi-extension files.

The current implementation doesn't handle multi-extension files correctly.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
.github/workflows/main.yml (1)

Windows testing removal needs reconsideration

The codebase contains significant Windows-specific functionality that should be tested:

  • Path length and character handling in src/imgtools/dicom/sort/path_limits.py
  • Windows-specific dependencies in pixi.lock
  • File system compatibility code in src/imgtools/io/writers/old_writers.py

Removing Windows from the test matrix while maintaining Windows-specific code could lead to undetected compatibility issues. Consider either:

  1. Keeping Windows in the test matrix to ensure continued compatibility, or
  2. Documenting the decision to drop Windows support and gradually removing Windows-specific code
🔗 Analysis chain

Line range hint 19-19: Verify Windows support removal.

The comment indicates windows-latest was removed from the test matrix. Please verify:

  1. Is this a temporary or permanent removal?
  2. Are there any Windows-specific tests or configurations that should be cleaned up?
  3. Should this be documented in the project's compatibility documentation?

Let's check for any Windows-specific configurations or tests:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for Windows-specific code or configurations
rg -i "windows|win32|win64|winapi" --type yaml --type python

# Look for OS-specific path handling
ast-grep --pattern 'os.path.join($_,$_)'

Length of output: 484


Script:

#!/bin/bash
# Search for Windows-specific code with correct file types
rg -i "windows|win32|win64|winapi"

# Check workflow file history
git log -p -- .github/workflows/main.yml | head -n 50

# Look for test configurations
rg -i "test.*config|config.*test" --type yaml

Length of output: 5527

🧹 Nitpick comments (4)
src/imgtools/cli/testdatasets.py (1)

20-41: Consider adding validation for the destination directory.

While the Click path type handles basic validation, consider adding checks for:

  • Available disk space
  • Write permissions for parent directories
  • Existing content in the destination directory
tests/datasets/test_github_helper.py (2)

1-1: Remove unused import.

The Path import from pathlib is not used in the code.

-from pathlib import Path
🧰 Tools
🪛 Ruff (0.8.2)

1-1: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)


32-42: Consider parameterizing the test with different asset combinations.

Currently, the test uses a hardcoded list of assets. Consider using @pytest.mark.parametrize to test different combinations.

@pytest.mark.parametrize("assets", [
    ["NSCLC-Radiomics"],
    ["NSCLC_Radiogenomics", "Vestibular-Schwannoma-SEG"],
    ["invalid-asset-name"],  # Test error handling
])
def test_extract_with_different_assets(med_image_test_data, tmp_path, assets):
    # Test implementation
.github/workflows/main.yml (1)

36-36: Reconsider test output configuration in CI pipeline.

The addition of -s -vv flags to pytest might impact CI performance:

  • -s disables output capture, which could flood CI logs with print statements when downloading/extracting test data
  • -vv doubles verbosity, potentially creating very large log files

Consider these alternatives:

  1. Remove -s flag and use pytest.mark.filterwarnings for specific tests that need output capture disabled
  2. Use environment variables to control verbosity levels:
-          pixi run -e ${{ matrix.env }} test -s -vv
+          pixi run -e ${{ matrix.env }} test -v
+        env:
+          PYTEST_ADDOPTS: "--capture=no" # Only if absolutely necessary
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18bb461 and f0bc308.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • .github/workflows/main.yml (1 hunks)
  • pixi.toml (4 hunks)
  • src/imgtools/cli/__main__.py (2 hunks)
  • src/imgtools/cli/testdatasets.py (1 hunks)
  • src/imgtools/datasets/github_helper.py (1 hunks)
  • tests/datasets/test_github_helper.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/imgtools/datasets/github_helper.py
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/datasets/test_github_helper.py

1-1: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Unit-Tests (macos-14, py311)
  • GitHub Check: Unit-Tests (macos-14, py310)
  • GitHub Check: Unit-Tests (macos-latest, py312)
  • GitHub Check: Unit-Tests (macos-latest, py311)
  • GitHub Check: Unit-Tests (macos-latest, py310)
  • GitHub Check: Unit-Tests (ubuntu-latest, py312)
  • GitHub Check: Unit-Tests (ubuntu-latest, py311)
  • GitHub Check: Unit-Tests (ubuntu-latest, py310)
🔇 Additional comments (4)
src/imgtools/cli/__main__.py (1)

8-8: LGTM! Well-structured conditional CLI command registration.

The implementation gracefully handles the optional GitHub dependency by conditionally registering the testdata command only when the required module is available.

Also applies to: 33-36

src/imgtools/cli/testdatasets.py (1)

7-13: LGTM! Clean implementation of module availability check.

The is_testdata_available() function properly handles the optional GitHub dependency.

tests/datasets/test_github_helper.py (1)

56-63: Fix file type checking for multi-extension files.

Using path.suffix only captures the last extension (e.g., .gz in .nii.gz).

pixi.toml (1)

6-6: LGTM! Well-structured dependency management.

The changes:

  • Add proper platform support for osx-64
  • Configure appropriate extras for development and testing
  • Comment out pytest-xdist for future consideration

Also applies to: 26-27, 54-55, 60-60

src/imgtools/cli/testdatasets.py Show resolved Hide resolved
tests/datasets/test_github_helper.py Show resolved Hide resolved
@jjjermiah jjjermiah requested a review from JoshuaSiraj January 22, 2025 17:19
Copy link

codecov bot commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 79.25926% with 28 lines in your changes missing coverage. Please review.

Project coverage is 50.69%. Comparing base (8ccc0c0) to head (fdbd4c5).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/imgtools/datasets/github_helper.py 78.94% 28 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #194      +/-   ##
==========================================
+ Coverage   49.55%   50.69%   +1.13%     
==========================================
  Files          45       47       +2     
  Lines        3408     3543     +135     
==========================================
+ Hits         1689     1796     +107     
- Misses       1719     1747      +28     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JoshuaSiraj
Copy link
Collaborator

@jjjermiah noticed that medimgtools doesn't support windows??

@jjjermiah
Copy link
Contributor Author

@jjjermiah noticed that medimgtools doesn't support windows??

trueee

@jjjermiah
Copy link
Contributor Author

@jjjermiah noticed that medimgtools doesn't support windows??

trueee

it was just the pixi configuration, which is only for developers. that being said, we should support it so we can immediately tell if pixi cant solve an environment for it.

@jjjermiah
Copy link
Contributor Author

@jjjermiah noticed that medimgtools doesn't support windows??

added to pixi and the ci/cd now:

https://github.com/bhklab/med-imagetools/actions/runs/12913974856?pr=194

lets see what happens lol

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/imgtools/datasets/github_helper.py (1)

368-370: ⚠️ Potential issue

Fix the usage of extractall and implement safe extraction.

The current implementation is vulnerable to path traversal attacks and uses an incorrect parameter.

Apply this diff to correct the issue and enhance security:

+import os

 ...

     def extract(self, dest: Path) -> List[Path]:
         """Extract downloaded archives to the specified directory."""
         if not self.downloaded_paths:
             raise ValueError(
                 "No archives have been downloaded yet. Call `download_release_data` first."
             )

         extracted_paths = []
         for path in self.downloaded_paths:
             print(f"Extracting {path.name}...")
             if tarfile.is_tarfile(path):
                 with tarfile.open(path, "r:*") as archive:
-                    archive.extractall(dest, filter="data")
+                    self.safe_extract(archive, dest)
                     extracted_paths.extend(
                         [dest / member.name for member in archive.getmembers()]
                     )
             elif zipfile.is_zipfile(path):
                 with zipfile.ZipFile(path, "r") as archive:
                     archive.extractall(dest)
                     extracted_paths.extend([dest / name for name in archive.namelist()])
             else:
                 print(f"Unsupported archive format: {path.name}")
         return extracted_paths

+    def is_within_directory(self, directory: Path, target: Path) -> bool:
+        abs_directory = directory.resolve()
+        abs_target = target.resolve()
+        return abs_directory in abs_target.parents or abs_directory == abs_target
+
+    def safe_extract(self, tar: tarfile.TarFile, path: Path) -> None:
+        for member in tar.getmembers():
+            member_path = path / member.name
+            if not self.is_within_directory(path, member_path):
+                raise Exception("Attempted Path Traversal in Tar File")
+        tar.extractall(path=path)
🧹 Nitpick comments (3)
pixi.toml (1)

60-60: Document the reason for postponing pytest-xdist.

Consider adding a brief explanation in the comment about why pytest-xdist is being postponed and any specific conditions or timeline for its reintroduction.

-# pytest-xdist = "*" re-introduce later. 
+# pytest-xdist = "*" # TODO: Re-introduce when <specific condition> is met. See issue #<number> for details.
.github/workflows/main.yml (1)

20-20: Fix trailing whitespace.

Remove the trailing whitespace at the end of the line.

-        os: [ubuntu-latest, macos-latest, macos-14, windows-latest] 
+        os: [ubuntu-latest, macos-latest, macos-14, windows-latest]
🧰 Tools
🪛 yamllint (1.35.1)

[error] 20-20: trailing spaces

(trailing-spaces)

src/imgtools/datasets/github_helper.py (1)

31-32: Enhance the error message with installation command.

The error message could be more helpful by including the pip command with the exact package name.

-        "PyGithub is required for the test data feature of med-imagetools. "
-        "Install it using 'pip install med-imagetools[test]'."
+        "PyGithub is required for the test data feature of med-imagetools. "
+        "Install it using 'pip install PyGithub' or 'pip install med-imagetools[test]'."
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0bc308 and 19d19f2.

📒 Files selected for processing (3)
  • .github/workflows/main.yml (2 hunks)
  • pixi.toml (4 hunks)
  • src/imgtools/datasets/github_helper.py (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/main.yml

[error] 20-20: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: Unit-Tests (windows-latest, py312)
  • GitHub Check: Unit-Tests (windows-latest, py311)
  • GitHub Check: Unit-Tests (windows-latest, py310)
  • GitHub Check: Unit-Tests (macos-14, py312)
  • GitHub Check: Unit-Tests (macos-14, py311)
  • GitHub Check: Unit-Tests (macos-14, py310)
  • GitHub Check: Unit-Tests (macos-latest, py311)
  • GitHub Check: Unit-Tests (macos-latest, py310)
  • GitHub Check: Unit-Tests (ubuntu-latest, py312)
  • GitHub Check: Unit-Tests (ubuntu-latest, py311)
  • GitHub Check: Unit-Tests (ubuntu-latest, py310)
🔇 Additional comments (6)
pixi.toml (2)

6-6: LGTM! Good enhancement of cross-platform support.

The addition of osx-64 and win-64 platforms broadens the project's accessibility to more users.


54-55: Verify the "test" extras group exists in pyproject.toml.

The addition of extras = ["test"] looks good, but we should verify that this extras group is properly defined.

✅ Verification successful

"test" extras group is properly defined

The "test" extras group exists in pyproject.toml with the required dependencies. The configuration in pixi.toml is correct.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the "test" extras group is defined in pyproject.toml
rg -A 5 '\[project\.optional-dependencies\]|\[tool\.poetry\.extras\]' pyproject.toml

Length of output: 199

.github/workflows/main.yml (1)

35-38: LGTM! Good addition of GitHub token for test data access.

The addition of GITHUB_TOKEN environment variable is necessary for the new test data download functionality, and the modified test command with -s -vv flags improves test output visibility.

src/imgtools/datasets/github_helper.py (3)

36-92: LGTM! Well-structured and documented data classes.

The GitHubReleaseAsset and GitHubRelease classes are well-designed with comprehensive docstrings and appropriate type hints.


108-109: Correct the type annotations for github and repo attributes.

The type annotations for github and repo are incorrect. The Repository class is not an attribute of the Github class.

Apply this diff to fix the type annotations:

-from github import Github  # type: ignore # noqa
+from github import Github, Repository  # type: ignore # noqa

...

 github: Github
-repo: Github.Repository
+repo: Repository

361-361: Update the method name in the error message.

The error message references download_release_data, which does not exist.

Apply this diff to correct the error message:

      raise ValueError(
-         "No archives have been downloaded yet. Call `download_release_data` first."
+         "No archives have been downloaded yet. Call `download_all` or `download` first."
      )

Comment on lines +26 to +28
[feature.dev.pypi-dependencies]
med-imagetools = { path = ".", editable = true, extras = ["all"]}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

The "all" extras group is missing in pyproject.toml

The extras = ["all"] specified in pixi.toml cannot be resolved as the "all" extras group is not defined in pyproject.toml. Currently, only "torch" and "test" extras groups are available.

  • Add an "all" extras group in pyproject.toml that includes all optional dependencies, or
  • Use one of the existing extras groups: "torch" or "test"
🔗 Analysis chain

Verify the "all" extras group exists in pyproject.toml.

The addition of extras = ["all"] looks good, but we should verify that this extras group is properly defined.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the "all" extras group is defined in pyproject.toml
rg -A 5 '\[project\.optional-dependencies\]|\[tool\.poetry\.extras\]' pyproject.toml

Length of output: 199

@JoshuaSiraj
Copy link
Collaborator

@jjjermiah noticed that medimgtools doesn't support windows??

trueee

it was just the pixi configuration, which is only for developers. that being said, we should support it so we can immediately tell if pixi cant solve an environment for it.

kk

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
tests/dicom/sort/test_sorter_base.py (1)

63-64: Consider removing unnecessary blank lines.

The two consecutive blank lines between class definition and method don't follow PEP 8 style guide, which recommends using a single blank line.

-

-
+
tests/dicom/sort/test_sort_method.py (2)

100-110: Consider consolidating permission restoration.

The permission restoration could be moved to a fixture to ensure cleanup even if the test fails.

+@pytest.fixture
+def temp_dir_with_permissions():
+    with TemporaryDirectory() as tmpdir:
+        path = Path(tmpdir)
+        yield path
+        os.chmod(path, 0o700)  # Always restore permissions

-def test_parent_directory_creation_error(temp_dir) -> None:
+def test_parent_directory_creation_error(temp_dir_with_permissions) -> None:
     source = temp_dir / "source.txt"
     destination = temp_dir / "no_permission_dir/destination.txt"
     source.write_text("Test content")

     os.chmod(temp_dir, 0o400)  # Read-only permission

     with pytest.raises(PermissionError):
         handle_file(source, destination, FileAction.COPY)

-    os.chmod(temp_dir, 0o700)  # Restore permissions

116-128: Consider using a more specific error message pattern.

The error message pattern could be more specific to distinguish between parent directory creation failure and write permission issues.

     with pytest.raises(
-        PermissionError, match="Failed to create parent directory or no write permission"
+        PermissionError, match="^(Failed to create parent directory|No write permission for directory)"
     ):
         handle_file(source, destination, FileAction.MOVE)
pyproject.toml (1)

62-69: Consider version alignment in 'all' group.

The commented-out dependencies should include version constraints when they're added back to maintain consistency with other dependencies in the project.

  # add these back later
- # "torch",
- # "torchio",
- # "pyvis",
+ # "torch>=2.0.0",
+ # "torchio>=0.19.0",
+ # "pyvis>=0.3.0",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19d19f2 and fdbd4c5.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • pyproject.toml (1 hunks)
  • tests/dicom/sort/test_sort_method.py (5 hunks)
  • tests/dicom/sort/test_sorter_base.py (1 hunks)
🧰 Additional context used
📓 Learnings (1)
tests/dicom/sort/test_sort_method.py (1)
Learnt from: jjjermiah
PR: bhklab/med-imagetools#137
File: tests/dicom/sort/test_sort_method.py:71-82
Timestamp: 2024-11-21T21:03:34.864Z
Learning: In the `tests/dicom/sort/test_sort_method.py` file, Unix-specific octal permissions are acceptable for permission testing; platform-independent permission testing is not required.
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: Unit-Tests (windows-latest, py312)
  • GitHub Check: Unit-Tests (windows-latest, py311)
  • GitHub Check: Unit-Tests (windows-latest, py310)
  • GitHub Check: Unit-Tests (macos-14, py312)
  • GitHub Check: Unit-Tests (macos-14, py311)
  • GitHub Check: Unit-Tests (macos-14, py310)
  • GitHub Check: Unit-Tests (macos-latest, py312)
  • GitHub Check: Unit-Tests (macos-latest, py310)
  • GitHub Check: Unit-Tests (ubuntu-latest, py312)
  • GitHub Check: Unit-Tests (ubuntu-latest, py311)
  • GitHub Check: Unit-Tests (ubuntu-latest, py310)
🔇 Additional comments (3)
tests/dicom/sort/test_sorter_base.py (1)

70-78: LGTM! Robust path comparison implementation.

The use of Path.resolve() for path comparisons is a good practice as it handles symbolic links and normalizes paths correctly.

tests/dicom/sort/test_sort_method.py (1)

81-83: LGTM! Proper platform-specific test handling.

The use of @pytest.mark.skipif for Windows is appropriate as it clearly documents the platform limitation.

Also applies to: 97-99, 113-115

pyproject.toml (1)

58-61: LGTM! Well-structured test dependencies.

The addition of pygithub and aiohttp aligns with the PR objective of downloading test data from GitHub releases.

@jjjermiah jjjermiah merged commit 3385682 into main Jan 22, 2025
20 checks passed
@jjjermiah jjjermiah deleted the 193-download-data-from-test-data-repo branch January 22, 2025 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Download Data from Test Data Repo
2 participants