Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow lists wherever possible in builtin operators #5241

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

brimoor
Copy link
Contributor

@brimoor brimoor commented Dec 9, 2024

Updates the builtin operators to use list() to allow users to act on multiple fields/entities wherever it makes sense.

Summary by CodeRabbit

  • New Features

    • Enhanced field handling in dataset summary creation
    • Improved input validation for multiple operators
    • Added support for deleting multiple fields, indexes, and workspaces
  • Bug Fixes

    • Fixed nested path handling in field operations
    • Improved error messaging for invalid field inputs
  • Improvements

    • More flexible field name specification
    • Better support for complex field and operator interactions

@brimoor brimoor requested a review from imanjra December 9, 2024 04:57
Copy link
Contributor

coderabbitai bot commented Dec 9, 2024

Walkthrough

This pull request introduces enhancements to multiple components of the FiftyOne library. The changes focus on improving input handling, field management, and aggregation processes across different modules. Key modifications include updating the list field handling in dataset summary creation, adding a utility function to manage nested paths, and refactoring operator input validation to support more flexible field operations with enhanced error handling.

Changes

File Change Summary
fiftyone/core/dataset.py Modified _populate_summary_field to iterate through multiple list fields during aggregation
fiftyone/core/odm/mixins.py Added _remove_nested_paths() function to filter out nested paths in field operations
plugins/operators/__init__.py Extensive refactoring of operator input handling:
- Converted field name inputs to support strings and multiple fields
- Enhanced input validation and error messaging
- Added support for deleting multiple workspaces, indexes, and fields

Possibly related PRs

Suggested labels

feature

Suggested reviewers

  • minhtuev
  • CamronStaley

Poem

🐰 Hopping through code with glee,
Nested paths now set us free!
Operators dance, fields align,
Flexibility's our design divine!
FiftyOne leaps with rabbit might! 🚀

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

fiftyone/operators/builtin.py Outdated Show resolved Hide resolved
fiftyone/operators/builtin.py Outdated Show resolved Hide resolved
fiftyone/operators/builtin.py Outdated Show resolved Hide resolved
fiftyone/operators/builtin.py Outdated Show resolved Hide resolved
fiftyone/operators/builtin.py Outdated Show resolved Hide resolved
fiftyone/operators/builtin.py Outdated Show resolved Hide resolved
fiftyone/operators/builtin.py Outdated Show resolved Hide resolved
fiftyone/operators/builtin.py Outdated Show resolved Hide resolved
fiftyone/operators/builtin.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (6)
fiftyone/operators/builtin.py (5)

250-264: Refactor repeated field validation logic into a helper function

The validation code for checking if a field exists in the schema and handling error messages is duplicated across multiple functions (_clone_sample_field_inputs, _clone_frame_field_inputs, _rename_sample_field_inputs, _rename_frame_field_inputs). To adhere to the DRY (Don't Repeat Yourself) principle and improve maintainability, consider refactoring this logic into a reusable helper function.

Also applies to: 368-382, 445-459, 537-551


650-650: Simplify loop by iterating directly over the dictionary

You can iterate directly over the dictionary schema instead of using schema.keys(). This is more Pythonic and efficient.

Apply this change:

- for key in schema.keys():
+ for key in schema:
🧰 Tools
🪛 Ruff (0.8.0)

650-650: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


758-758: Simplify loop by iterating directly over the dictionary

Same improvement as in line 650; iterate directly over schema for simplicity.

Apply this change:

- for key in schema.keys():
+ for key in schema:
🧰 Tools
🪛 Ruff (0.8.0)

758-758: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


910-910: Simplify loop by iterating directly over the dictionary

Consistently iterate directly over schema without calling keys().

Apply this change:

- for key in schema.keys():
+ for key in schema:
🧰 Tools
🪛 Ruff (0.8.0)

910-910: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


985-985: Simplify loop by iterating directly over the dictionary

Adjust the loop to iterate over schema directly for cleaner code.

Apply this change:

- for key in schema.keys():
+ for key in schema:
🧰 Tools
🪛 Ruff (0.8.0)

985-985: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

fiftyone/core/odm/mixins.py (1)

1872-1878: Add a docstring to _remove_nested_paths function

Adding a docstring to the _remove_nested_paths(paths) function will enhance code readability by explaining its purpose and usage. This is helpful for future maintainability and for other developers who may work with the code.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8820555 and 6082a67.

📒 Files selected for processing (3)
  • fiftyone/core/dataset.py (1 hunks)
  • fiftyone/core/odm/mixins.py (4 hunks)
  • fiftyone/operators/builtin.py (21 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
fiftyone/operators/builtin.py

650-650: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


758-758: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


910-910: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


985-985: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (2)
fiftyone/operators/builtin.py (1)

656-656: Use types.String() instead of types.OneOf([...])

Consider using types.String() directly instead of types.OneOf([types.Object(), types.String()]) for the input type. This simplifies the code and removes the need for _to_string_list() usage.

Also applies to: 764-764, 916-916, 991-991

fiftyone/core/dataset.py (1)

1971-1972: LGTM! This fixes list field handling in the aggregation pipeline.

The change correctly iterates over all list fields to unwind them in the pipeline, rather than only handling the first list field. This ensures proper population of summary fields when multiple list fields are present.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
fiftyone/core/odm/mixins.py (1)

1872-1878: Add docstring and type hints to improve code clarity.

The function implementation is correct, but would benefit from documentation:

-def _remove_nested_paths(paths):
+def _remove_nested_paths(paths: list[str]) -> list[str]:
+    """Remove any paths that are nested within other paths in the list.
+    
+    Args:
+        paths: List of field paths
+        
+    Returns:
+        Filtered list containing only top-level paths
+        
+    Example:
+        >>> _remove_nested_paths(['a', 'a.b', 'c'])
+        ['a', 'c']
+    """
     return [
         path
         for path in paths
         if not any(path.startswith(p + ".") for p in paths)
     ]
fiftyone/operators/builtin.py (3)

2442-2447: Add docstring and type hints to improve code clarity.

The function implementation is correct, but would benefit from documentation:

-def _to_string_list(values):
+def _to_string_list(values: list[Union[dict, str]]) -> list[str]:
+    """Convert a list of values to a list of strings.
+    
+    Args:
+        values: List of values that can be strings or dicts with 'value' key
+        
+    Returns:
+        List of strings extracted from the input values
+        
+    Example:
+        >>> _to_string_list([{'value': 'a'}, 'b'])
+        ['a', 'b']
+    """
     if not values:
         return []

     return [d["value"] if isinstance(d, dict) else d for d in values]

650-650: Optimize dictionary key lookup operations.

Use key in dict instead of key in dict.keys() for better performance:

-    for key in schema.keys():
+    for key in schema:

Also applies to: 758-758, 910-910, 985-985

🧰 Tools
🪛 Ruff (0.8.2)

650-650: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


1822-1823: Track and fix the App bug affecting view setting.

The TODO comment indicates a known issue that prevents proper view setting functionality.

Would you like me to help create a GitHub issue to track this bug? I can include:

  • Description of the bug
  • Impact on functionality
  • Suggested investigation steps
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6082a67 and b59bcee.

📒 Files selected for processing (3)
  • fiftyone/core/dataset.py (1 hunks)
  • fiftyone/core/odm/mixins.py (4 hunks)
  • fiftyone/operators/builtin.py (21 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • fiftyone/core/dataset.py
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/operators/builtin.py

650-650: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


758-758: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


910-910: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


985-985: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (2)
fiftyone/core/odm/mixins.py (2)

673-673: LGTM! Correctly filters nested paths before field processing.

The integration of _remove_nested_paths ensures efficient field clearing by preventing redundant operations on nested fields.


721-721: LGTM! Consistent implementation across field operations.

The _remove_nested_paths function is correctly integrated in both methods, maintaining consistent behavior for nested field handling.

Also applies to: 826-826

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (9)
plugins/operators/__init__.py (9)

268-268: Clarify the purpose of the new field
The description for "new_field_name" is clear. Just ensure that the user-facing label or any documentation references the reason for creating this new field for better clarity.


365-379: Reuse existing validation approach
This section mirrors the clone logic for frame fields. Reusing or unifying the validation approach for sample vs. frame fields might reduce duplication and simplify long-term maintenance.


611-626: Parameterizing the clear operation target
The radio group design is consistent with other operators, letting users quickly choose which subset of samples to modify. However, consider extracting the repeated “target” logic (dataset/current view/selected samples) into a shared helper to avoid duplicative code across operators.


644-650: Avoid .keys() usage
Use for key in schema: instead of for key in schema.keys(): to adhere to Python best practices and pass static analysis.

- for key in schema.keys():
+ for key in schema:
🧰 Tools
🪛 Ruff (0.8.2)

647-647: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


692-697: Frame fields clearing
Mirrors the sample fields clearing logic, ensuring frame-level data can also be cleared in bulk. Consider factoring out common patterns for maintainability.


719-734: Radio button grouping for frame clearing
Similar design to sample clearing. The code is consistent but duplicative across sample and frame operators; recommended to unify shared logic.


752-758: Inline keys iteration
Similarly, use for key in schema: to remove .keys().

- for key in schema.keys():
+ for key in schema:
🧰 Tools
🪛 Ruff (0.8.2)

755-755: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


904-910: Avoid .keys() usage
Minor performance/readability nitpick: remove .keys().

- for key in schema.keys():
+ for key in schema:
🧰 Tools
🪛 Ruff (0.8.2)

907-907: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


979-985: Iterating schema keys
As with prior spots, remove .keys() to meet static analysis guidance.

- for key in schema.keys():
+ for key in schema:
🧰 Tools
🪛 Ruff (0.8.2)

982-982: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b59bcee and c8355bb.

📒 Files selected for processing (3)
  • fiftyone/core/dataset.py (1 hunks)
  • fiftyone/core/odm/mixins.py (4 hunks)
  • plugins/operators/__init__.py (21 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • fiftyone/core/dataset.py
🧰 Additional context used
🪛 Ruff (0.8.2)
plugins/operators/__init__.py

647-647: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


755-755: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


907-907: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


982-982: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (26)
plugins/operators/__init__.py (22)

247-261: Validate field existence before cloning
The code correctly checks whether the specified field name exists in the schema and marks the input as invalid if not found. This ensures that users receive immediate feedback on typos or non-existent fields.


442-456: Read-only field rename checks
Great job restricting renaming of read-only fields. This prevents unintended modifications of critical data. Ensure that any attempts to rename system fields (like _id) are also blocked if they appear.


593-598: Multiple sample fields clearing
Allowing multiple fields to be cleared at once provides significant flexibility. The code properly retrieves them via _to_string_list and applies the operation.


651-657: List input usage
Replacing traditional single-string fields with inputs.list(...) for multiple field deletion is well-structured. It aligns with the PR objective to “Allow lists wherever possible.”


660-664: Early return for invalid fields
Returning immediately when a field is invalid or does not exist is a good UX practice. This short-circuits further processing and provides a clear error message.


666-670: Respect read-only fields
Rejecting attempts to clear read-only fields maintains data integrity. This is consistent with earlier constraints preventing edits to read-only fields.


759-765: Accept multiple frame fields
Switching to inputs.list(...) to accept multiple frame fields for clearing is consistent with the multi-field approach.


768-782: Validate frame fields
These checks parallel the sample field logic, ensuring non-existent and read-only frame fields are handled properly.


886-888: Delete multiple sample fields
Good move making “DeleteSampleField” accept multiple fields in a single request. This helps streamline user workflow.


911-917: List of fields to delete
Leveraging inputs.list(...) again. This consistent approach aligns well with your multi-field design.


920-930: Immediate returns for invalid or read-only fields
Clearly marks invalid fields or read-only fields, preventing any partial or half-complete operations.


952-954: Delete multiple frame fields
Matching sample-level deletion logic. The uniform approach across both sample and frame ensures consistent user experience.


986-992: Frame field(s) to delete
Using “multiple=True” for the autocomplete is user-friendly, matching the multi-field paradigm.


995-1009: User feedback on invalid or read-only frame fields
Comprehensive checks are in place to prevent accidental deletions of important fields.


1124-1176: DropIndex operator
The use of _drop_index_inputs and the acceptance of multiple index names parallels the multi-field approach. Good consistency.


1452-1500: DeleteSummaryField: handling multiple fields
Clear approach to removing multiple summary fields at once. The thorough validation ensures no partial or unexpected deletions.


1639-1653: DeleteGroupSlice approach
Allowing multiple group slices to be deleted helps unify the user experience for multi-item operations. The reversion to the default slice if the current slice is removed is a useful fallback.


1659-1693: _delete_group_slice_inputs
Mirrors other multi-delete patterns, verifying that group slices exist and returning early if they do not. Good user feedback.


1929-1941: DeleteSavedView operator
Adopting the multi-delete pattern here as well. This is consistent with the overall PR objective of allowing lists for operated items.


1942-1974: _delete_saved_view_inputs logic
The usage of auto-complete with multiple selection is consistent across the various multi-delete features. Well done.


2224-2276: DeleteWorkspace operator
Again, the multi-workspace delete approach is consistent. Notably, reverting to a default workspace if the current workspace is removed is very user-friendly.


2439-2445: _to_string_list utility
Converts dict-encoded input or raw strings to lists, simplifying the repeated pattern of multi-field inputs. This is a neat approach to unify input types.

fiftyone/core/odm/mixins.py (4)

673-674: Remove nested paths for cleared fields
Excluding nested paths avoids partial or conflicting operations on subpaths. This step ensures integrity when clearing high-level fields.


721-722: Delete fields safely
Ensuring nested paths are stripped out before deletion helps avoid confusion or partial field states.


826-827: Remove dynamic fields correctly
Filtering nested subfields first keeps the schema consistent and avoids partial embedded field definitions.


1872-1878: _remove_nested_paths
This is a concise and effective helper. One consideration: if the input list is large, repeatedly calling path.startswith(p + ".") in a double loop may be costly, but likely not an immediate concern for typical usage.


field_prop = inputs.list(
"field_names",
types.OneOf([types.Object(), types.String()]),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally can make this just types.String() and remove _to_string_list() usage


field_prop = inputs.list(
"field_names",
types.OneOf([types.Object(), types.String()]),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally can make this just types.String() and remove _to_string_list() usage


field_prop = inputs.list(
"field_names",
types.OneOf([types.Object(), types.String()]),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally can make this just types.String() and remove _to_string_list() usage


field_prop = inputs.list(
"field_names",
types.OneOf([types.Object(), types.String()]),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally can make this just types.String() and remove _to_string_list() usage

ctx.dataset.drop_index(index_name)
index_names = _to_string_list(ctx.params.get("index_names", []))

index_selector = types.AutocompleteView(multiple=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally can make this DropdownView(multiple=True) to avoid manually enforcing valid values below


field_names = _to_string_list(ctx.params.get("field_names", []))

field_selector = types.AutocompleteView(multiple=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally can make this DropdownView(multiple=True) to avoid manually enforcing valid values below


names = _to_string_list(ctx.params.get("names", []))

saved_view_selector = types.AutocompleteView(multiple=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally can make this DropdownView(multiple=True) to avoid manually enforcing valid values below


names = _to_string_list(ctx.params.get("names", []))

workspace_selector = types.AutocompleteView(multiple=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally can make this DropdownView(multiple=True) to avoid manually enforcing valid values below


names = _to_string_list(ctx.params.get("names", []))

slice_selector = types.AutocompleteView(multiple=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally can make this DropdownView(multiple=True) to avoid manually enforcing valid values below

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants