[SIP] Proposal to Address Preview Issues with Delta Lake and Iceberg Table #31881

Open
tjsdud594 opened this issue Jan 16, 2025 · 0 comments
Labels
design:proposal Design proposals sip Superset Improvement Proposal

Please make sure you are familiar with the SIP process documented
here. The SIP will be numbered by a committer upon acceptance.

[SIP] Proposal for ...<title>

Motivation

Link: Issue #26449

I encountered a very similar error to the case described in the issue above.
The only difference is that I am working with a Delta Lake table instead of an Iceberg table.
I would like to share this bug and propose a solution and potential improvement.

How to reproduce the bug

  1. Create a table with partitions in the Trino catalog.
  2. Open SQL Lab.
  3. Select the catalog, schema, and table from the dropdowns.
  4. You will encounter the error: "trino error: line 5:7: Column 'partition' cannot be resolved"

Superset Environment

  • Superset version: 4.1.1
  • Python version: 3.10.15
  • Trino version: 467

Proposed Change

Solution

Following the approach from the linked issue, I modified the trino.py module as follows:
Module location: superset/db_engine_specs/trino.py, from line 494 to the end of the file

Before

@classmethod
def get_indexes(
    cls,
    database: Database,
    inspector: Inspector,
    table: Table,
) -> list[dict[str, Any]]:
    """
    Get the indexes associated with the specified schema/table.

    Trino dialect raises NoSuchTableError in get_indexes if table is empty.

    :param database: The database to inspect
    :param inspector: The SQLAlchemy inspector
    :param table: The table instance to inspect
    :returns: The indexes
    """
    try:
        return super().get_indexes(database, inspector, table)
    except NoSuchTableError:
        return []

After

@classmethod
def get_indexes(
    cls,
    database: Database,
    inspector: Inspector,
    table: Table,
) -> list[dict[str, Any]]:
    """
    Get the indexes associated with the specified schema/table.

    Trino dialect raises NoSuchTableError in get_indexes if table is empty.

    :param database: The database to inspect
    :param inspector: The SQLAlchemy inspector
    :param table: The table instance to inspect
    :returns: The indexes
    """
    try:
        indexes = super().get_indexes(database, inspector, table)
        # Handle Iceberg / Delta Lake tables: even for non-partitioned tables,
        # the inspector returns a single "partition" index.
        cols_ignore = {"file_count", "total_size", "data"}
        if (
            len(indexes) == 1
            and indexes[0].get("name") == "partition"
            and cols_ignore.issubset(indexes[0].get("column_names", []))
        ):
            return []
        return indexes
    except NoSuchTableError:
        return []

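With this change, get_indexes returns an empty list when the only index reported is the "partition" entry whose columns include file_count, total_size, and data, so the table is treated as having no partition index. The same check covers both Iceberg and Delta Lake tables.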
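The filtering condition can also be exercised on its own, outside Superset. The following is a minimal sketch, not part of the proposed patch: the helper name filter_synthetic_partition_index and both sample index lists are hypothetical, and the sketch assumes the inspector returns dicts with "name" and "column_names" keys, as the change above does.

```python
from typing import Any

# Columns that mark the "partition" index as lake-table metadata
# (same set as cols_ignore in the proposed change).
COLS_IGNORE = {"file_count", "total_size", "data"}


def filter_synthetic_partition_index(
    indexes: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return [] when the only index is the metadata-only "partition" entry.

    Hypothetical helper mirroring the condition in the proposed get_indexes.
    """
    if (
        len(indexes) == 1
        and indexes[0].get("name") == "partition"
        and COLS_IGNORE.issubset(indexes[0].get("column_names", []))
    ):
        return []
    return indexes


# Hypothetical inspector output for an Iceberg / Delta Lake table.
lake_indexes = [
    {
        "name": "partition",
        "column_names": ["record_count", "file_count", "total_size", "data"],
    }
]
# Hypothetical inspector output for a table partitioned by a real column "ds".
hive_indexes = [{"name": "partition", "column_names": ["ds"]}]

print(filter_synthetic_partition_index(lake_indexes))
print(filter_synthetic_partition_index(hive_indexes))
```

Under these assumptions, the first list is filtered out entirely, while the second is returned unchanged because its columns do not include the three metadata names. That second case is why the check looks at the column names and not only at the index name.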

New or Changed Public Interfaces

Nope

New dependencies

Nope

Migration Plan and Compatibility

Nope

Rejected Alternatives

Nope

@tjsdud594 tjsdud594 added the sip Superset Improvement Proposal label Jan 16, 2025
@dosubot dosubot bot added the design:proposal Design proposals label Jan 16, 2025