Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Use Unknown | T_inferred for undeclared public symbols #15674

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

sharkdp
Copy link
Contributor

@sharkdp sharkdp commented Jan 22, 2025

Summary

Use Unknown | T_inferred as the type for undeclared public symbols.

Test Plan

Updated tests.

@sharkdp sharkdp added the red-knot Multi-file analysis & type inference label Jan 22, 2025

This comment was marked as resolved.

@@ -139,7 +139,7 @@ reveal_type(not AlwaysFalse())

# We don't get into a cycle if someone sets their `__bool__` method to the `bool` builtin:
class BoolIsBool:
__bool__ = bool
__bool__: type[bool] = bool
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently a workaround to avoid running into #15672

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that information could be recorded in a TODO comment here?

Comment on lines -255 to -256
# TODO: this should be `Unknown | Literal[1]`.
reveal_type(C.variable_with_class_default2) # revealed: Literal[1]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual TODO I attempted to resolve!

Comment on lines +265 to +266
# TODO: this could be `int` if we declare `B.__add__` using a `Callable` type
reveal_type(B() + B()) # revealed: Unknown | int
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one example of a particularly annoying case that also comes up with __iter__ = …, __bool__ = …, __enter__ = …, __exit__ = …, etc.

The symbol __add__ is not declared here, so we end up inferring Unknown | A for it, which results in Unknown | int when called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too worried about this, because almost always these will be normal methods defined within the class (which is a declaration), not assigned like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice if there was an Infer annotation we could use for "just infer the type for me and use it as the public type" :/ Callable annotations in Python are, um, not ergonomic :(

However, realistically: it's very unusual to use non-function instances as methods like this. This kind of thing is quite common:

class Foo:
    def __or__(self, other) -> int:
        return 42

    __ror__ = __or__

But I think that will still be fine with this change, since functions have the same declared type as their inferred type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think we'd have a problem with __ror__ in that example, because it's not declared (__or__ is).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think we'd have a problem with __ror__ in that example, because it's not declared (__or__ is).

Yes. It would be the exact same case as with __add__ here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I see, thanks. This seems bad :( I just don't know how we'd explain this to users

@@ -52,7 +52,7 @@ class NonCallable:
__call__ = 1

a = NonCallable()
# error: "Object of type `NonCallable` is not callable"
# error: "Object of type `Unknown | Literal[1]` is not callable (due to union element `Literal[1]`)"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another instance of the callable problem. Notice how the usefulness of the error message degrades.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is really begging for nested diagnostics (so you get both "Object of type NonCallable is not callable" and nested details explaining why.)

We could easily adjust the handling so you still get "Object of type NonCallable is not callable" here, but then you lose the details of which union element failed callability.

Even without nested diagnostics we could still provide all of the information here in a new error, it just requires more and more bespoke error-cases handling in the __call__ lookup code. (We would need to match on the case that callability of __call__ failed due to a union, and then write a new custom error message for that case which mentions both the outer NonCallable type and the details of why its __call__ isn't callable.) Nested diagnostics would just let us handle this kind of situation more generically, with less special casing.

cc @BurntSushi @MichaReiser re diagnostics considerations

@@ -99,9 +99,9 @@ def _(x: str | int):
class A: ...
class B: ...

alias_for_type = type
Copy link
Contributor Author

@sharkdp sharkdp Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can not annotate this alias as Literal[type], as that can't be spelled (except using knot_extensions.TypeOf, which I didn't want to use here).

type[type] is too broad for the current narrowing code which pattern-matches on ClassLiteral("type")

Comment on lines +265 to +266
# TODO: this could be `int` if we declare `B.__add__` using a `Callable` type
reveal_type(B() + B()) # revealed: Unknown | int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too worried about this, because almost always these will be normal methods defined within the class (which is a declaration), not assigned like this.

@@ -52,7 +52,7 @@ class NonCallable:
__call__ = 1

a = NonCallable()
# error: "Object of type `NonCallable` is not callable"
# error: "Object of type `Unknown | Literal[1]` is not callable (due to union element `Literal[1]`)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is really begging for nested diagnostics (so you get both "Object of type NonCallable is not callable" and nested details explaining why.)

We could easily adjust the handling so you still get "Object of type NonCallable is not callable" here, but then you lose the details of which union element failed callability.

Even without nested diagnostics we could still provide all of the information here in a new error, it just requires more and more bespoke error-cases handling in the __call__ lookup code. (We would need to match on the case that callability of __call__ failed due to a union, and then write a new custom error message for that case which mentions both the outer NonCallable type and the details of why its __call__ isn't callable.) Nested diagnostics would just let us handle this kind of situation more generically, with less special casing.

cc @BurntSushi @MichaReiser re diagnostics considerations

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this! This makes it clear what the impact of this change would be. I agree that the most unintuitive thing is that we'd infer Unknown unions when local scopes reference types from enclosing scopes:

X = 1

def foo():
    reveal_type(X)  # Unknown | Literal[1]  :(

This is probably more correct, though? Because X might be mutated by other modules "monkey-patching" this module's globals.

Comment on lines +265 to +266
# TODO: this could be `int` if we declare `B.__add__` using a `Callable` type
reveal_type(B() + B()) # revealed: Unknown | int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice if there was an Infer annotation we could use for "just infer the type for me and use it as the public type" :/ Callable annotations in Python are, um, not ergonomic :(

However, realistically: it's very unusual to use non-function instances as methods like this. This kind of thing is quite common:

class Foo:
    def __or__(self, other) -> int:
        return 42

    __ror__ = __or__

But I think that will still be fine with this change, since functions have the same declared type as their inferred type?

Comment on lines 174 to +178
reveal_type(len(Auto())) # revealed: int
reveal_type(len(Int())) # revealed: Literal[2]
reveal_type(len(Int())) # revealed: int
reveal_type(len(Str())) # revealed: int
reveal_type(len(Tuple())) # revealed: int
reveal_type(len(IntUnion())) # revealed: Literal[2, 32]
reveal_type(len(IntUnion())) # revealed: int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, um... it's not your fault, but I think this test should definitely have some comments next to it saying how we don't really support enums at all yet 😄 I think I can see why the result changes here, but it seems incorrect that we allow Literal[SomeEnum.INT] at all as an annotation given that we don't yet infer the correct type for SomeEnum.INT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also seem to remember that we said in the original PR that those test cases are not particularly important, so I didn't bother to fix them / add a TODO. Let me know if you think otherwise.

@sharkdp sharkdp force-pushed the david/union-of-unknown-and-inferred-type branch from cd9230f to c19648a Compare January 23, 2025 12:52
sharkdp added a commit that referenced this pull request Jan 23, 2025
…ference (#15690)

## Summary

Make the remaining `infer.rs` unit tests independent from public symbol
type inference decisions (see upcoming change in #15674).

## Test Plan

- Made sure that the unit tests actually fail if one of the
  `assert_type` assertions is changed.
@sharkdp sharkdp force-pushed the david/union-of-unknown-and-inferred-type branch from c19648a to ded8625 Compare January 23, 2025 13:32
sharkdp added a commit that referenced this pull request Jan 23, 2025
…ence (#15691)

## Summary

Another small PR to focus #15674 solely on the relevant changes. This
makes our Markdown tests less dependent on precise types of public
symbols, without actually changing anything semantically in these tests.

Best reviewed using ignore-whitespace-mode.

## Test Plan

Tested these changes on `main` and on the branch from #15674.
@sharkdp sharkdp force-pushed the david/union-of-unknown-and-inferred-type branch from ded8625 to 026dbd3 Compare January 23, 2025 13:54
@@ -43,7 +43,7 @@ class IntIterable:
def __iter__(self) -> IntIterator:
return IntIterator()

# revealed: tuple[int, int]
# revealed: tuple[int, Unknown | int]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every comprehension has its own scope, so we also look up y as a public symbol 🫤

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is something @AlexWaygood wanted to look into, if I'm not mistaken (after outcome)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for nested scopes that are known to execute eagerly (list/set/dict comprehensions, class bodies), and for ones that probably execute eagerly (generator expressions), we should model the name lookup as a "use" in the outer scope, at that point in control flow of the outer scope, rather than as a public type lookup. Alex has WIP on this, and it should fix this TODO and the one below.

@sharkdp sharkdp force-pushed the david/union-of-unknown-and-inferred-type branch from 026dbd3 to 5bf3c9b Compare January 23, 2025 15:05
@sharkdp sharkdp marked this pull request as ready for review January 23, 2025 15:10
@sharkdp sharkdp requested a review from MichaReiser as a code owner January 23, 2025 15:10
Comment on lines 92 to 97
// We special-case known-instance types here since symbols like `typing.Any` are typically
// not declared in the stubs (e.g. `Any = object()`), but we still want to treat them as
// such.
let is_known_instance = inferred
.ignore_possibly_unbound()
.is_some_and(|ty| matches!(ty, Type::KnownInstance(_)));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a questionable choice. I don't know what else to do though. The problem appears with typing.Any, typing.List, typing.Union, …, knot_extensions.Unknown, knot_extensions.AlwaysTrue.

@@ -106,8 +111,8 @@ if flag():
```py
from mod import x, y

reveal_type(x) # revealed: Literal[1] | Any
reveal_type(y) # revealed: Literal[2] | Unknown
reveal_type(x) # revealed: Unknown | Literal[1] | Any
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably cancel one of the dynamic types here? We might have a ticket for that. I'll note it down as a follow up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a TODO in Type::is_gradual_equivalent_to pointing out that not doing this causes problems... but I don't know that we have a ticket

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should do that, but it will also make this test less clearly testing what it is supposed to test. The reason we use x: Any here is that it a) doesn't disappear in union with Literal[1] but b) also doesn't cause an invalid-declaration diagnostic.

But we can discuss how to handle this in the follow up PR.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

I would like to understand where Unknown is coming from in the cases I commented inline; otherwise this looks good to go.

@@ -96,7 +96,7 @@ def _(flag: bool):
f = Foo()
f += "Hello, world!"

reveal_type(f) # revealed: int | str
reveal_type(f) # revealed: Unknown | int | str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand where the new Unknown is coming from here? We are revealing the type of f locally within the scope, so it shouldn't be due to modifiability of f. And both __add__ and __iadd__ are declared in the body of Foo (because function definition statements are declarations), so I don't think either of those should be unioned with Unknown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__iadd__ is possibly undeclared.

I'm not 100% clear on the semantics of __add__ and __iadd__, so I suspect that you think that this possibly-undeclaredness should be absorbed by the definite-declaredness of __add__?

I'll look into that tomorrow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I said off the top of my head that we should add Unknown to the union for possibly-undeclared, because the "declared type" in the undeclared path is Unknown. But now I'm wondering if that's just wrong. An external assignment that violates the possibly-declared type would effectively be creating a conflicting-declarations situation, and we currently error on conflicting declarations, so it seems like we should also error on that assignment. Which we would do, if we didn't union with Unknown for possibly-undeclared.

@@ -116,7 +116,7 @@ def _(flag1: bool, flag2: bool):
f = 42.0
f += 12

reveal_type(f) # revealed: int | str | float
reveal_type(f) # revealed: Unknown | int | str | float
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question; why does Unknown show up here now?

@@ -160,5 +160,5 @@ def f(flag: bool, flag2: bool):
f = Bar()
f += 12

reveal_type(f) # revealed: int | str | float
reveal_type(f) # revealed: Unknown | int | str | float
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And same question here.

@@ -106,8 +111,8 @@ if flag():
```py
from mod import x, y

reveal_type(x) # revealed: Literal[1] | Any
reveal_type(y) # revealed: Literal[2] | Unknown
reveal_type(x) # revealed: Unknown | Literal[1] | Any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should do that, but it will also make this test less clearly testing what it is supposed to test. The reason we use x: Any here is that it a) doesn't disappear in union with Literal[1] but b) also doesn't cause an invalid-declaration diagnostic.

But we can discuss how to handle this in the follow up PR.

@@ -43,7 +43,7 @@ class IntIterable:
def __iter__(self) -> IntIterator:
return IntIterator()

# revealed: tuple[int, int]
# revealed: tuple[int, Unknown | int]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for nested scopes that are known to execute eagerly (list/set/dict comprehensions, class bodies), and for ones that probably execute eagerly (generator expressions), we should model the name lookup as a "use" in the outer scope, at that point in control flow of the outer scope, rather than as a public type lookup. Alex has WIP on this, and it should fix this TODO and the one below.

@@ -139,7 +139,7 @@ reveal_type(not AlwaysFalse())

# We don't get into a cycle if someone sets their `__bool__` method to the `bool` builtin:
class BoolIsBool:
__bool__ = bool
__bool__: type[bool] = bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that information could be recorded in a TODO comment here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants