-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[airflow
] Update AIR302
to check for deprecated context keys
#15144
[airflow
] Update AIR302
to check for deprecated context keys
#15144
Conversation
|
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
ANN201 | 4 | 2 | 2 | 0 | 0 |
Linter (preview)
ℹ️ ecosystem check detected linter changes. (+4 -3 violations, +0 -0 fixes in 2 projects; 53 projects unchanged)
apache/airflow (+1 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL
+ providers/tests/system/papermill/example_papermill_remote_verify.py:44:37: AIR302 `execution_date` is removed in Airflow 3.0
apache/superset (+3 -3 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL
- tests/integration_tests/charts/schema_tests.py:59:9: PLR6301 Method `test_query_context_series_limit` could be a function, class method, or static method + tests/integration_tests/charts/schema_tests.py:59:9: PLR6301 Method `test_query_context_series_limit` could be a function, class method, or static method - tests/integration_tests/model_tests.py:202:9: PLR6301 Method `test_adjust_engine_params_mysql` could be a function, class method, or static method + tests/integration_tests/model_tests.py:202:9: PLR6301 Method `test_adjust_engine_params_mysql` could be a function, class method, or static method + tests/integration_tests/viz_tests.py:899:9: ANN201 Missing return type annotation for public function `test_apply_rolling_without_data` - tests/integration_tests/viz_tests.py:899:9: ANN201 Missing return type annotation for public function `test_apply_rolling_without_data`
Changes by rule (3 rules affected)
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
PLR6301 | 4 | 2 | 2 | 0 | 0 |
ANN201 | 2 | 1 | 1 | 0 | 0 |
AIR302 | 1 | 1 | 0 | 0 | 0 |
89ed2ce
to
c3fb996
Compare
...rc/rules/airflow/snapshots/ruff_linter__rules__airflow__tests__AIR302_AIR302_context.py.snap
Outdated
Show resolved
Hide resolved
5c96f89
to
5103ef7
Compare
crates/ruff_linter/resources/test/fixtures/airflow/AIR302_context.py
Outdated
Show resolved
Hide resolved
d580a4b
to
c0a34d3
Compare
62de813
to
fb8b300
Compare
For pattern 4, we should use a separate entrypoint which directly checks the function definition instead of trying to find the function definition from the expression otherwise we'll end up with duplicate diagnostics like in the following example: from airflow.decorators import task
@task
def some_task(execution_date, conf, **kwargs):
kwargs.get("execution_date")
kwargs.get("conf") Output of running Ruff on this PR:
Can you make this change? I tried pushing it but I'm unable to do so, do you have "allow edits to maintainers off" (https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork) ? Here's a patch for that: commit c76e15a45d61a42d95a4fbaf23aceae9134652db
Author: Dhruv Manilawala <dhruvmanila@gmail.com>
Date: Thu Jan 23 14:13:29 2025 +0530
Check context parameters directly from function definition
diff --git a/crates/ruff_linter/src/checkers/ast/analyze/statement.rs b/crates/ruff_linter/src/checkers/ast/analyze/statement.rs
index d5f92b9f2..7f64be4bd 100644
--- a/crates/ruff_linter/src/checkers/ast/analyze/statement.rs
+++ b/crates/ruff_linter/src/checkers/ast/analyze/statement.rs
@@ -376,6 +376,9 @@ pub(crate) fn statement(stmt: &Stmt, checker: &mut Checker) {
if checker.enabled(Rule::PytestParameterWithDefaultArgument) {
flake8_pytest_style::rules::parameter_with_default_argument(checker, function_def);
}
+ if checker.enabled(Rule::Airflow3Removal) {
+ airflow::rules::removed_in_3_function_def(checker, function_def);
+ }
}
Stmt::Return(_) => {
if checker.enabled(Rule::ReturnOutsideFunction) {
diff --git a/crates/ruff_linter/src/rules/airflow/rules/removal_in_3.rs b/crates/ruff_linter/src/rules/airflow/rules/removal_in_3.rs
index b08f8986b..f36dd29db 100644
--- a/crates/ruff_linter/src/rules/airflow/rules/removal_in_3.rs
+++ b/crates/ruff_linter/src/rules/airflow/rules/removal_in_3.rs
@@ -168,6 +168,36 @@ pub(crate) fn removed_in_3(checker: &mut Checker, expr: &Expr) {
}
}
+/// AIR302
+pub(crate) fn removed_in_3_function_def(checker: &mut Checker, function_def: &StmtFunctionDef) {
+ if !checker.semantic().seen_module(Modules::AIRFLOW) {
+ return;
+ }
+
+ if !is_airflow_task(function_def, checker.semantic()) {
+ return;
+ }
+
+ for param in function_def
+ .parameters
+ .posonlyargs
+ .iter()
+ .chain(function_def.parameters.args.iter())
+ .chain(function_def.parameters.kwonlyargs.iter())
+ {
+ let param_name = param.parameter.name.as_str();
+ if REMOVED_CONTEXT_KEYS.contains(¶m_name) {
+ checker.diagnostics.push(Diagnostic::new(
+ Airflow3Removal {
+ deprecated: param_name.to_string(),
+ replacement: Replacement::None,
+ },
+ param.parameter.name.range(),
+ ));
+ }
+ }
+}
+
#[derive(Debug, Eq, PartialEq)]
enum Replacement {
None,
@@ -398,37 +428,6 @@ fn check_context_get(checker: &mut Checker, call_expr: &ExprCall) {
if attr.as_str() != "get" {
return;
}
- let function_def = {
- let mut parents = checker.semantic().current_statements();
- parents.find_map(|stmt| {
- if let Stmt::FunctionDef(func_def) = stmt {
- Some(func_def.clone())
- } else {
- None
- }
- })
- };
-
- if let Some(func_def) = function_def {
- for param in func_def
- .parameters
- .posonlyargs
- .iter()
- .chain(func_def.parameters.args.iter())
- .chain(func_def.parameters.kwonlyargs.iter())
- {
- let param_name = param.parameter.name.as_str();
- if REMOVED_CONTEXT_KEYS.contains(¶m_name) {
- checker.diagnostics.push(Diagnostic::new(
- Airflow3Removal {
- deprecated: param_name.to_string(),
- replacement: Replacement::None,
- },
- param.parameter.name.range(),
- ));
- }
- }
- }
for removed_key in REMOVED_CONTEXT_KEYS {
if let Some(argument) = call_expr.arguments.find_argument_value(removed_key, 0) {
@@ -1087,3 +1086,14 @@ fn is_airflow_builtin_or_provider(segments: &[&str], module: &str, symbol_suffix
_ => false,
}
}
+
+/// Returns `true` if the given function is decorated with `@airflow.decorators.task`.
+fn is_airflow_task(function_def: &StmtFunctionDef, semantic: &SemanticModel) -> bool {
+ function_def.decorator_list.iter().any(|decorator| {
+ semantic
+ .resolve_qualified_name(map_callable(&decorator.expression))
+ .is_some_and(|qualified_name| {
+ matches!(qualified_name.segments(), ["airflow", "decorators", "task"])
+ })
+ })
+} |
@dhruvmanila changed it. |
add lint rule to show error for removed context variables in airflow
e844568
to
c2e6cb2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one last doubt but otherwise this looks good.
For a @task
decorated functions, we check for both var.get(...)
and var[...]
pattern but there are other places where we only check subscript expressions. Is that expected? Should those places also check for var.get(...)
pattern? If yes, are there additional requirements in those places similar to how there's a requirement for a @task
decorator?
class CustomOperator(BaseOperator): | ||
def execute(self, context): | ||
execution_date = context["execution_date"] | ||
next_ds = context["next_ds"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check for context.get
pattern in here as well? I think that's not being checked as per my local testing. For example, the following will not raise any diagnostics:
from airflow.models.baseoperator import BaseOperator
class CustomOperator(BaseOperator):
def execute(self, context):
execution_date = context.get("execution_date")
next_ds = context.get("next_ds")
Now, as per:
2. The
execute
function of a BaseOperator subclass (...)
It seems that it's not any method but rather a specific method and only when it's inherited by BaseOperator
. I don't think we're checking this. Is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think we'll need to check this
- any function that is called in
BaseOperator.execute
. It's possible to have false positive now but the chance are low I think
def any_func(context):
context.get("execution_date")
class CustomOperator(BaseOperator):
def execute(self, context):
any_func(context)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Added the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2. any function that is called in
BaseOperator.execute
. It's possible to have false positive now but the chance are low I think
Yeah, I don't think we're checking this. I'm fine merging this as is for now but one solution might be to collect all the functions that occur in this context and then defer checking of them. But, this will make it a bit complex and might not be worth it.
@dhruvmanila I have added a check for both |
task1 = DummyOperator( | ||
task_id="task1", | ||
params={ | ||
# Removed variables in template | ||
"execution_date": "{{ execution_date }}", | ||
"next_ds": "{{ next_ds }}", | ||
"prev_ds": "{{ prev_ds }}" | ||
}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this in the snapshot. Should we remove this test case?
(I'll do it in a follow-up PR.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Welcome to Ruff! 🥳
airflow
] Update AIR302
to check for deprecated context keys
|
||
// Check if the value is a context argument | ||
let is_context_arg = if let Expr::Name(ExprName { id, .. }) = &**value { | ||
id.as_str() == "context" || id.as_str().starts_with("**") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this variable name be anything other than context
?
|
||
let is_named_context = if let Expr::Name(name) = &**value { | ||
if let Some(parameter) = find_parameter(checker.semantic(), name) { | ||
matches!(parameter.name().as_str(), "context" | "kwargs") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, can this parameter be anything other than context
or kwargs
?
check_removed_context_keys_usage(checker, call_expr); | ||
check_removed_context_keys_get_anywhere(checker, call_expr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really make sense as both function is going to perform the same check except that the first one is behind an additional check of @task
function. I'm not sure why this is present.
let is_named_context = if let Expr::Name(name) = &**value { | ||
if let Some(parameter) = find_parameter(checker.semantic(), name) { | ||
matches!(parameter.name().as_str(), "context" | "kwargs") | ||
|| parameter.name().as_str().starts_with("**") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter name won't really contain **
at the start. You can see that in the playground: https://play.ruff.rs/f7051131-c6c0-49a4-8b91-e74c37ae8088
Sorry for prematurely merging this PR. I see a lot of inconsistencies from which I've pointed out some of them above. I'm going to revert some of the changes and only keep the ones to only perform the check in |
Summary
Airflow 3.0 removes a set of deprecated context variables that were phased out in 2.x. This PR introduces lint rules to detect usage of these removed variables in various patterns, helping identify incompatibilities. The removed context variables include:
Detected Patterns and Examples
The linter now flags the use of removed context variables in the following scenarios:
Direct Subscript Access
.get("key")
Method CallsVariables Assigned from
get_current_context()
If a variable is assigned from
get_current_context()
and then used to access a removed key:Function Parameters in
@task
-Decorated FunctionsParameters named after removed context variables in functions decorated with
@task
are flagged:Removed Keys in Task Decorator
kwargs
and Other ScenariosOther similar patterns where removed context variables appear (e.g., as part of
kwargs
in a@task
function) are also detected.Test Plan
Test fixtures covering various patterns of deprecated context usage are included in this PR. For example:
Ruff will emit
AIR302
diagnostics for each deprecated usage, with suggestions when applicable, aiding in code migration to Airflow 3.0.related: apache/airflow#44409, apache/airflow#41641