Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[airflow] Update AIR302 to check for deprecated context keys #15144

Merged

Conversation

sunank200
Copy link
Contributor

@sunank200 sunank200 commented Dec 26, 2024

Summary

Airflow 3.0 removes a set of deprecated context variables that were phased out in 2.x. This PR introduces lint rules to detect usage of these removed variables in various patterns, helping identify incompatibilities. The removed context variables include:

conf
execution_date
next_ds
next_ds_nodash
next_execution_date
prev_ds
prev_ds_nodash
prev_execution_date
prev_execution_date_success
tomorrow_ds
yesterday_ds
yesterday_ds_nodash

Detected Patterns and Examples

The linter now flags the use of removed context variables in the following scenarios:

  1. Direct Subscript Access

    execution_date = context["execution_date"]  # Flagged
  2. .get("key") Method Calls

    print(context.get("execution_date"))  # Flagged
  3. Variables Assigned from get_current_context()
    If a variable is assigned from get_current_context() and then used to access a removed key:

    c = get_current_context()
    print(c.get("execution_date"))  # Flagged
  4. Function Parameters in @task-Decorated Functions
    Parameters named after removed context variables in functions decorated with @task are flagged:

    from airflow.decorators import task
    
    @task
    def my_task(execution_date, **kwargs):  # Parameter 'execution_date' flagged
        pass
  5. Removed Keys in Task Decorator kwargs and Other Scenarios
    Other similar patterns where removed context variables appear (e.g., as part of kwargs in a @task function) are also detected.

from airflow.decorators import task

@task
def process_with_execution_date(**context):
    execution_date = lambda: context["execution_date"]  # flagged
    print(execution_date)

@task(kwargs={"execution_date": "2021-01-01"})   # flagged
def task_with_kwargs(**context):  
    pass

Test Plan

Test fixtures covering various patterns of deprecated context usage are included in this PR. For example:

from airflow.decorators import task, dag, get_current_context
from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
import pendulum
from datetime import datetime

@task
def access_invalid_key_task(**context):
    print(context.get("conf"))  # 'conf' flagged

@task
def print_config(**context):
    execution_date = context["execution_date"]  # Flagged
    prev_ds = context["prev_ds"]                # Flagged

@task
def from_current_context():
    context = get_current_context()
    print(context["execution_date"])            # Flagged

# Usage outside of a task decorated function
c = get_current_context()
print(c.get("execution_date"))                 # Flagged

@task
def some_task(execution_date, **kwargs):
    print("execution date", execution_date)     # Parameter flagged

@dag(
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC")
)
def my_dag():
    task1 = DummyOperator(
        task_id="task1",
        params={
            "execution_date": "{{ execution_date }}",  # Flagged in template context
        },
    )

    access_invalid_key_task()
    print_config()
    from_current_context()
    
dag = my_dag()

class CustomOperator(BaseOperator):
    def execute(self, context):
        execution_date = context.get("execution_date")                      # Flagged
        next_ds = context.get("next_ds")                                               # Flagged
        next_execution_date = context["next_execution_date"]          # Flagged

Ruff will emit AIR302 diagnostics for each deprecated usage, with suggestions when applicable, aiding in code migration to Airflow 3.0.

related: apache/airflow#44409, apache/airflow#41641

Copy link
Contributor

github-actions bot commented Dec 26, 2024

ruff-ecosystem results

Linter (stable)

ℹ️ ecosystem check detected linter changes. (+2 -2 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)

apache/superset (+2 -2 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --no-preview --select ALL

+ tests/integration_tests/charts/schema_tests.py:59:9: ANN201 Missing return type annotation for public function `test_query_context_series_limit`
- tests/integration_tests/charts/schema_tests.py:59:9: ANN201 Missing return type annotation for public function `test_query_context_series_limit`
+ tests/integration_tests/viz_tests.py:739:9: ANN201 Missing return type annotation for public function `test_filter_nulls`
- tests/integration_tests/viz_tests.py:739:9: ANN201 Missing return type annotation for public function `test_filter_nulls`

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
ANN201 4 2 2 0 0

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+4 -3 violations, +0 -0 fixes in 2 projects; 53 projects unchanged)

apache/airflow (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL

+ providers/tests/system/papermill/example_papermill_remote_verify.py:44:37: AIR302 `execution_date` is removed in Airflow 3.0

apache/superset (+3 -3 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL

- tests/integration_tests/charts/schema_tests.py:59:9: PLR6301 Method `test_query_context_series_limit` could be a function, class method, or static method
+ tests/integration_tests/charts/schema_tests.py:59:9: PLR6301 Method `test_query_context_series_limit` could be a function, class method, or static method
- tests/integration_tests/model_tests.py:202:9: PLR6301 Method `test_adjust_engine_params_mysql` could be a function, class method, or static method
+ tests/integration_tests/model_tests.py:202:9: PLR6301 Method `test_adjust_engine_params_mysql` could be a function, class method, or static method
+ tests/integration_tests/viz_tests.py:899:9: ANN201 Missing return type annotation for public function `test_apply_rolling_without_data`
- tests/integration_tests/viz_tests.py:899:9: ANN201 Missing return type annotation for public function `test_apply_rolling_without_data`

Changes by rule (3 rules affected)

code total + violation - violation + fix - fix
PLR6301 4 2 2 0 0
ANN201 2 1 1 0 0
AIR302 1 1 0 0 0

@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch from 5c96f89 to 5103ef7 Compare December 27, 2024 04:52
@sunank200 sunank200 requested review from Lee-W and uranusjr December 27, 2024 04:53
@dhruvmanila dhruvmanila added rule Implementing or modifying a lint rule preview Related to preview mode features labels Dec 30, 2024
@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch 3 times, most recently from d580a4b to c0a34d3 Compare January 2, 2025 08:03
@dhruvmanila
Copy link
Member

For pattern 4, we should use a separate entrypoint which directly checks the function definition instead of trying to find the function definition from the expression otherwise we'll end up with duplicate diagnostics like in the following example:

from airflow.decorators import task


@task
def some_task(execution_date, conf, **kwargs):
    kwargs.get("execution_date")
    kwargs.get("conf")
Output of running Ruff on this PR:

/Users/dhruv/playground/ruff/src/AIR302.py:5:15: AIR302 `execution_date` is removed in Airflow 3.0
  |
4 | @task
5 | def some_task(execution_date, conf, **kwargs):
  |               ^^^^^^^^^^^^^^ AIR302
6 |     kwargs.get("execution_date")
7 |     kwargs.get("conf")
  |

/Users/dhruv/playground/ruff/src/AIR302.py:5:15: AIR302 `execution_date` is removed in Airflow 3.0
  |
4 | @task
5 | def some_task(execution_date, conf, **kwargs):
  |               ^^^^^^^^^^^^^^ AIR302
6 |     kwargs.get("execution_date")
7 |     kwargs.get("conf")
  |

/Users/dhruv/playground/ruff/src/AIR302.py:5:31: AIR302 `conf` is removed in Airflow 3.0
  |
4 | @task
5 | def some_task(execution_date, conf, **kwargs):
  |                               ^^^^ AIR302
6 |     kwargs.get("execution_date")
7 |     kwargs.get("conf")
  |

/Users/dhruv/playground/ruff/src/AIR302.py:5:31: AIR302 `conf` is removed in Airflow 3.0
  |
4 | @task
5 | def some_task(execution_date, conf, **kwargs):
  |                               ^^^^ AIR302
6 |     kwargs.get("execution_date")
7 |     kwargs.get("conf")
  |

/Users/dhruv/playground/ruff/src/AIR302.py:6:16: AIR302 `execution_date` is removed in Airflow 3.0
  |
4 | @task
5 | def some_task(execution_date, conf, **kwargs):
6 |     kwargs.get("execution_date")
  |                ^^^^^^^^^^^^^^^^ AIR302
7 |     kwargs.get("conf")
  |

/Users/dhruv/playground/ruff/src/AIR302.py:7:16: AIR302 `conf` is removed in Airflow 3.0
  |
5 | def some_task(execution_date, conf, **kwargs):
6 |     kwargs.get("execution_date")
7 |     kwargs.get("conf")
  |                ^^^^^^ AIR302
  |

Found 6 errors.

Can you make this change? I tried pushing it but I'm unable to do so, do you have "allow edits to maintainers off" (https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork) ? Here's a patch for that:

commit c76e15a45d61a42d95a4fbaf23aceae9134652db
Author: Dhruv Manilawala <dhruvmanila@gmail.com>
Date:   Thu Jan 23 14:13:29 2025 +0530

    Check context parameters directly from function definition

diff --git a/crates/ruff_linter/src/checkers/ast/analyze/statement.rs b/crates/ruff_linter/src/checkers/ast/analyze/statement.rs
index d5f92b9f2..7f64be4bd 100644
--- a/crates/ruff_linter/src/checkers/ast/analyze/statement.rs
+++ b/crates/ruff_linter/src/checkers/ast/analyze/statement.rs
@@ -376,6 +376,9 @@ pub(crate) fn statement(stmt: &Stmt, checker: &mut Checker) {
             if checker.enabled(Rule::PytestParameterWithDefaultArgument) {
                 flake8_pytest_style::rules::parameter_with_default_argument(checker, function_def);
             }
+            if checker.enabled(Rule::Airflow3Removal) {
+                airflow::rules::removed_in_3_function_def(checker, function_def);
+            }
         }
         Stmt::Return(_) => {
             if checker.enabled(Rule::ReturnOutsideFunction) {
diff --git a/crates/ruff_linter/src/rules/airflow/rules/removal_in_3.rs b/crates/ruff_linter/src/rules/airflow/rules/removal_in_3.rs
index b08f8986b..f36dd29db 100644
--- a/crates/ruff_linter/src/rules/airflow/rules/removal_in_3.rs
+++ b/crates/ruff_linter/src/rules/airflow/rules/removal_in_3.rs
@@ -168,6 +168,36 @@ pub(crate) fn removed_in_3(checker: &mut Checker, expr: &Expr) {
     }
 }
 
+/// AIR302
+pub(crate) fn removed_in_3_function_def(checker: &mut Checker, function_def: &StmtFunctionDef) {
+    if !checker.semantic().seen_module(Modules::AIRFLOW) {
+        return;
+    }
+
+    if !is_airflow_task(function_def, checker.semantic()) {
+        return;
+    }
+
+    for param in function_def
+        .parameters
+        .posonlyargs
+        .iter()
+        .chain(function_def.parameters.args.iter())
+        .chain(function_def.parameters.kwonlyargs.iter())
+    {
+        let param_name = param.parameter.name.as_str();
+        if REMOVED_CONTEXT_KEYS.contains(&param_name) {
+            checker.diagnostics.push(Diagnostic::new(
+                Airflow3Removal {
+                    deprecated: param_name.to_string(),
+                    replacement: Replacement::None,
+                },
+                param.parameter.name.range(),
+            ));
+        }
+    }
+}
+
 #[derive(Debug, Eq, PartialEq)]
 enum Replacement {
     None,
@@ -398,37 +428,6 @@ fn check_context_get(checker: &mut Checker, call_expr: &ExprCall) {
     if attr.as_str() != "get" {
         return;
     }
-    let function_def = {
-        let mut parents = checker.semantic().current_statements();
-        parents.find_map(|stmt| {
-            if let Stmt::FunctionDef(func_def) = stmt {
-                Some(func_def.clone())
-            } else {
-                None
-            }
-        })
-    };
-
-    if let Some(func_def) = function_def {
-        for param in func_def
-            .parameters
-            .posonlyargs
-            .iter()
-            .chain(func_def.parameters.args.iter())
-            .chain(func_def.parameters.kwonlyargs.iter())
-        {
-            let param_name = param.parameter.name.as_str();
-            if REMOVED_CONTEXT_KEYS.contains(&param_name) {
-                checker.diagnostics.push(Diagnostic::new(
-                    Airflow3Removal {
-                        deprecated: param_name.to_string(),
-                        replacement: Replacement::None,
-                    },
-                    param.parameter.name.range(),
-                ));
-            }
-        }
-    }
 
     for removed_key in REMOVED_CONTEXT_KEYS {
         if let Some(argument) = call_expr.arguments.find_argument_value(removed_key, 0) {
@@ -1087,3 +1086,14 @@ fn is_airflow_builtin_or_provider(segments: &[&str], module: &str, symbol_suffix
         _ => false,
     }
 }
+
+/// Returns `true` if the given function is decorated with `@airflow.decorators.task`.
+fn is_airflow_task(function_def: &StmtFunctionDef, semantic: &SemanticModel) -> bool {
+    function_def.decorator_list.iter().any(|decorator| {
+        semantic
+            .resolve_qualified_name(map_callable(&decorator.expression))
+            .is_some_and(|qualified_name| {
+                matches!(qualified_name.segments(), ["airflow", "decorators", "task"])
+            })
+    })
+}

@sunank200
Copy link
Contributor Author

use a separate entrypoint which directly checks the function definition

@dhruvmanila changed it.

@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch from e844568 to c2e6cb2 Compare January 23, 2025 09:03
Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one last doubt but otherwise this looks good.

For a @task decorated functions, we check for both var.get(...) and var[...] pattern but there are other places where we only check subscript expressions. Is that expected? Should those places also check for var.get(...) pattern? If yes, are there additional requirements in those places similar to how there's a requirement for a @task decorator?

Comment on lines +97 to +100
class CustomOperator(BaseOperator):
def execute(self, context):
execution_date = context["execution_date"]
next_ds = context["next_ds"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check for context.get pattern in here as well? I think that's not being checked as per my local testing. For example, the following will not raise any diagnostics:

from airflow.models.baseoperator import BaseOperator


class CustomOperator(BaseOperator):
    def execute(self, context):
        execution_date = context.get("execution_date")
        next_ds = context.get("next_ds")

Now, as per:

2. The execute function of a BaseOperator subclass (...)

It seems that it's not any method but rather a specific method and only when it's inherited by BaseOperator. I don't think we're checking this. Is that ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think we'll need to check this
  2. any function that is called in BaseOperator.execute. It's possible to have false positive now but the chance are low I think

def any_func(context):
    context.get("execution_date")

class CustomOperator(BaseOperator):
    def execute(self, context):
        any_func(context)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Added the change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2. any function that is called in BaseOperator.execute. It's possible to have false positive now but the chance are low I think

Yeah, I don't think we're checking this. I'm fine merging this as is for now but one solution might be to collect all the functions that occur in this context and then defer checking of them. But, this will make it a bit complex and might not be worth it.

@sunank200
Copy link
Contributor Author

I have one last doubt but otherwise this looks good.

For a @task decorated functions, we check for both var.get(...) and var[...] pattern but there are other places where we only check subscript expressions. Is that expected? Should those places also check for var.get(...) pattern? If yes, are there additional requirements in those places similar to how there's a requirement for a @task decorator?

@dhruvmanila I have added a check for both var.get(...) and var[...] pattern at all places.

Comment on lines +65 to +73
task1 = DummyOperator(
task_id="task1",
params={
# Removed variables in template
"execution_date": "{{ execution_date }}",
"next_ds": "{{ next_ds }}",
"prev_ds": "{{ prev_ds }}"
},
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this in the snapshot. Should we remove this test case?

(I'll do it in a follow-up PR.)

Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Welcome to Ruff! 🥳

@dhruvmanila dhruvmanila changed the title [airflow] Add lint rule to show error for removed context variables in airflow [airflow] Update AIR302 to check for deprecated context keys Jan 24, 2025
@dhruvmanila dhruvmanila merged commit 34cc3ca into astral-sh:main Jan 24, 2025
21 checks passed

// Check if the value is a context argument
let is_context_arg = if let Expr::Name(ExprName { id, .. }) = &**value {
id.as_str() == "context" || id.as_str().starts_with("**")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this variable name be anything other than context?


let is_named_context = if let Expr::Name(name) = &**value {
if let Some(parameter) = find_parameter(checker.semantic(), name) {
matches!(parameter.name().as_str(), "context" | "kwargs")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, can this parameter be anything other than context or kwargs?

Comment on lines +196 to +197
check_removed_context_keys_usage(checker, call_expr);
check_removed_context_keys_get_anywhere(checker, call_expr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really make sense as both function is going to perform the same check except that the first one is behind an additional check of @task function. I'm not sure why this is present.

let is_named_context = if let Expr::Name(name) = &**value {
if let Some(parameter) = find_parameter(checker.semantic(), name) {
matches!(parameter.name().as_str(), "context" | "kwargs")
|| parameter.name().as_str().starts_with("**")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name won't really contain ** at the start. You can see that in the playground: https://play.ruff.rs/f7051131-c6c0-49a4-8b91-e74c37ae8088

@dhruvmanila
Copy link
Member

Sorry for prematurely merging this PR. I see a lot of inconsistencies from which I've pointed out some of them above. I'm going to revert some of the changes and only keep the ones to only perform the check in @task-decorated functions. Please create a follow-up PR to perform this check in other contexts like the Operator.execute method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Related to preview mode features rule Implementing or modifying a lint rule
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants