Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[airflow] Update AIR302 to check for deprecated context keys #15144

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
127 changes: 127 additions & 0 deletions crates/ruff_linter/resources/test/fixtures/airflow/AIR302_context.py
sunank200 marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
from datetime import datetime

import pendulum

from airflow.decorators import dag, task
from airflow.models import DAG
from airflow.models.baseoperator import BaseOperator
from airflow.operators.dummy import DummyOperator
from airflow.plugins_manager import AirflowPlugin
from airflow.providers.standard.operators.python import PythonOperator
from airflow.utils.context import get_current_context


def access_invalid_key_in_context(**context):
print("access invalid key", context["conf"])

@task
def access_invalid_key_task_out_of_dag(**context):
print("access invalid key", context.get("conf"))

@dag(
schedule=None,
start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
catchup=False,
tags=[""],
)
def invalid_dag():
@task()
def access_invalid_key_task(**context):
print("access invalid key", context.get("conf"))

task1 = PythonOperator(
task_id="task1",
python_callable=access_invalid_key_in_context,
)
access_invalid_key_task() >> task1
access_invalid_key_task_out_of_dag()

invalid_dag()

@task
def print_config(**context):
sunank200 marked this conversation as resolved.
Show resolved Hide resolved
# This should not throw an error as logical_date is part of airflow context.
logical_date = context["logical_date"]
sunank200 marked this conversation as resolved.
Show resolved Hide resolved

sunank200 marked this conversation as resolved.
Show resolved Hide resolved
# Removed usage - should trigger violations
execution_date = context["execution_date"]
next_ds = context["next_ds"]
next_ds_nodash = context["next_ds_nodash"]
next_execution_date = context["next_execution_date"]
prev_ds = context["prev_ds"]
prev_ds_nodash = context["prev_ds_nodash"]
prev_execution_date = context["prev_execution_date"]
prev_execution_date_success = context["prev_execution_date_success"]
tomorrow_ds = context["tomorrow_ds"]
yesterday_ds = context["yesterday_ds"]
yesterday_ds_nodash = context["yesterday_ds_nodash"]

with DAG(
dag_id="example_dag",
schedule_interval="@daily",
start_date=datetime(2023, 1, 1),
template_searchpath=["/templates"],
) as dag:
task1 = DummyOperator(
task_id="task1",
params={
# Removed variables in template
"execution_date": "{{ execution_date }}",
"next_ds": "{{ next_ds }}",
"prev_ds": "{{ prev_ds }}"
},
)
Comment on lines +65 to +73
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this in the snapshot. Should we remove this test case?

(I'll do it in a follow-up PR.)


class CustomMacrosPlugin(AirflowPlugin):
name = "custom_macros"
macros = {
"execution_date_macro": lambda context: context["execution_date"],
"next_ds_macro": lambda context: context["next_ds"]
}

@task
def print_config():
context = get_current_context()
execution_date = context["execution_date"]
next_ds = context["next_ds"]
next_ds_nodash = context["next_ds_nodash"]
next_execution_date = context["next_execution_date"]
prev_ds = context["prev_ds"]
prev_ds_nodash = context["prev_ds_nodash"]
prev_execution_date = context["prev_execution_date"]
prev_execution_date_success = context["prev_execution_date_success"]
tomorrow_ds = context["tomorrow_ds"]
yesterday_ds = context["yesterday_ds"]
yesterday_ds_nodash = context["yesterday_ds_nodash"]

class CustomOperator(BaseOperator):
def execute(self, context):
execution_date = context["execution_date"]
next_ds = context["next_ds"]
Comment on lines +97 to +100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check for context.get pattern in here as well? I think that's not being checked as per my local testing. For example, the following will not raise any diagnostics:

from airflow.models.baseoperator import BaseOperator


class CustomOperator(BaseOperator):
    def execute(self, context):
        execution_date = context.get("execution_date")
        next_ds = context.get("next_ds")

Now, as per:

2. The execute function of a BaseOperator subclass (...)

It seems that it's not any method but rather a specific method and only when it's inherited by BaseOperator. I don't think we're checking this. Is that ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think we'll need to check this
  2. any function that is called in BaseOperator.execute. It's possible to have false positive now but the chance are low I think

def any_func(context):
    context.get("execution_date")

class CustomOperator(BaseOperator):
    def execute(self, context):
        any_func(context)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Added the change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2. any function that is called in BaseOperator.execute. It's possible to have false positive now but the chance are low I think

Yeah, I don't think we're checking this. I'm fine merging this as is for now but one solution might be to collect all the functions that occur in this context and then defer checking of them. But, this will make it a bit complex and might not be worth it.

next_ds_nodash = context["next_ds_nodash"]
next_execution_date = context["next_execution_date"]
prev_ds = context["prev_ds"]
prev_ds_nodash = context["prev_ds_nodash"]
prev_execution_date = context["prev_execution_date"]
prev_execution_date_success = context["prev_execution_date_success"]
tomorrow_ds = context["tomorrow_ds"]
yesterday_ds = context["yesterday_ds"]
yesterday_ds_nodash = context["yesterday_ds_nodash"]

@task
def access_invalid_argument_task_out_of_dag(execution_date, tomorrow_ds, logical_date, **context):
print("execution date", execution_date)
print("access invalid key", context.get("conf"))

@task(task_id="print_the_context")
def print_context(ds=None, **kwargs):
"""Print the Airflow context and ds variable from the context."""
print(ds)
print(kwargs.get("tomorrow_ds"))
c = get_current_context()
c.get("execution_date")

class CustomOperatorNew(BaseOperator):
def execute(self, context):
execution_date = context.get("execution_date")
next_ds = context.get("next_ds")
4 changes: 3 additions & 1 deletion crates/ruff_linter/src/checkers/ast/analyze/expression.rs
Original file line number Diff line number Diff line change
Expand Up @@ -175,7 +175,9 @@ pub(crate) fn expression(expr: &Expr, checker: &mut Checker) {
if checker.enabled(Rule::NonPEP646Unpack) {
pyupgrade::rules::use_pep646_unpack(checker, subscript);
}

if checker.enabled(Rule::Airflow3Removal) {
airflow::rules::removed_in_3(checker, expr);
}
pandas_vet::rules::subscript(checker, value, expr);
}
Expr::Tuple(ast::ExprTuple {
Expand Down
3 changes: 3 additions & 0 deletions crates/ruff_linter/src/checkers/ast/analyze/statement.rs
Original file line number Diff line number Diff line change
Expand Up @@ -376,6 +376,9 @@ pub(crate) fn statement(stmt: &Stmt, checker: &mut Checker) {
if checker.enabled(Rule::PytestParameterWithDefaultArgument) {
flake8_pytest_style::rules::parameter_with_default_argument(checker, function_def);
}
if checker.enabled(Rule::Airflow3Removal) {
airflow::rules::removed_in_3_function_def(checker, function_def);
}
if checker.enabled(Rule::NonPEP695GenericFunction) {
pyupgrade::rules::non_pep695_generic_function(checker, function_def);
}
Expand Down
1 change: 1 addition & 0 deletions crates/ruff_linter/src/rules/airflow/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ mod tests {
#[test_case(Rule::Airflow3Removal, Path::new("AIR302_names.py"))]
#[test_case(Rule::Airflow3Removal, Path::new("AIR302_class_attribute.py"))]
#[test_case(Rule::Airflow3Removal, Path::new("AIR302_airflow_plugin.py"))]
#[test_case(Rule::Airflow3Removal, Path::new("AIR302_context.py"))]
#[test_case(Rule::Airflow3MovedToProvider, Path::new("AIR303.py"))]
fn rules(rule_code: Rule, path: &Path) -> Result<()> {
let snapshot = format!("{}_{}", rule_code.noqa_code(), path.to_string_lossy());
Expand Down
Loading
Loading