Improve callback documentation #1468

Open
tatiana opened this issue Jan 15, 2025 · 0 comments
Labels
area:docs Relating to documentation, changes, fixes, improvement


tatiana commented Jan 15, 2025

We can improve this documentation further:
https://astronomer.github.io/astronomer-cosmos/configuration/callbacks.html

Example of content we recently shared during support:

dbt commands create most artefacts in the project's target folder, including run_results.json and manifest.json. Since Cosmos creates a temporary folder for each dbt command it runs, that folder vanishes at the end of the Cosmos task execution, as do the artefacts generated by the command.

However, Cosmos allows users to define custom functions that run as part of the task execution, before the target folder is deleted. These are called callbacks. In addition to writing their own custom callback methods, users can also leverage recently introduced standard functions in Cosmos that make it easy to upload the entire contents of the target folder to object storage (S3, GCS, Azure Blob Storage). These functions do this by creating structured paths in the destination bucket.

There are two ways end-users can leverage the Cosmos auxiliary callback functions:

  1. If they are using DbtDag or DbtTaskGroup:

    cosmos_callback_dag = DbtDag(
        # dbt/Cosmos-specific parameters
        project_config=ProjectConfig(
            DBT_ROOT_PATH / "jaffle_shop",
        ),
        profile_config=profile_config,
        operator_args={
            "install_deps": True,  # install any necessary dependencies before running any dbt command
            "full_refresh": True,  # used only in dbt commands that support this flag
            # --------------------------------------------------------------
            # Callback function to upload files using Airflow Object Storage and the
            # Cosmos remote_target_path setting, on Airflow 2.8 and above
            "callback": upload_to_cloud_storage,
            # --------------------------------------------------------------
            # Callback function to upload files to AWS S3, works for Airflow < 2.8 too
            # "callback": upload_to_aws_s3,
            # "callback_args": {"aws_conn_id": "aws_s3_conn", "bucket_name": "cosmos-artifacts-upload"},
            # --------------------------------------------------------------
            # Callback function to upload files to GCP GS, works for Airflow < 2.8 too
            # "callback": upload_to_gcp_gs,
            # "callback_args": {"gcp_conn_id": "gcp_gs_conn", "bucket_name": "cosmos-artifacts-upload"},
            # --------------------------------------------------------------
            # Callback function to upload files to Azure WASB, works for Airflow < 2.8 too
            # "callback": upload_to_azure_wasb,
            # "callback_args": {"azure_conn_id": "azure_wasb_conn", "container_name": "cosmos-artifacts-upload"},
            # --------------------------------------------------------------
        },
    )

  2. If they use a Cosmos operator directly (with one of these execution modes: ExecutionMode.LOCAL or ExecutionMode.VIRTUALENV). This is an example using DbtSeedLocalOperator:

    with DAG("example_operators", start_date=datetime(2024, 1, 1), catchup=False) as dag:
        # [START single_operator_callback]
        seed_operator = DbtSeedLocalOperator(
            profile_config=profile_config,
            project_dir=DBT_PROJ_DIR,
            task_id="seed",
            dbt_cmd_flags=["--select", "raw_customers"],
            install_deps=True,
            append_env=True,
            # --------------------------------------------------------------
            # Callback function to upload artifacts to AWS S3
            callback=upload_to_aws_s3,
            callback_args={"aws_conn_id": "aws_s3_conn", "bucket_name": "cosmos-artifacts-upload"},
            # --------------------------------------------------------------
            # Callback function to upload artifacts to GCP GS
            # callback=upload_to_gcp_gs,
            # callback_args={"gcp_conn_id": "gcp_gs_conn", "bucket_name": "cosmos-artifacts-upload"},
            # --------------------------------------------------------------
            # Callback function to upload artifacts to Azure WASB
            # callback=upload_to_azure_wasb,
            # callback_args={"azure_conn_id": "azure_wasb_conn", "container_name": "cosmos-artifacts-upload"},
            # --------------------------------------------------------------
        )
        # [END single_operator_callback]

I'm attaching a screenshot of the GCS bucket's content for example (2), but it is similar for (1).

You can see the automatic nesting is:
a) Bucket pre-defined by the user
b) Name of the DAG
c) DAG Run identifier
d) Task ID
e) Task retry identifier
f) Target folder with its contents
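The hierarchy above can be sketched as a path template. This is a hypothetical illustration of the nesting, not Cosmos's actual code, and the exact key format may differ:

```python
# Hypothetical sketch of how one uploaded object's path follows the
# nesting described above (function and parameter names are illustrative).
def dest_path(bucket_name: str, dag_id: str, run_id: str, task_id: str,
              try_number: int, file_in_target: str) -> str:
    """Compose the object-storage key for a single file from the target folder."""
    return (
        f"{bucket_name}/{dag_id}/{run_id}/{task_id}/"
        f"{try_number}/target/{file_in_target}"
    )

print(dest_path("cosmos-artifacts-upload", "example_operators",
                "manual__2024-01-01T00:00:00", "seed", 1, "run_results.json"))
# cosmos-artifacts-upload/example_operators/manual__2024-01-01T00:00:00/seed/1/target/run_results.json
```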

If users are not happy with this structure or format, they can always implement their own methods, which can be based (or not) on the Cosmos standard ones:
https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/io.py
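For instance, a minimal custom callback could look like the sketch below. It assumes, as the standard helpers in cosmos/io.py do, that Cosmos invokes the callback with the path of the temporary project directory (plus any callback_args as keyword arguments) before that directory is deleted; keep_run_results and dest_dir are illustrative names, not part of Cosmos:

```python
import shutil
from pathlib import Path


# Hypothetical custom callback (illustrative, not part of Cosmos). Assumes it
# is called with the temporary project directory path before the target
# folder vanishes, plus any extra keyword arguments from callback_args.
def keep_run_results(project_dir: str, dest_dir: str = "/tmp/dbt_artifacts", **kwargs) -> None:
    """Copy run_results.json out of the target folder before it is deleted."""
    results = Path(project_dir) / "target" / "run_results.json"
    if results.exists():
        Path(dest_dir).mkdir(parents=True, exist_ok=True)
        shutil.copy(results, Path(dest_dir) / results.name)
```

It would then be wired up the same way as the standard functions, e.g. callback=keep_run_results, callback_args={"dest_dir": "/data/artifacts"}.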

Some details that are missing from our documentation:

  • Context about why the target content vanishes; that was the original motivation for callbacks
  • Rename "Example: Using Callbacks with remote_target_path (Airflow 2.8+)" to "Example: Using DbtDag" (the current title exposes an implementation detail)
  • Illustrate in both examples what data gets uploaded to GCS, with either an ASCII tree or screenshots
  • Explain, in text, the default standard hierarchy
  • Give an example of a custom callback
  • Link the cosmos/io.py file on GitHub and mention that users could create their own custom functions based on it, if needed
@dosubot dosubot bot added the area:docs Relating to documentation, changes, fixes, improvement label Jan 15, 2025