
🔱 Create a DAG that can deploy tables from Create-A-Derived-Table Successfully #5857

Closed
10 tasks done
jhpyke opened this issue Oct 28, 2024 · 9 comments
Assignees
Labels
📊 CaDeT spike investigation, discovery into a thing

Comments

jhpyke commented Oct 28, 2024

Context

We want to explore whether GitHub Actions remains the best scheduling option for CaDeT as a product. Currently we use it because it was a quick and easy way to get secure, dedicated compute for running our deployments, which require a relatively highly empowered role. However, we are essentially just using a Kubernetes pod running fairly generic Linux and some public Python packages to build our tables, so this could theoretically be converted into a DAG pipeline, with scheduling handed off to Airflow instead. This spike therefore looks to create an image and a DAG that can deploy a CaDeT pipeline, to prove out the concept.

Proposal

The first task will be designing an image that performs all the steps our deployment workflow currently does. Outside the GitHub Actions ecosystem, this may require scripting to achieve tasks that are currently handed off to other GitHub Actions directly. We should start with our testing domain and prove that we can deploy it from Airflow; this will also minimise the possibility of disruption to customer work.

  • Use our template repo to create a new image pipeline, per the airflow user guidance
  • Name that image repo CADET-Airflow-Spike
  • Add in the required scripting to replicate an existing CaDeT deployment - We should look to replicate the sandpit testing pipeline for deployment initially, as this will exist in a domain and target that are inaccessible to users.
  • Test this locally, by running your python code to deploy the sandpit pipeline.
  • Once this is validated, publish the image, and create a DAG in the Airflow Repo that can consume it
  • Ensure the Role associated with this pipeline has permissions that are appropriately scoped for the operations it will need to carry out, and no higher.
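To make the "required scripting" step concrete, here is a hypothetical sketch of what the image's entrypoint might run: outside GitHub Actions, each step the workflow currently hands off to another action becomes an explicit command. The function names, dbt subcommands chosen, and the `sandpit` target are illustrative assumptions, not the actual CaDeT workflow.

```python
# Illustrative sketch only: the real deployment steps live in the CaDeT repo.
import subprocess


def build_deploy_commands(target: str = "sandpit") -> list[list[str]]:
    """Return a plausible dbt command sequence for deploying one CaDeT domain."""
    return [
        ["dbt", "deps"],                        # install package dependencies
        ["dbt", "compile", "--target", target],
        ["dbt", "run", "--target", target],     # build the derived tables
        ["dbt", "test", "--target", target],    # validate the deployed tables
    ]


def deploy(target: str = "sandpit") -> None:
    """Run each step in order, failing fast like the Actions workflow does."""
    for cmd in build_deploy_commands(target):
        subprocess.run(cmd, check=True)
```

The point of structuring it this way is that the same script runs unchanged locally (for the "test this locally" step) and inside the published image.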

Spike requirements

Data Engineer: half a sprint to one sprint

Definition of Done

  • An image has been created that can deploy the sandpit_testing CaDeT domain.
  • That image can be consumed in a DAG
  • The role has sufficient permissions to perform all required actions, and no more
  • A test deployment is done
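A minimal sketch of what "that image can be consumed in a DAG" might look like, assuming the KubernetesPodOperator from the CNCF Kubernetes provider. The image tag, namespace defaults, schedule, and service-account wiring are all illustrative assumptions; the real values live in the Airflow repo. The `dag_id` and `task_id` mirror the ones that later appear in the Airflow logs in this thread.

```python
# Illustrative only, not the actual DAG in the Airflow repo.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="development-sandpit.deploy_sandpit",
    start_date=datetime(2024, 11, 1),
    schedule=None,   # triggered manually while the spike is validated
    catchup=False,
) as dag:
    deploy = KubernetesPodOperator(
        task_id="cadet-sandpit-pipeline-spike",
        name="cadet-pipeline-spike",
        # Assumed image location; the repo name comes from the proposal above.
        image="ghcr.io/moj-analytical-services/cadet-airflow-spike:latest",
        # Assumed: the pod assumes the scoped role via this service account.
        service_account_name="airflow_dev_cadet_pipeline_spike",
    )
```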
@jhpyke jhpyke added the spike investigation, discovery into a thing label Oct 28, 2024
@jnayak-moj jnayak-moj self-assigned this Nov 18, 2024
jnayak-moj commented Dec 6, 2024

The work is in progress.

Development

A. Create a new image pipeline

B. Replicate CaDeT deployment

  • The scripting required to replicate an existing CaDeT deployment is done
  • Replicating the sandpit testing pipeline for deployment has been successfully tested locally; this was done by writing a few bash scripts

C. Add a new DAG in Airflow repo

Testing

Testing is in progress.

Currently the DAG is failing with [Errno 13] Permission denied: '/.aws'
The issue seems related to the AWS CLI v2 installation in the Docker image.
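One plausible reading of this error (an assumption, not confirmed in the thread): the AWS CLI and SDK resolve their config directory as `$HOME/.aws`, so a container where `HOME` is unset or `/` ends up trying to create `/.aws`, which a non-root user cannot do. A small sketch of that resolution logic:

```python
# Sketch of how the '/.aws' path can arise; the fallback to "/" here is
# illustrative of what an empty or root HOME produces, not botocore source.
import os


def aws_config_dir(env: dict) -> str:
    """Resolve the AWS config directory the way $HOME-based tools do."""
    home = env.get("HOME", "/")
    return os.path.join(home, ".aws")

# Workaround sketch: set HOME (or AWS_CONFIG_FILE and
# AWS_SHARED_CREDENTIALS_FILE) to a writable path in the Dockerfile
# or in the DAG's environment variables.
```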

The Airflow log is here.
https://23f37892-d1d1-4d9f-a03d-b8a53581fd20.c0.eu-west-1.airflow.amazonaws.com/log?execution_date=2024-12-06T13%3A29%3A00.550033%2B00%3A00&task_id=cadet-sandpit-pipeline-spike&dag_id=development-sandpit.deploy_sandpit&map_index=-1

jnayak-moj commented Dec 6, 2024

Just received a Slack message from Francesca Von Braun-Bates about this bug being raised:
#6248

I need to follow the steps to reproduce it and verify whether the AWS CLI installation works.

@jacobwoffenden

@jnayak-moj the summary of that bug is here: #6248 (comment). It is not an issue with the AWS CLI installation, but rather a lack of permissions to access the parameter.

@jacobwoffenden

I have run a pod using the new Airflow Python base image, attached the service account created by the airflow repo, and can see the following:

analyticalplatform@cadet-pipeline-spike:/opt/analytical-platform$ aws sts get-caller-identity
{
    "UserId": "AROAYUIXP4BW73VMXZLBD:botocore-session-1733506906",
    "Account": "593291632749",
    "Arn": "arn:aws:sts::593291632749:assumed-role/airflow_dev_cadet_pipeline_spike/botocore-session-1733506906"
}
analyticalplatform@cadet-pipeline-spike:/opt/analytical-platform$ aws secretsmanager get-secret-value \
    --secret-id "create_a_derived_table/dev/github_app_key" \
    --region "eu-west-1" \
    --query SecretString \
    --output text
-----BEGIN RSA PRIVATE KEY-----
...

This pull request (https://github.com/moj-analytical-services/CADET-Airflow-Spike/pull/36) updates the Dockerfile to the new base image, which includes the AWS CLI.

@jacobwoffenden jacobwoffenden moved this from 👀 TODO to 🚀 In Progress in Analytical Platform Dec 7, 2024
@YvanMOJdigital YvanMOJdigital moved this from 🚀 In Progress to 👀 TODO in Analytical Platform Dec 9, 2024
@jnayak-moj

I tested various scenarios and can confirm that with the new Airflow Python base image, the AWS CLI is installed correctly. However, the code is still failing to read the secrets from Secrets Manager. I think there may be a permissions issue attached to the deploy key.

I am doing further investigation.

@jnayak-moj

The "Export run artefacts to S3" step (../scripts/export_run_artefacts.py) in the deployment was failing because the GITHUB_OUTPUT environment variable was missing from the environment. After I commented this step out, the Airflow job ran successfully. The log of the job is below; all the dbt steps completed successfully.

https://23f37892-d1d1-4d9f-a03d-b8a53581fd20.c0.eu-west-1.airflow.amazonaws.com/log?execution_date=2024-12-20T15%3A33%3A40.259147%2B00%3A00&task_id=cadet-sandpit-pipeline-spike&dag_id=development-sandpit.deploy_sandpit&map_index=-1
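Rather than commenting the step out permanently, one option is to make the write conditional: GITHUB_OUTPUT is a file path that only exists under GitHub Actions, so the script can skip the write when it is absent (e.g. inside an Airflow pod). The function name below is illustrative, not the actual contents of export_run_artefacts.py.

```python
# Hypothetical guard for GitHub-Actions-only output, sketched for this spike.
import os


def write_step_output(key: str, value: str) -> bool:
    """Append key=value to the GITHUB_OUTPUT file if set; return whether written."""
    output_path = os.environ.get("GITHUB_OUTPUT")
    if not output_path:
        # Not running under GitHub Actions (e.g. Airflow pod): skip quietly.
        return False
    with open(output_path, "a") as f:
        f.write(f"{key}={value}\n")
    return True
```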

@jhpyke jhpyke moved this from 👀 TODO to 🚀 In Progress in Analytical Platform Jan 7, 2025
jnayak-moj commented Jan 7, 2025

We can close this ticket here and carry on the expansion tasks in a new ticket. The follow-up ticket adds further functionality such as retries, triggering another DAG, and posting status to a Slack channel. The follow-up ticket is here:
#6512

@jnayak-moj jnayak-moj moved this from 🚀 In Progress to 🛂 In Review in Analytical Platform Jan 7, 2025
@jnayak-moj jnayak-moj assigned jhpyke and unassigned jnayak-moj Jan 7, 2025
@julialawrence julialawrence moved this from 🛂 In Review to 🎉 Done in Analytical Platform Jan 8, 2025
@julialawrence julialawrence closed this as completed by moving to 🎉 Done in Analytical Platform Jan 8, 2025