🚚 Migrate Airflow workloads to APC #4490
Comments
Blocked while Airflow component is being worked on |
Comms sent to ask-data-engineering with sheet to fill in https://docs.google.com/spreadsheets/d/1B8DOsSgnxGV1FjRv8dLv0wqDMo2RiiMqedFogLBpQEQ |
Moving back to blocked while IRSA is being worked on |
I've cut a new release of the cross-account-ecr action and published a new version of template-airflow-python, which uses the new v1 action and correctly adds APC accounts to the repo policy. I then updated the example DAG to use the new image version and the APC dev context (https://github.com/moj-analytical-services/airflow/pull/3613). Below is the output from running it (it fails because it can't use IRSA yet, but the image still pulls):
vscode ➜ /workspaces/modernisation-platform-environments (main) [ aws: analytical-platform-compute-development:modernisation-platform-sandbox@eu-west-2 ] [ context: arn:aws:eks:eu-west-2:381491960855:cluster/analytical-platform-compute-development ] $ kubectl --namespace airflow get events
LAST SEEN TYPE REASON OBJECT MESSAGE
59s Normal Scheduled pod/task-1-cecda48866f94f90a3357d96206822b6 Successfully assigned airflow/task-1-cecda48866f94f90a3357d96206822b6 to ip-10-200-33-237.eu-west-2.compute.internal
58s Normal Pulling pod/task-1-cecda48866f94f90a3357d96206822b6 Pulling image "189157455002.dkr.ecr.eu-west-1.amazonaws.com/template-airflow-python:v0.4"
53s Normal Pulled pod/task-1-cecda48866f94f90a3357d96206822b6 Successfully pulled image "189157455002.dkr.ecr.eu-west-1.amazonaws.com/template-airflow-python:v0.4" in 5.264s (5.264s including waiting). Image size: 76701464 bytes. |
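For context on why the pull above succeeds: a cross-account pull works when the ECR repository policy names the pulling accounts as principals. The following is a minimal sketch of building such a policy document. The helper name `build_pull_policy` and the account IDs are hypothetical, and this is not the cross-account-ecr action's actual implementation, which is not shown in this thread.

```python
import json

# The ECR actions a remote account needs in order to pull an image.
PULL_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:GetDownloadUrlForLayer",
]


def build_pull_policy(account_ids):
    """Return an ECR repository policy allowing the given accounts to pull.

    account_ids: iterable of 12-digit AWS account ID strings (hypothetical
    values here; substitute the real APC account IDs).
    """
    # De-duplicate and sort so the generated policy is stable between runs.
    principals = [f"arn:aws:iam::{a}:root" for a in sorted(set(account_ids))]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountPull",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": PULL_ACTIONS,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account IDs, for illustration only.
    print(json.dumps(build_pull_policy(["111111111111", "222222222222"]), indent=2))
```

The resulting JSON is the kind of document that would be attached to the repository as its policy; how the real action merges it with any existing policy is outside what this sketch covers.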
APC OIDC added to APDP |
We've tested @AntFMoJ's toy DAG on APC with cross-account IRSA and it's working 🎉 Unfortunately we are now blocked in discussion with Modernisation Platform about reuse of network ranges. |
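For anyone unfamiliar with cross-account IRSA: the IAM role being assumed trusts the cluster's OIDC provider, which has to be registered in the role's own account, and the trust is scoped to one Kubernetes service account. A rough sketch of such a trust policy follows. The helper name, account ID, issuer URL, and service account name are all hypothetical; only the `airflow` namespace comes from this thread.

```python
import json


def build_irsa_trust_policy(account_id, oidc_issuer, namespace, service_account):
    """Return an IAM trust policy for a role assumed via IRSA.

    account_id: account that owns the role and has the cluster's OIDC
    provider registered. All values passed in below are placeholders.
    """
    # IAM refers to the provider by issuer host and path, without the scheme.
    issuer = oidc_issuer
    if issuer.startswith("https://"):
        issuer = issuer[len("https://"):]

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only this service account in this namespace may assume the role.
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_irsa_trust_policy(
        account_id="111111111111",  # placeholder
        oidc_issuer="https://oidc.eks.eu-west-2.amazonaws.com/id/EXAMPLE",  # placeholder
        namespace="airflow",
        service_account="example-dag",  # hypothetical name
    )
    print(json.dumps(policy, indent=2))
```

On the cluster side, the service account would carry an `eks.amazonaws.com/role-arn` annotation pointing at the role that uses this trust policy.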
Updates:
|
Moving to blocked while we figure out how to proceed with Direct Connect. |
Meeting with HMCTS' network architect on 11/07/24 @ 11:30 BST |
Escalated to HMCTS head of DTS people and profession on 24th July 2024. Our ask has now been raised with the lead PlatOps in HMCTS. Currently awaiting a response. If there is no movement by the end of the week, we will escalate to Martyn. |
Meeting held with DTS PlatOps on 5/8/24 and the ask has been escalated. Waiting for a meeting to be arranged with HMCTS stakeholders. |
Meeting with HMCTS arranged for 5/9/24 to discuss scope of work |
Had a meeting with HMCTS; they are going to put us in touch with CloudGateway |
Sent chaser email on 15/10 and 22/10 |
Meeting arranged with CloudGateway for 4/11 |
VPN endpoint data sent to CloudGateway, awaiting response |
Updated VPN configuration parameters and sent over. Apparently we are waiting on commercials too. |
Pencilled some time in on Thursday 21/11 to bridge with CGW |
Nonprod was cut over on 21/11 🎉 Prod is being arranged for 27/11 |
Now 2/12; maintenance notice posted: https://status.analytical-platform.service.justice.gov.uk/posts/details/PK6KZ5V |
A rough cut of dev:
|
A rough cut of prod:
|
Internal networking has been cut over to APC |
The scope of this ticket is quite big. Suggest we close this one and create smaller tickets for the remaining tasks? |
User Story
As an Analytical Platform engineer
I want (current) Airflow jobs to schedule on APC
So that we can fully retire the Airflow EKS clusters
Value / Purpose
Airflow EKS clusters are partially managed in Terraform, pinned to IMDSv1, use kube2iam, and have no observability 😭
Migrating these workloads to APC will allow us to retire more clusters and make use of the newer capabilities in EKS and the supported tooling.
Useful Contacts
@jacobwoffenden
User Types
Platform Engineering
Hypothesis
If we... [do a thing]
Then... [this will happen]
Proposal
Migrate Airflow workloads to APC
Additional Information
This was sort of started in DPAT #2843 but never happened
Blocked by:
Definition of Done