Support for toggle to force delete azurerm_kubernetes_cluster_node_pool #10411
Comments
hi @t3mi Thanks for opening this issue. Unfortunately the Azure API doesn't currently support a means of draining a Node Pool - as such the Provider attempts to destroy the Node Pool using the Delete API and waits for it to complete. Whilst it's possible that the AKS API may provide the option to configure this in the future, it doesn't today - so this would need the AKS Service Team to expose a means of controlling this behaviour. As such I'm going to suggest opening an issue on the AKS Repository instead, where someone from that team should be able to take a look - once it's available we can look at integrating this into Terraform. Thanks!
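For reference, the node pool destroy described above maps onto the plain agent pool Delete operation, which the Azure CLI also exposes. A minimal sketch with placeholder names (at the time of this thread there was no drain or force option to pass here, which is what this issue asks the provider to surface):

```shell
# Plain node pool deletion - roughly the operation the provider performs on
# destroy. Resource group, cluster, and pool names below are placeholders.
az aks nodepool delete \
  --resource-group example-rg \
  --cluster-name example-aks \
  --name examplepool
```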
Would it be possible to issue a delete on the AKS instance first (before the pools) and then have the pools refresh their TF status before issuing their delete (if they still exist)? My struggle with this error is with completely tearing down an environment - I get the same error. My end-goal is for all of the assets to go away. In the UI, I can just delete AKS and it deletes the custom node pools for me. It seems like Terraform is issuing the delete command on my custom node pools before issuing the delete command to the AKS instance.
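For a full environment teardown, one way to approximate the portal behaviour described above is to drop the node pool from Terraform state and let the cluster deletion take the pools with it. This is only a sketch, assuming the resource is addressed as azurerm_kubernetes_cluster_node_pool.infra; it leaves the pool unmanaged by Terraform, so it is only suitable when everything is being destroyed anyway:

```shell
# Drop the node pool from state so destroy no longer issues a separate delete
# for it (the resource address is an assumption for this example).
terraform state rm azurerm_kubernetes_cluster_node_pool.infra

# Destroying the cluster then removes its node pools on the Azure side,
# mirroring what deleting the cluster in the portal does.
terraform destroy
```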
I can confirm, as I am encountering the same issue when destroying a module with:
It tries to destroy the node pools and the cluster simultaneously for a few minutes before eventually returning the same error.
However, it turns out that after re-running
As a workaround for others who find this, we added several kubectl commands to delete all non-system/AKS namespaces in the pipeline prior to executing the Terraform step that destroys the environment. It slows the process, but ensures a successful TF destroy.
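A minimal sketch of that pre-destroy cleanup step, assuming kubectl is already pointed at the cluster; the set of namespaces treated as system/AKS namespaces is an assumption and may need adjusting:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Delete every namespace that is not a system/AKS namespace so that pod
# disruption budgets in application namespaces no longer block node drains.
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
  case "$ns" in
    default|kube-system|kube-public|kube-node-lease|gatekeeper-system)
      ;;  # keep system/AKS namespaces
    *)
      kubectl delete namespace "$ns" --wait=true
      ;;
  esac
done
```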
resource "azurerm_kubernetes_cluster_node_pool" "infra" {
[...]
node_labels = {
"example.com/cluster-name" = azurerm_kubernetes_cluster.main.name,
"example.com/resource-group" = azurerm_kubernetes_cluster.main.resource_group_name,
}
provisioner "local-exec" {
when = destroy
command = <<-EOF
az aks command invoke -g "${self.node_labels["example.com/resource-group"]}" -n "${self.node_labels["example.com/cluster-name"]}" -c "kubectl drain -l 'kubernetes.azure.com/agentpool=${self.name}' --ignore-daemonsets --delete-emptydir-data --timeout=10m || kubectl drain -l 'kubernetes.azure.com/agentpool=${self.name}' --ignore-daemonsets --delete-emptydir-data --timeout=10m --disable-eviction=true"
EOF
interpreter = ["/usr/bin/env", "bash", "-c"]
} Here's a workaround that uses the AKS resource "azurerm_kubernetes_cluster" "main" {
[...]
run_command_enabled = true
provisioner "local-exec" {
command = <<-EOF
az aks command invoke -g "${self.resource_group_name}" -n "${self.name}" -c "kubectl create ns aks-command && kubectl annotate ns/aks-command scheduler.alpha.kubernetes.io/node-selector=kubernetes.azure.com/agentpool=${self.default_node_pool[0].name}"
EOF
interpreter = ["/usr/bin/env", "bash", "-c"]
}
} This is clearly not ideal but will do the trick until AKS supports it natively (Azure/AKS#2090). |
Thanks for taking the time to open this issue. It looks like the behavior you requested is not supported by the underlying Azure API, so I am going to label this issue as such and close it for now. When it gets added, we can reopen this request or you can create a new one.
I'm a bit confused here:
When I turn on debug logging in Terraform, I see this happening on a delete of a node pool:
So that API seems to indicate there is a parameter for this?
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
Community Note
Description
In the case of a small node pool, or tolerations on it which prevent re-scheduling of pods configured with a pod disruption budget, the following error occurs during node pool removal:
New or Affected Resource(s)
Potential Terraform Configuration
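As an illustration of the requested toggle only, a hypothetical sketch of what it might look like on the node pool resource; the force_delete attribute below is invented for this sketch and is not an existing provider argument:

```hcl
resource "azurerm_kubernetes_cluster_node_pool" "example" {
  name                  = "internal"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_DS2_v2"
  node_count            = 1

  # Hypothetical toggle (not an existing argument): delete the node pool even
  # when pod disruption budgets would otherwise block draining its nodes.
  force_delete = true
}
```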
References