-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Failed to get routing manager deployment status #58
Comments
I am also seeing this error on MacOS Monterey 12.6.5. Interestingly, when using @praveensrao I would suggest his issue actually belongs against https://github.com/Azure/vscode-bridge-to-kubernetes, as I can use the CLI successfully on MacOS. |
@praveensrao I created an issue against the extension repo ^^^ Give the CLI a look if its a blocker for you, works fine for me. |
@garyburgmann - thanks for your help. |
@praveensrao @garyburgmann Thanks for the details and the chat. Is this happening with latest release Also can you confirm in the following location, you are dotnet 7 libraries downloaded, if you still see dotnet 6 it is an issue. |
@praveensrao @garyburgmann Thanks for the logs and inputs. here are my observations from logs
@garyburgmann do you happen to see similar type of errors in your logs ? Please confirm. Thank you. |
$ kc -n bridge logs -f garyburgmanngo1com-iam-64b56ff9f8-jktwm
2023-08-03T06:08:32.5767620Z | RemoteAgent | TRACE | Shutting down gracefully\nOperation context: <json>{"clientRequestId":"0c293ee7-9883-4531-8e7a-e2271c22efd2","correlationId":"ab88dc60-b6ed-47d0-b38a-785440ea7a111691041696493:4f8cfff13346:3c9a2eb7ee94","requestId":null,"userSubscriptionId":null,"startTime":"2023-08-03T06:06:28.158888+00:00","userAgent":"RemoteAgent/1.0.20230524.1","requestHttpMethod":null,"requestUri":null,"version":"1.0.20230524.1","requestHeaders":{},"loggingProperties":{"ApplicationName":"RemoteAgent","DeviceOperatingSystem":"Linux 5.15.0-1040-azure #47-Ubuntu SMP Thu Jun 1 19:38:24 UTC 2023","Framework":".NET 6.0.16","ProcessId":1,"TargetEnvironment":"Production"}}</json>
2023-08-03T06:08:32.5961584Z | RemoteAgent | TRACE | AssemblyLoadContext unloading
2023-08-03T06:08:32.6043901Z | RemoteAgent | TRACE | Process exiting...
2023-08-03T06:08:32.6047679Z | RemoteAgent | TRACE | Process exited they were the only logs on the pod before it died $ kc -n bridge logs -f garyburgmanngo1com-iam-64b56ff9f8-jktwm-restore-d54b0-7txhs
2023-08-03T06:06:33.0405947Z | RestorationJob | TRACE | Waiting to restore previous state on Pod bridge/garyburgmanngo1com-iam-64b56ff9f8-jktwm...\nOperation context: <json>{"clientRequestId":"512a39bb-2c19-498b-bcf9-aacbea2478d1","correlationId":"ab88dc60-b6ed-47d0-b38a-785440ea7a111691041696493:4f8cfff13346:3c9a2eb7ee94","requestId":null,"userSubscriptionId":null,"startTime":"2023-08-03T06:06:32.8105723+00:00","userAgent":"RestorationJob/1.0.20230705.1","requestHttpMethod":null,"requestUri":null,"version":"1.0.20230705.1","requestHeaders":{},"loggingProperties":{"ApplicationName":"RestorationJob","DeviceOperatingSystem":"Linux 5.15.0-1040-azure #47-Ubuntu SMP Thu Jun 1 19:38:24 UTC 2023","Framework":".NET 7.0.8","ProcessId":1,"TargetEnvironment":"Production"}}</json>
2023-08-03T06:06:43.3806986Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:06:43.4223754Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":355}}</json>
2023-08-03T06:06:48.4334184Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:06:48.4381234Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":13}}</json>
2023-08-03T06:06:53.4510364Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:06:53.4535317Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":10}}</json>
2023-08-03T06:06:58.4600346Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:06:58.4627438Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":11}}</json>
2023-08-03T06:07:03.4773051Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:07:03.4793142Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":16}}</json>
2023-08-03T06:07:08.4868602Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:07:08.4889216Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":9}}</json>
2023-08-03T06:07:13.4955392Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:07:13.4975159Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":10}}</json>
2023-08-03T06:07:18.5046539Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:07:18.5066087Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":10}}</json>
2023-08-03T06:07:23.5126653Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:07:23.5145872Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":9}}</json>
2023-08-03T06:07:28.5356300Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:07:28.5375957Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":22}}</json>
2023-08-03T06:07:33.5446697Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:07:33.5465370Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":10}}</json>
2023-08-03T06:07:38.5541115Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:07:38.5561159Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":9}}</json>
2023-08-03T06:07:43.5644099Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:07:43.5665424Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":11}}</json>
2023-08-03T06:07:48.5758663Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:07:48.5778206Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":9}}</json>
2023-08-03T06:07:53.5908531Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:07:53.5929789Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":10}}</json>
2023-08-03T06:07:58.5991923Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:07:58.6012322Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":9}}</json>
2023-08-03T06:08:03.6067454Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:08:03.6088604Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":10}}</json>
2023-08-03T06:08:08.6151974Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:08:08.6169995Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":8}}</json>
2023-08-03T06:08:13.6217815Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:08:13.6235685Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":9}}</json>
2023-08-03T06:08:18.6323500Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:08:18.6340839Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":9}}</json>
2023-08-03T06:08:23.6449728Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:08:23.6473115Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":12}}</json>
2023-08-03T06:08:28.6555996Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1PodAsync","success":true,"duration":null,"properties":{}}</json>
2023-08-03T06:08:28.6574931Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":9}}</json>
2023-08-03T06:08:33.0810392Z | RestorationJob | TRACE | AssemblyLoadContext unloading
2023-08-03T06:08:33.1427124Z | RestorationJob | TRACE | Process exiting...
2023-08-03T06:08:33.1430480Z | RestorationJob | TRACE | Restoration job cancelled
2023-08-03T06:08:33.1882152Z | RestorationJob | TRACE | Process exited |
we have 167 pods/services (all deployments 1/1) in the ns, maybe just taking too long to bootstrap?
|
@garyburgmann @praveensrao Thanks for the details, are you both using M1/M2 Mac (arm based one's). I was able to replicate the same problem with my personal mac, just trying to confirm on this fact. Please let me know. |
Intel -> Monterey |
I was able to run this on an M1 Pro MBP ... |
This is still happening. I can run $ dsc connect \
--local-port 3000 \
--routing foo \
--service bar \
--namespace bridge \
--use-kubernetes-service-environment-variables but the equivalent tasks.json fails {
"version": "2.0.0",
"tasks": [
{
"label": "bridge-to-kubernetes",
"type": "bridge-to-kubernetes.resource",
"resource": "bar",
"resourceType": "service",
"ports": [
3000
],
"targetCluster": "aks",
"targetNamespace": "bridge",
"useKubernetesServiceEnvironmentVariables": true,
"isolateAs": "foo"
}
]
} Error: Failed to establish a connection. Error: Failed to get routing manager deployment status - ran out of time : 'Failed to get routing manager deployment status: 'Oops... An unexpected error has occurred.'Please include the following Correlation ID when contacting support: '1a1b8c6b-d43c-43d8-aa9b-559ada2c7fd01696814560762:d21acbc92888'. ' By removing |
it all goes wrong here I believe if (!string.IsNullOrEmpty(_routingHeaderValue))
{
// If routing is enabled, wait until the routing manager is connected to say the connection has been established successfully.
var routingStatus = await routingManagementClient.GetStatusAsync(remoteAgentInfo.PodName, cancellationToken);
if (routingStatus.Value.IsConnected == null)
{
_log.Error(routingStatus.Value.ErrorMessage);
throw new UserVisibleException(_operationContext, Resources.FailedToGetRoutingManagerDeploymentStatusFormat, string.Format(CommonResources.CorrelationIdErrorMessageFormat, _operationContext.CorrelationId));
}
else if (routingStatus.Value.IsConnected == false)
{
throw new UserVisibleException(_operationContext, Resources.FailedToGetRoutingManagerDeploymentStatusRanOutOfTimeFormat, routingStatus.Value.ErrorMessage + string.Format(CommonResources.CorrelationIdErrorMessageFormat, _operationContext.CorrelationId));
}
}
results in But only when using the vscode extension, not when invoking |
@garyburgmann I will check on this and let you know. It happened to me in my mac as well but I couldn't reproduce it often to get the logs and see why it is happening. Basically routing manager pod will reach to dev host agent pod (this is envoy pod with your alias or username with it) and if that pod dies quicker this error can happen. Thanks for your patience. |
@garyburgmann I tried to replicate this, but I was not able to reproduce with latest release in mac. are you available for call tomorrow via slack, I am available via slack in the CNCF slack or kubernetes.slack.com. Please reach out to me. we can troubleshoot over the call. Thanks. |
@praveensrao @garyburgmann can you please test with latest version and see if this issue is resolved ? if not , we can see why it is still happening. Also to note, we are very small team just maintaining B2K, contributions are welcome. |
Describe the bug
Failed to establish a connection. Error: Failed to get routing manager deployment status - ran out of time : 'Failed to get routing manager deployment status: 'Oops... An unexpected error has occurred.'Please include the following Correlation ID when contacting support: '51998f9b-5d22-43fe-97d5-44a591e71d1f1689648728010:4ce86b08fabd'. '
This issue is inconsistent - it happens 70% of the time, and leading to a lot of developer frustration.
Mention the platform you are using
I am using VS Code Bridge to Kubernetes extension v2.0.120230716
To Reproduce
This issue occurs only when using the isolation mode.
Steps to reproduce the behavior:
Expected behavior
The Bridge to Kubernetes connection to the cluster must complete and the application must launch
Screenshots
After some time:
Logs:
bridge-to-k8s-logs.zip
routingmanager-deployment-bf984c7cc-7ktv7-logs.txt.zip
Desktop (please complete the following information):
I am on Mac OS Ventura 13.3.1 (a)
Using VS Code 1.80.0
Note, I've also tried this many times by cleaning up the deployments/services and pods in the cluster, but issue persists.
The text was updated successfully, but these errors were encountered: