Panic applying virtual status patch #2293

Open
yankcrime opened this issue Nov 22, 2024 · 4 comments

@yankcrime

What happened?

I'm seeing vCluster panic when it's applying a virtual status patch as a result of updates to a LoadBalancer Service in a virtual cluster. The output is as follows:

2024-11-22 12:48:08     INFO    patcher/apply.go:313    Apply virtual patch     {"component": "vcluster", "controller": "service", "namespace": "cluster", "name": "kamaji-cluster", "reconcileID": "ca91da8b-dde2-462d-8233-cf4a3826a161", "kind": "Service", "object": "cluster/kamaji-cluster", "patch": "{\"metadata\":{\"annotations\":{\"loadbalancer.openstack.org/load-balancer-address\":\"193.16.42.10\",\"loadbalancer.openstack.org/load-balancer-id\":\"2239ddd9-1d22-4ff7-879e-a94e2278c45a\"}}}"}
2024-11-22 12:48:08     INFO    patcher/apply.go:313    Apply virtual status patch      {"component": "vcluster", "controller": "service", "namespace": "cluster", "name": "kamaji-cluster", "reconcileID": "4aeeb2ac-0f77-4a25-a3e2-8d4919b54ffc", "kind": "Service", "object": "cluster/kamaji-cluster", "patch": "{\"status\":{\"loadBalancer\":{\"ingress\":[{\"ip\":\"193.16.42.10\",\"ipMode\":\"VIP\"}]}}}"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      Observed a panic        {"component": "vcluster", "component": "controller-manager", "location": "panic.go:261", "panic": "runtime error: invalid memory address or nil pointer dereference", "panicGoValue": "\"invalid memory address or nil pointer dereference\"", "stacktrace": "<"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      goroutine 972 [running]:        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime.logPanic({0x38483f0, 0x554eb20}, {0x2d9c260, 0x54897b0})   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime/runtime.go:107 +0xbc       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x38483f0, 0x554eb20}, {0x2d9c260, 0x54897b0}, {0x554eb20, 0x0, 0x43d945?})   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime/runtime.go:82 +0x5e        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0019e5dc0?})     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime/runtime.go:59 +0x108       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      panic({0x2d9c260?, 0x54897b0?}) {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      runtime/panic.go:770 +0x132     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service.(*Controller).needsUpdate(0xc000d49a00, 0xc002202008, 0xc002027688)   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service/controller.go:581 +0x4b8      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service.New.func2({0x32711e0?, 0xc002202008?}, {0x32711e0, 0xc002027688?})    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service/controller.go:144 +0x74       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache/controller.go:253  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    patcher/apply.go:313    Apply host patch        {"component": "vcluster", "controller": "service", "namespace": "cluster", "name": "kamaji-cluster", "reconcileID": "e32f57b2-f56a-4fcb-ba94-4e347e2dd067", "kind": "Service", "object": "kamaji/kamaji-cluster-x-cluster-x-clustermgr0", "patch": "{\"spec\":{\"loadBalancerIP\":\"193.16.42.10\"}}"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache.(*processorListener).run.func1()   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache/shared_informer.go:976 +0xea       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0019f7f70, {0x38135c0, 0xc0019cb530}, 0x1, 0xc001997e60)      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001cea770, 0x3b9aca00, 0x0, 0x1, 0xc001997e60) {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.Until(...)    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:161        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache.(*processorListener).run(0xc000d07200)     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache/shared_informer.go:972 +0x69       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x52      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 833    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      >       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      panic: runtime error: invalid memory address or nil pointer dereference [recovered]     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      panic: runtime error: invalid memory address or nil pointer dereference {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x211fc38]        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      goroutine 972 [running]:        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x38483f0, 0x554eb20}, {0x2d9c260, 0x54897b0}, {0x554eb20, 0x0, 0x43d945?})   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime/runtime.go:89 +0xee        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0019e5dc0?})     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime/runtime.go:59 +0x108       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      panic({0x2d9c260?, 0x54897b0?}) {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      runtime/panic.go:770 +0x132     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service.(*Controller).needsUpdate(0xc000d49a00, 0xc002202008, 0xc002027688)   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service/controller.go:581 +0x4b8      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service.New.func2({0x32711e0?, 0xc002202008?}, {0x32711e0, 0xc002027688?})    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service/controller.go:144 +0x74       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache/controller.go:253  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache.(*processorListener).run.func1()   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache/shared_informer.go:976 +0xea       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc002571f70, {0x38135c0, 0xc0019cb530}, 0x1, 0xc001997e60)      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001cea770, 0x3b9aca00, 0x0, 0x1, 0xc001997e60) {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.Until(...)    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:161        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache.(*processorListener).run(0xc000d07200)     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache/shared_informer.go:972 +0x69       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x52      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 833    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:12     INFO    commandwriter/commandwriter.go:128      Failed to list /registry/ipam.cluster.x-k8s.io/ipaddresses/ for revision 204148: rpc error: code = OutOfRange desc = etcdserver: mvcc: required revision has been compacted       {"component": "vcluster", "component": "kine", "time": "2024-11-22T12:48:12.207567635Z", "level": "error"}

At this point no resources in the virtual cluster are synchronised. However, if I kill the Pod for this virtual cluster, everything eventually restarts and resources start synchronising again.

What did you expect to happen?

I do not expect vCluster to panic. At worst it should log an error and carry on, so that resources continue to synchronise and further actions in the virtual cluster aren't blocked.
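To illustrate the behaviour described above, here is a minimal Go sketch of a handler wrapper that turns a panic into an ordinary error so the surrounding loop keeps running. It is a generic pattern only, not vCluster or Kubernetes code: the `service` type and the `safeHandle` and `describe` functions are hypothetical names made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// service is a hypothetical stand-in for a synced object.
// It is not a vCluster or Kubernetes type.
type service struct {
	name   string
	ipMode *string // optional field; nil when unset
}

// safeHandle runs handler and converts any panic into an ordinary error,
// so the caller can log it and move on to the next item.
func safeHandle(handler func(*service) error, svc *service) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("handler panicked for %q: %v", svc.name, r)
		}
	}()
	return handler(svc)
}

// describe dereferences ipMode without a nil check,
// which panics when the field is unset.
func describe(svc *service) error {
	if *svc.ipMode == "" {
		return errors.New("empty ipMode")
	}
	return nil
}

func main() {
	vip := "VIP"
	for _, svc := range []*service{{name: "a"}, {name: "b", ipMode: &vip}} {
		if err := safeHandle(describe, svc); err != nil {
			// The failing item is reported, and the loop continues.
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("synced:", svc.name)
	}
}
```

With a wrapper like this, the item that triggers the nil dereference is reported as an error while later items are still processed. Whether this is appropriate for the component that panics here is a separate question.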

How can we reproduce it (as minimally and precisely as possible)?

This is triggered when creating a Kamaji TenantControlPlane resource in my virtual cluster. As part of its creation, this resource creates a Service of type LoadBalancer, and once that Service has been instantiated I see the panic in vCluster. It's consistent and repeatable, albeit a bit involved. I'm happy to provide more details to assist with troubleshooting if it's not clear from the trace.

Anything else we need to know?

No response

Host cluster Kubernetes version

$ kubectl version
Client Version: v1.30.5
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.31.2

vcluster version

$ vcluster --version
vcluster version 0.21.1

VCluster Config

Nothing special, the virtual cluster was created with the following command:

vcluster create clustermgr0 -n kamaji --expose

@FabianKramm
Member

@yankcrime thanks for reporting this! Strange: the panic actually occurs in the Kubernetes controller-manager itself, so this might be a Kubernetes problem or something to do with our configuration.
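For context, the trace points at `needsUpdate` in `k8s.io/cloud-provider/controllers/service`, but it does not show which field was nil. The sketch below is only a guess at the general shape of such a bug: comparing an optional pointer field on an old and a new object without checking for nil. The `ingress` type and `ipModeChanged` function are hypothetical and are not the actual Kubernetes code.

```go
package main

import "fmt"

// ingress is a hypothetical, simplified stand-in for a load balancer
// ingress entry with an optional ipMode. It is not the Kubernetes type.
type ingress struct {
	ip     string
	ipMode *string // nil when the field is unset
}

// ipModeChanged compares the optional field without dereferencing nil.
// A naive `*oldIng.ipMode != *newIng.ipMode` would panic if either is nil.
func ipModeChanged(oldIng, newIng ingress) bool {
	if oldIng.ipMode == nil || newIng.ipMode == nil {
		// Changed only if exactly one side is set.
		return oldIng.ipMode != newIng.ipMode
	}
	return *oldIng.ipMode != *newIng.ipMode
}

func main() {
	vip := "VIP"
	before := ingress{ip: "193.16.42.10"}              // ipMode unset
	after := ingress{ip: "193.16.42.10", ipMode: &vip} // ipMode set by a status patch
	fmt.Println("changed:", ipModeChanged(before, after))
}
```

If the real cause turns out to be something like this, the fix would belong in the upstream controller rather than in vCluster.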

@yankcrime
Author

Thanks @FabianKramm - it's an interesting one for sure. Let me know if I can provide you with any more details 👍

@yankcrime
Author

An additional datapoint: I tested this today with an older version of vCluster - v0.19.6 - and didn't experience the same panic.

@kale-amruta

@yankcrime I tried installing a Kamaji TenantControlPlane resource on a host cluster without vCluster installed, and I got the same error in the host cluster's kube-controller-manager pod. I used this sample, which uses Kubernetes version 1.30.0, to create the TenantControlPlane resource on the host cluster.

I suspect this is an issue with Kamaji rather than vCluster, since I ran into the same issue without vCluster. It could be caused by a Kubernetes server version incompatibility between the host and the Kamaji tenant control plane, and it's possible that Kamaji is not yet fully supported on 1.31.2.

The following error was observed in the Kamaji controller when the TenantControlPlane resource was created:

8T07:46:39Z	INFO	Starting workers	{"controller": "secret", "controllerGroup": "", "controllerKind": "Secret", "worker count": 1}
2025-01-08T07:47:40Z	ERROR	cannot retrieve Tenant Control Plane address	{"controller": "tenantcontrolplane", "controllerGroup": "kamaji.clastix.io", "controllerKind": "TenantControlPlane", "TenantControlPlane": {"name":"k8s-130","namespace":"default"}, "namespace": "default", "name": "k8s-130", "reconcileID": "25ed01e2-afb3-42c6-a555-626a75922417", "resource": "service", "error": "cannot retrieve the TenantControlPlane address, Service resource is not yet exposed as LoadBalancer"}
github.com/clastix/kamaji/internal/resources.(*KubernetesServiceResource).UpdateTenantControlPlaneStatus
	/workspace/internal/resources/k8s_service_resource.go:52
github.com/clastix/kamaji/controllers/utils.UpdateStatus.func1
	/workspace/controllers/utils/update_status.go:26
k8s.io/client-go/util/retry.OnError.func1
	/go/pkg/mod/k8s.io/client-go@v0.30.2/util/retry/util.go:51
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection
	/go/pkg/mod/k8s.io/apimachinery@v0.30.2/pkg/util/wait/wait.go:145
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff
	/go/pkg/mod/k8s.io/apimachinery@v0.30.2/pkg/util/wait/backoff.go:461
k8s.io/client-go/util/retry.OnError
	/go/pkg/mod/k8s.io/client-go@v0.30.2/util/retry/util.go:50
k8s.io/client-go/util/retry.RetryOnConflict
	/go/pkg/mod/k8s.io/client-go@v0.30.2/util/retry/util.go:104
github.com/clastix/kamaji/controllers/utils.UpdateStatus
	/workspace/controllers/utils/update_status.go:19
github.com/clastix/kamaji/controllers.(*TenantControlPlaneReconciler).Reconcile
	/workspace/controllers/tenantcontrolplane_controller.go:195
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:222
2025-01-08T07:47:40Z	ERROR	update of the resource failed	{"controller": "tenantcontrolplane", "controllerGroup": "kamaji.clastix.io", "controllerKind": "TenantControlPlane", "TenantControlPlane": {"name":"k8s-130","namespace":"default"}, "namespace": "default", "name": "k8s-130", "reconcileID": "25ed01e2-afb3-42c6-a555-626a75922417", "resource": "service", "error": "error applying TenantcontrolPlane status: cannot retrieve the TenantControlPlane address, Service resource is not yet exposed as LoadBalancer"}

Questions:

  1. Do you get a similar panic when installing Kamaji on the host cluster without vCluster running?
  2. Which Kubernetes version was used for the Kamaji TenantControlPlane resource?
