Peer already exists error in MetalLb #427
Hi @schrodit! Is this causing your cluster and/or the applications running in it to malfunction in some way, or is the log message the only observable outcome? Did this issue happen after upgrading CPEM in an existing cluster, or is this happening in a new cluster that was set up from the beginning with the software versions listed in the issue description? |
Hey @ctreatma,
the CPEM and MetalLB were updated from MetalLB v0.9.5 and Equinix CCM v3.3.0.
One issue is that the BGP routes of some nodes are not correctly updated. All speaker pods where no routes are assigned show the same/similar error message:
None of them is serving the IP. |
Could you provide the config for the other peers as well? Are there 2 identical peer configs in metallb-system/config? |
I think it's best if I provide the whole configmap: MetalLb Configmap
|
We now had to completely roll back to Equinix CCM v3.3.0 and MetalLB v0.9.5, as all BGP routes were deleted. After the rollback everything was fine again. |
Hey @schrodit few questions for you:
For @ctreatma it looks like the config map they ended up with had 48 entries with 12 unique hostnames, so four entries each. |
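To illustrate what that kind of duplication looks like in the legacy (pre-0.13) MetalLB configmap format, here is a minimal sketch with made-up hostnames, ASNs, and addresses (not the actual config from this issue), showing the same node listed twice, once without and once with source-address:

```yaml
# Hypothetical excerpt of the "config" key in the metallb-system/config configmap.
# All values are placeholders, not taken from this cluster.
peers:
- peer-address: 169.254.255.1
  peer-asn: 65530
  my-asn: 65000
  node-selectors:
  - match-labels:
      kubernetes.io/hostname: node-a
- peer-address: 169.254.255.1
  peer-asn: 65530
  my-asn: 65000
  source-address: 10.0.0.2        # field present in config written by the newer CPEM, per the discussion below
  node-selectors:
  - match-labels:
      kubernetes.io/hostname: node-a
```

Two entries like this point the same node at the same peer address, which would be consistent with the "Peer already exists" message in the issue title.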
Hey @cprivitere,
we do not use a configmap but rather the env var approach.
Yes, like it currently does (because we had to roll back):
I think it did that when the new CCM was started. This was because we first used MetalLB 0.9.5 with CCM 3.6.2, which resulted in an error in MetalLB (source-address was unknown). After updating, the config was correctly parsed, but the described error occurred.
We are not tied to version 0.12.1, but I guess the CRD approach needs more testing, and the question is whether it will work better. Maybe another side note on our setup: do you have an idea how we ended up there and how we can upgrade? |
Are you saying that this works properly in a fresh environment so the issue here has to do with upgrading? We don't have a lot of testing or feedback about the upgrade process. Is it possible to instead plan to build a fresh cluster and migrate the applications? I feel like a fresh cluster with 3.6.2 and MetalLB 0.13.X would be an easier route. |
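For anyone weighing that route: in MetalLB 0.13.x the configmap is replaced by CRDs, so a legacy peer entry like the one sketched earlier would instead be expressed as a BGPPeer resource. A minimal sketch with placeholder values (whether the CPEM version discussed here writes these resources itself is not covered in this thread):

```yaml
# MetalLB 0.13.x CRD-based equivalent of a legacy "peers" entry.
# Placeholder values; adjust ASNs, address, and hostname to your environment.
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: node-a-peer
  namespace: metallb-system
spec:
  myASN: 65000
  peerASN: 65530
  peerAddress: 169.254.255.1
  nodeSelectors:
  - matchLabels:
      kubernetes.io/hostname: node-a
```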
It works in my simple test. But I'm not sure how to reproduce the situation in our prod cluster.
This is unfortunately not possible. We have customers running on that cluster and cannot simply migrate them. @cprivitere do you have an idea why the CCM generates an invalid MetalLB config? |
@schrodit |
Ok, we made some attempts to reproduce this, and while we can't get the same exact issue you ran into, we did see several issues come up depending on how one tries to upgrade. Here's what we've found to be the best way to upgrade from MetalLB 0.9.5 and CPEM 3.3.0 to MetalLB 0.12.1 and CPEM 3.6.2. The biggest finding was that the MetalLB configmap needs to be cleaned up to get rid of older entries that aren't formatted properly; CPEM won't overwrite these, so the only option is to delete them. This assumes your MetalLB lives in the metallb-system namespace and the MetalLB configmap is named "config".
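The maintainers' exact step list isn't reproduced above; as a rough, hedged sketch of what the configmap cleanup amounts to (assuming the default MetalLB object names: the "config" configmap, "controller" deployment, and "speaker" daemonset in metallb-system), it looks something like:

```sh
# 1. Back up the current config first.
kubectl -n metallb-system get configmap config -o yaml > metallb-config-backup.yaml

# 2. Remove the stale/mis-formatted peer entries that CPEM will not overwrite,
#    either by editing the configmap by hand ...
kubectl -n metallb-system edit configmap config
#    ... or by deleting it entirely so it can be regenerated for the current nodes.
kubectl -n metallb-system delete configmap config

# 3. Restart MetalLB so it picks up the cleaned config.
kubectl -n metallb-system rollout restart deployment controller
kubectl -n metallb-system rollout restart daemonset speaker
```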
Are you able to test this out and see if the load balancers come up following your upgrade? |
Hi @schrodit, complementing @cprivitere's comment, and for other people looking for this information: during the tests we found that MetalLB 0.9.5 doesn't work with CPEM 3.6.2, so upgrading both is necessary to ensure it works well. We also noticed that it is possible to upgrade them without deleting the configmap. However, we still recommend deleting it when possible, because we encountered duplicated fields like in the config you shared when we performed multiple upgrades/rollbacks between versions in which some data types and fields had changed, so the entries were interpreted as different. |
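A quick way to check whether a cluster has drifted into that duplicated state (a hedged sketch, assuming the legacy configmap layout with one hostname selector per peer entry) is to compare the number of host selectors in the config with the number of nodes:

```sh
# Count per-node selectors in the legacy MetalLB config ...
kubectl -n metallb-system get configmap config -o jsonpath='{.data.config}' \
  | grep -c 'kubernetes.io/hostname'

# ... and compare with the number of nodes in the cluster.
kubectl get nodes --no-headers | wc -l
```

In the situation described earlier (48 entries for 12 unique hostnames), the first number would be roughly four times the second.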
@cprivitere @ocobles thanks for the input. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned |
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@cprivitere: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
The MetalLB of our Kubernetes cluster throws this error on startup:
The respective peer config is:
In the Equinix Metal console I also see that we have 2 nodes without a learned BGP route (might be related).
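For anyone triaging a similar state, one way to see which speakers are affected (a sketch assuming the default MetalLB labels and namespace) is:

```sh
# Show recent speaker logs across all nodes and filter for the peer error.
kubectl -n metallb-system logs -l app=metallb,component=speaker --tail=200 --prefix \
  | grep -i "already exists"
```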
Environment:
K8s: 1.22.17
Equinix CCM: v3.6.2
MetalLb: 0.12.1