Test Retry Logic #49
I'm trying to cause leader failover manually in redpanda just to observe what happens. I have a test where I create 2 topics with replication factor 3, and I can see that each topic has a different leader and has all 3 redpanda instances as replicas. Then I put a … In the docker logs for redpanda-1 and redpanda-2, I'm seeing errors about leader election:
Redpanda logs
Maybe this issue? That looks like it was fixed in redpanda 21.10.1, but this repo is using 21.9.2. Is there a reason not to upgrade? (The latest looks like 21.11.2.)

Then, when my test continues, it tries to request the topic metadata again and just hangs. I'm going to try something similar with Kafka now that Marco has improved the Kafka compatibility, but am I on the right track for creating the conditions we need to test, redpanda bugs notwithstanding? Is there anything else I could try?
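When a request such as the metadata fetch just hangs, a failover test is easier to debug if it fails fast instead of blocking forever. A minimal std-only sketch, assuming a blocking call; `with_timeout` is a hypothetical helper, not part of rskafka. In an async tokio test, `tokio::time::timeout` plays the same role.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical test helper: run `f` on a worker thread and give up after `limit`.
/// A hanging call (e.g. a metadata request during a leader election) then turns
/// into an `Err` instead of blocking the test forever.
/// Note: the worker thread is not cancelled; it keeps running in the background.
fn with_timeout<T, F>(limit: Duration, f: F) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails only if the receiver already gave up; ignore that case.
        let _ = tx.send(f());
    });
    match rx.recv_timeout(limit) {
        Ok(value) => Ok(value),
        Err(RecvTimeoutError::Timeout) => Err(format!("timed out after {:?}", limit)),
        // The sender was dropped without sending, i.e. `f` panicked.
        Err(RecvTimeoutError::Disconnected) => Err("worker thread panicked".to_string()),
    }
}
```

The trade-off is that a timed-out call is abandoned rather than stopped, which is acceptable in a test that is about to fail anyway.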
@carols10cents It's safe to upgrade redpanda. To debug client issues, just run the tests with …
And now all the integration tests on main are failing against Kafka with …
Do you get the …
Normal integration tests, not touching the docker processes.
Could you please select a failing test and post the FULL output of the following command:

```shell
env TEST_INTEGRATION=1 \
  KAFKA_CONNECT=localhost:9093 \
  RUST_BACKTRACE=1 \
  RUST_LOG=trace \
  cargo test -- <the test you chose> --nocapture
```

I wanna figure out which action fails (backtrace) and what rskafka thinks is going on (logs).
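The command configures the test run entirely through environment variables. As a sketch of how an integration test might gate on the first two, assuming `KAFKA_CONNECT` holds a comma-separated `host:port` list (both function names are hypothetical, not rskafka's actual test helpers):

```rust
use std::env;

/// Hypothetical helper: returns `None` (skip the test) unless TEST_INTEGRATION
/// is set; otherwise returns the bootstrap brokers parsed from KAFKA_CONNECT,
/// assumed to be a comma-separated list such as "localhost:9093,localhost:9094".
fn bootstrap_brokers(test_integration: Option<&str>, kafka_connect: Option<&str>) -> Option<Vec<String>> {
    if test_integration.is_none() {
        return None;
    }
    let connect = kafka_connect.expect("KAFKA_CONNECT must be set when TEST_INTEGRATION is set");
    Some(
        connect
            .split(',')
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect(),
    )
}

/// Reads the real environment; a test would call this and return early on `None`.
fn bootstrap_brokers_from_env() -> Option<Vec<String>> {
    let test_integration = env::var("TEST_INTEGRATION").ok();
    let kafka_connect = env::var("KAFKA_CONNECT").ok();
    bootstrap_brokers(test_integration.as_deref(), kafka_connect.as_deref())
}
```

Keeping the parsing in a pure function separate from the `env::var` calls makes the gating logic itself unit-testable.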
I can reproduce this issue now by using a different port (the 3 Kafka instances are mapped to three different ports). It seems that … I guess you got unlucky: this controller is normally the first node (since it starts first), but your restart tests changed it to a different node. I'll look into it.
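The failure comes from implicitly assuming that the first node is the controller. In the Kafka protocol, a Metadata response carries the broker list and the controller's node id, so a client can look the controller up instead of guessing. The thread does not say what the fix in #60 does; this is only a sketch of the general idea, and the struct and field names below are simplified, hypothetical stand-ins rather than rskafka's types.

```rust
/// Simplified stand-ins for the broker list and controller id carried in a
/// Kafka Metadata response (names are illustrative, not rskafka's).
#[derive(Debug, Clone, PartialEq)]
struct Broker {
    node_id: i32,
    host: String,
    port: i32,
}

struct MetadataSnapshot {
    brokers: Vec<Broker>,
    /// Kafka reports -1 when no controller is currently known.
    controller_id: i32,
}

/// Find the controller's address from metadata instead of assuming the first
/// bootstrap broker is the controller. Returns `None` if the controller is
/// unknown (e.g. during an election) or missing from the broker list.
fn controller_addr(meta: &MetadataSnapshot) -> Option<String> {
    meta.brokers
        .iter()
        .find(|b| b.node_id == meta.controller_id)
        .map(|b| format!("{}:{}", b.host, b.port))
}
```

A `None` here is itself retriable: after a restart the caller would refresh metadata and try again rather than fall back to an arbitrary broker.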
#60 should fix this issue.
Currently there is no real testing of the retry logic, in particular of leader failover, connection errors, etc. This should be addressed.
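One way to make retry logic testable without killing containers is to keep the loop independent of real I/O and time. A minimal sketch, assuming errors can be classified as retriable (leader moved, connection lost) or permanent; `RequestError` and `retry_with_backoff` are hypothetical names, not rskafka's API, and the delay is simply doubled after each failure.

```rust
use std::time::Duration;

/// Illustrative error classes a broker call might fail with (not rskafka's type).
#[derive(Debug, Clone, PartialEq)]
enum RequestError {
    /// e.g. NOT_LEADER_OR_FOLLOWER after a leader failover: refresh and retry.
    NotLeader,
    /// Connection dropped or refused: reconnect and retry.
    Connection,
    /// Anything else is treated as permanent.
    Fatal(String),
}

impl RequestError {
    fn is_retriable(&self) -> bool {
        matches!(self, RequestError::NotLeader | RequestError::Connection)
    }
}

/// Calls `op(attempt)` until it succeeds, fails permanently, or `max_attempts`
/// is reached (`op` always runs at least once), doubling the delay between
/// attempts. `sleep` is injected so tests need no real waiting and can record
/// the backoff schedule.
fn retry_with_backoff<T>(
    max_attempts: u32,
    initial_delay: Duration,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut(u32) -> Result<T, RequestError>,
) -> Result<T, RequestError> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(e) if e.is_retriable() && attempt < max_attempts => {
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

With `op` scripted to return `NotLeader` twice and then succeed, a unit test can simulate a leader failover deterministically, which complements (rather than replaces) the docker-based failover tests discussed above.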