Client Error Handler Called in Tight Loop on Media Driver Shutdown in v1.44.6 #1712

Closed
jack-pearce opened this issue Jan 14, 2025 · 1 comment


@jack-pearce

Description

When upgrading from Aeron v1.41.3 to v1.44.6, we observed a change in the behavior of the Client Conductor's error handling.

In v1.41.3, when the Media Driver (aeronmd) shuts down, the error_handler is called every 0.5 seconds to report the lost connection. In v1.44.6, however, the error_handler is called in a tight loop, producing a continuous stream of "MediaDriver has been shutdown" messages. Because we log from within the error_handler, this floods our logs.

Questions

1). Is this change intentional or are we doing something wrong?

2). How can we reduce the frequency of error_handler calls?

  • Is there a way to apply a backoff strategy to the frequency of error_handler calls when the Media Driver goes down?
  • Alternatively, is the expectation that such a backoff strategy should be implemented by the user within the error_handler?
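
Whatever the answer to the second question, one option that works today is to throttle inside the handler itself. Below is a minimal sketch, assuming only that the client accepts any callable of the form `void(const std::exception&)`, as `context.errorHandler(error_handler)` does in the reproducer. `ErrorBackoff` and `makeBackoffErrorHandler` are hypothetical names, not Aeron API, and the intervals are illustrative. Each report doubles the quiet period up to a cap, and the number of suppressed repeats is included in the next log line.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <exception>
#include <functional>
#include <iostream>
#include <memory>
#include <mutex>

// Hypothetical helper, not part of the Aeron API: rate-limits error reports
// with exponential backoff. After each report the quiet interval doubles (up
// to maxInterval); a gap of at least 2 * maxInterval resets it.
class ErrorBackoff
{
public:
    using Clock = std::chrono::steady_clock;

    ErrorBackoff(Clock::duration initialInterval, Clock::duration maxInterval)
        : m_initialInterval(initialInterval), m_maxInterval(maxInterval), m_interval(initialInterval)
    {
    }

    // Returns true if an error seen at `now` should be logged. On true,
    // *suppressed is set to the number of errors swallowed since the last report.
    bool shouldReport(Clock::time_point now, std::uint64_t* suppressed)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_hasReported)
        {
            const Clock::duration sinceLast = now - m_lastReport;
            if (sinceLast < m_interval)
            {
                ++m_suppressed;
                return false;
            }
            // Long quiet period: the burst is over, start again from the initial
            // interval. Otherwise back off by doubling, capped at maxInterval.
            m_interval = (sinceLast >= 2 * m_maxInterval)
                ? m_initialInterval
                : std::min<Clock::duration>(2 * m_interval, m_maxInterval);
        }
        *suppressed = m_suppressed;
        m_suppressed = 0;
        m_lastReport = now;
        m_hasReported = true;
        return true;
    }

private:
    std::mutex m_mutex;
    const Clock::duration m_initialInterval;
    const Clock::duration m_maxInterval;
    Clock::duration m_interval;
    Clock::time_point m_lastReport{};
    std::uint64_t m_suppressed = 0;
    bool m_hasReported = false;
};

// Builds a handler with the same signature as error_handler in the reproducer.
std::function<void(const std::exception&)> makeBackoffErrorHandler(
    std::chrono::milliseconds initialInterval, std::chrono::milliseconds maxInterval)
{
    auto backoff = std::make_shared<ErrorBackoff>(initialInterval, maxInterval);
    return [backoff](const std::exception& exception)
    {
        std::uint64_t suppressed = 0;
        if (backoff->shouldReport(ErrorBackoff::Clock::now(), &suppressed))
        {
            std::cout << "[critical] Aeron error: " << exception.what();
            if (suppressed > 0)
            {
                std::cout << " (" << suppressed << " repeats suppressed)";
            }
            std::cout << std::endl;
        }
    };
}
```

Registration would look something like `context.errorHandler(makeBackoffErrorHandler(std::chrono::milliseconds(500), std::chrono::seconds(30)));`. Passing the time into `shouldReport` explicitly keeps the backoff logic deterministic and easy to unit-test; the mutex is there because the sketch makes no assumption about which thread invokes the handler.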

Steps to Reproduce

1). Start a local Media Driver instance with the following script:

```bash
#!/usr/bin/env bash
set -euo pipefail

export TMPDIR=/path/to/dir/
# Channel URIs are quoted because '?' is a shell glob character.
run-aeron.sh \
  -Daeron.dir="$TMPDIR" \
  -Daeron.archive.dir="$TMPDIR" \
  -Daeron.dir.delete.on.start=false \
  -Daeron.spies.simulate.connection=true \
  -Daeron.print.configuration=true \
  -Daeron.archive.max.concurrent.recordings=1 \
  "-Daeron.archive.control.channel=aeron:udp?endpoint=0.0.0.0:8010" \
  "-Daeron.archive.replication.channel=aeron:udp?endpoint=127.0.0.1:0" \
  "-Daeron.archive.control.response.channel=aeron:udp?endpoint=127.0.0.1:0" \
  io.aeron.archive.ArchivingMediaDriver
```

2). Run the following application code:

```cpp
#include <chrono>
#include <exception>
#include <iostream>
#include <thread>

#include <Aeron.h>

// Called by the Aeron client whenever its conductor reports an error.
void error_handler(const std::exception& exception)
{
    std::cout << "[critical] Aeron error: " << exception.what() << std::endl;
}

int main()
{
    ::aeron::Context context;
    context.aeronDir("/path/to/dir/");  // must match -Daeron.dir used by the driver
    context.errorHandler(error_handler);
    auto aeron = ::aeron::Aeron::connect(context);

    std::cout << "Aeron client connected. Waiting for Media Driver shutdown..." << std::endl;
    while (true)
    {
        std::cout << "Doing work..." << std::endl;
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}
```

3). Kill the Media Driver

4). Observe continuous stream of error messages:

```
[critical] Aeron error: MediaDriver has been shutdown
[critical] Aeron error: MediaDriver has been shutdown
[critical] Aeron error: MediaDriver has been shutdown
...
```
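
To put a number on "tight loop" versus the 0.5 second cadence seen with v1.41.3, the handler can count its own invocations instead of logging each one. A minimal sketch follows; `CallRateCounter` and `counting_error_handler` are hypothetical names, not part of Aeron, and the sketch assumes the handler is only ever called from one thread (worth checking before relying on the static counter). With a 0.5 second cadence it reports about 2 calls per one-second window, while a tight loop shows up as a far larger count.

```cpp
#include <chrono>
#include <cstdint>
#include <exception>
#include <iostream>

// Hypothetical diagnostic, not part of Aeron: counts how many times the error
// handler fires per fixed window. Assumes calls come from a single thread.
class CallRateCounter
{
public:
    using Clock = std::chrono::steady_clock;

    explicit CallRateCounter(Clock::duration window) : m_window(window) {}

    // Records one call at `now`. When `now` falls past the current window,
    // returns true and stores that window's call count in *callsInWindow;
    // the current call then becomes the first call of the next window.
    bool record(Clock::time_point now, std::uint64_t* callsInWindow)
    {
        if (!m_started)
        {
            m_started = true;
            m_windowStart = now;
            m_count = 1;
            return false;
        }
        if (now - m_windowStart >= m_window)
        {
            *callsInWindow = m_count;
            m_windowStart = now;
            m_count = 1;
            return true;
        }
        ++m_count;
        return false;
    }

private:
    Clock::duration m_window;
    Clock::time_point m_windowStart{};
    std::uint64_t m_count = 0;
    bool m_started = false;
};

// Variant of error_handler from the reproducer that prints one line per
// window, with the number of calls seen, instead of one line per call.
void counting_error_handler(const std::exception& exception)
{
    static CallRateCounter counter(std::chrono::seconds(1));
    std::uint64_t calls = 0;
    if (counter.record(CallRateCounter::Clock::now(), &calls))
    {
        std::cout << "[critical] Aeron error: " << exception.what()
                  << " (" << calls << " calls in the last window)" << std::endl;
    }
}
```

It can be registered in place of the original with `context.errorHandler(counting_error_handler);`.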

Greatly appreciate any help or advice here!

@vyazelenko
Contributor

Should be fixed with cc2d0df.
