After the service has been running for an extended period, we start receiving the following error:
ERROR 065301be-28ca-7efa-8000-9b486b83e4ea : AsyncGRPCClient@1192118 gRPC error: StatusCode.UNAVAILABLE keepalive watchdog timeout [10/18/23 12:55:14]
None
The ongoing request is terminated as the server is not available or closed already.
ERROR 065301be-28ca-7efa-8000-9b486b83e4ea : MARIE@1192118 Error: gRPC error: StatusCode.UNAVAILABLE keepalive watchdog timeout [10/18/23 12:55:14]
None
Traceback (most recent call last):
  File "/home/gbugaj/dev/marieai/marie-ai/marie/clients/base/grpc.py", line 142, in _get_results
    async for response in stream_rpc.stream_rpc_with_retry():
  File "/home/gbugaj/dev/marieai/marie-ai/marie/clients/base/stream_rpc.py", line 51, in stream_rpc_with_retry
    async for resp in stub.Call(
  File "/home/gbugaj/environments/pytorch2-3.10/lib/python3.10/site-packages/grpc/aio/_call.py", line 326, in _fetch_stream_responses
    await self._raise_for_status()
  File "/home/gbugaj/environments/pytorch2-3.10/lib/python3.10/site-packages/grpc/aio/_call.py", line 236, in _raise_for_status
    raise _create_rpc_error(await self.initial_metadata(), await
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "keepalive watchdog timeout"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:0.0.0.0:52000 {created_time:"2023-10-18T12:55:14.148968042-05:00", grpc_status:14, grpc_message:"keepalive watchdog timeout"}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gbugaj/dev/marieai/marie-ai/marie_server/rest_extension.py", line 304, in process_document_request
    async for resp in client.post(
  File "/home/gbugaj/dev/marieai/marie-ai/marie/clients/mixin.py", line 497, in post
    async for result in c._get_results(
  File "/home/gbugaj/dev/marieai/marie-ai/marie/clients/base/grpc.py", line 169, in _get_results
    await self._handle_error_and_metadata(err)
  File "/home/gbugaj/dev/marieai/marie-ai/marie/clients/base/grpc.py", line 188, in _handle_error_and_metadata
    raise ConnectionError(msg)
ConnectionError: gRPC error: StatusCode.UNAVAILABLE keepalive watchdog timeout
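The failure reaches the caller as a plain `ConnectionError`, raised in `_handle_error_and_metadata`. One caller-side mitigation is to reopen the stream with exponential backoff. The helper below is a minimal generic sketch and is not part of marie-ai. `collect_with_retry` and its parameters are hypothetical names. The helper also assumes the request is safe to resend, because a retried document request may be processed twice.

```python
import asyncio
import random
from typing import AsyncIterator, Callable, List, TypeVar

T = TypeVar("T")


async def collect_with_retry(
    make_stream: Callable[[], AsyncIterator[T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> List[T]:
    """Consume an async stream, reopening it when a ConnectionError is raised.

    Hypothetical helper: max_attempts counts the first try, like gRPC's maxAttempts.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            # A fresh stream is created on every attempt; partial results are discarded.
            return [item async for item in make_stream()]
        except ConnectionError:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the original error
            # Exponential backoff with full jitter before reopening the stream.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(random.uniform(0, delay))
            attempt += 1
```

In `process_document_request`, the caller would pass a zero-argument factory such as `lambda: client.post(...)` (arguments elided) instead of iterating `client.post(...)` directly. The helper buffers every response before returning. That may not suit callers that need to forward results as they stream in.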
What is `maxAttempts` set to in the gRPC backoff (retry) strategy?
Based on the gRPC docs, the max attempts value includes the original request itself.
It looks like it currently defaults to 1 in `AsyncPostMixin`, which means there are no retries after the initial attempt. Increasing this value might fix the issue.
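In plain gRPC, a retry policy is configured through a service config that is passed as a channel option. The helper below is a hypothetical sketch and does not show how marie-ai's `AsyncPostMixin` builds its channel. `service_name` is a placeholder for the fully qualified proto service name. gRPC only retries a call before it is committed, roughly before the server has started sending a response. A retry policy may therefore not help if the keepalive timeout fires mid-stream.

```python
import json
from typing import List, Tuple


def build_retry_channel_options(
    service_name: str,
    max_attempts: int = 3,
    initial_backoff_s: float = 0.5,
    max_backoff_s: float = 5.0,
    backoff_multiplier: float = 2.0,
) -> List[Tuple[str, object]]:
    """Build gRPC channel options that enable a retry policy for one service.

    maxAttempts includes the original call, so max_attempts=3 allows up to two retries.
    gRPC requires maxAttempts > 1 and treats values above 5 as 5.
    """
    if max_attempts < 2:
        raise ValueError("retryPolicy.maxAttempts must be at least 2")
    service_config = {
        "methodConfig": [
            {
                # Placeholder: the fully qualified proto service name the policy applies to.
                "name": [{"service": service_name}],
                "retryPolicy": {
                    "maxAttempts": max_attempts,
                    # Durations use the protobuf JSON form, e.g. "0.500s".
                    "initialBackoff": f"{initial_backoff_s:.3f}s",
                    "maxBackoff": f"{max_backoff_s:.3f}s",
                    "backoffMultiplier": backoff_multiplier,
                    "retryableStatusCodes": ["UNAVAILABLE"],
                },
            }
        ]
    }
    return [
        ("grpc.enable_retries", 1),
        ("grpc.service_config", json.dumps(service_config)),
    ]
```

The returned list can be passed as `options=` to `grpc.aio.insecure_channel(target, options=...)`.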