-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bolt + Flask + Kubernetes inevitably starts throwing WebSocketConnectionClosedException #445
Comments
Hi @IanWhalen, thanks for sharing the details of the issue. All the information in the description are really helpful. But unfortunately, I'm also unsure what the root cause of the issue is at this moment.
If you don't mind my asking, would it be possible to ask you to check if the same situation can arise with a simpler app in your k8s cluster? I'm guessing that gunicorn + web framework might not be the contributing factors of the failure of reconnection in this scenario. If so, just running a simple WebSocket client process can reproduce the same situation. With the following example, you can see more detailed information about the app (debug level logging + trace_enabled option) # app.py
# Prints debug-level logs
import logging
logging.basicConfig(level=logging.DEBUG)
import os
from threading import Event
from slack_sdk.web import WebClient
from slack_sdk.socket_mode.response import SocketModeResponse
from slack_sdk.socket_mode.request import SocketModeRequest
from slack_sdk.socket_mode.websocket_client import SocketModeClient
client = SocketModeClient(
app_token=os.environ.get("SLACK_APP_TOKEN"),
web_client=WebClient(token=os.environ.get("SLACK_BOT_TOKEN")),
trace_enabled=True, # print more details happening under the hood
)
if __name__ == "__main__":
def process(client: SocketModeClient, req: SocketModeRequest):
if req.type == "events_api":
response = SocketModeResponse(envelope_id=req.envelope_id)
client.send_socket_mode_response(response)
client.socket_mode_request_listeners.append(process)
client.connect()
Event().wait()
# Dockerfile
# docker build . -t your-repo/hello-bolt
# docker run -e SLACK_APP_TOKEN=$SLACK_APP_TOKEN -e SLACK_BOT_TOKEN=$SLACK_BOT_TOKEN -it your-repo/hello-bolt
#
FROM python:3.9.5-slim-buster
WORKDIR /app/
COPY *.py /app/
COPY requirements.txt /app/
RUN pip install -U pip && pip install -r requirements.txt
ENTRYPOINT python app.py If your Docker containers require a healthcheck endpoint, please go with the following example: def process(client: SocketModeClient, req: SocketModeRequest):
if req.type == "events_api":
response = SocketModeResponse(envelope_id=req.envelope_id)
client.send_socket_mode_response(response)
client.socket_mode_request_listeners.append(process)
client.connect()
from flask import Flask
flask_app = Flask(__name__) This example requires flask, gunicorn in If we figure out that there is an issue (or room for improvement for stability) on this library side, we will fix the issue as quickly as possible. |
Thanks @seratch. I'm going to investigate trying to make the minimum viable reproducer with the pointers you provided. In the meantime I think I have found a local reproducer - or at least I've found a way to reproduce the symptoms - which might indicate this isn't necessarily a container issue. This is going to sound crazy, but I can very reliably trigger this error by:
The message will successfully post, but I'll get the following when I have trace_enabled for 2021-08-19 13:36:48.159562 [info ] normal app logging message here
2021-08-19 13:36:48.213292 [info ] another normal app logging message here
# see timestamps - paused on breakpoint.
2021-08-19 13:38:01.545749 [debug ] ++Sent raw: b'\x89\x80\xe7\xd8n}'
2021-08-19 13:38:01.546778 [debug ] ++Sent decoded: fin=1 opcode=9 data=b''
2021-08-19 13:38:01.565154 [debug ] ++Rcv raw: b'\x89)Ping from applink-canary-5467cdc868-g25hs'
2021-08-19 13:38:01.565405 [debug ] ++Rcv decoded: fin=1 opcode=9 data=b'Ping from applink-canary-5467cdc868-g25hs'
2021-08-19 13:38:01.567176 [debug ] ++Sent raw: b'\x8a\xa9\x03;e\x91SR\x0b\xf6#]\x17\xfen\x1b\x04\xe1sW\x0c\xffh\x16\x06\xf0mZ\x17\xe8.\x0eQ\xa74X\x01\xf2;\r]\xbcd\tP\xf9p'
2021-08-19 13:38:01.567456 [debug ] ++Sent decoded: fin=1 opcode=10 data=b'Ping from applink-canary-5467cdc868-g25hs'
2021-08-19 13:38:01.572026 [debug ] ++Rcv raw: b'\x89)Ping from applink-canary-5467cdc868-g25hs'
2021-08-19 13:38:01.572865 [debug ] ++Rcv decoded: fin=1 opcode=9 data=b'Ping from applink-canary-5467cdc868-g25hs'
2021-08-19 13:38:01.574623 [debug ] ++Sent raw: b'\x8a\xa9\x08*\x16HXCx/(Ld\'e\nw8xF\x7f&c\x07u)fKd1%\x1f"~?Ir+0\x1c.eo\x18# {'
2021-08-19 13:38:01.574990 [debug ] ++Sent decoded: fin=1 opcode=10 data=b'Ping from applink-canary-5467cdc868-g25hs'
2021-08-19 13:38:01.580777 [debug ] ++Rcv raw: b'\x89)Ping from applink-canary-5467cdc868-g25hs'
2021-08-19 13:38:01.581151 [debug ] ++Rcv decoded: fin=1 opcode=9 data=b'Ping from applink-canary-5467cdc868-g25hs'
2021-08-19 13:38:01.581619 [debug ] ++Sent raw: b'\x8a\xa9\xa5\xc3\xc3\xf5\xf5\xaa\xad\x92\x85\xa5\xb1\x9a\xc8\xe3\xa2\x85\xd5\xaf\xaa\x9b\xce\xee\xa0\x94\xcb\xa2\xb1\x8c\x88\xf6\xf7\xc3\x92\xa0\xa7\x96\x9d\xf5\xfb\xd8\xc2\xf1\xf6\x9d\xd6'
2021-08-19 13:38:01.582048 [debug ] ++Sent decoded: fin=1 opcode=10 data=b'Ping from applink-canary-5467cdc868-g25hs'
2021-08-19 13:38:01.582816 [debug ] ++Rcv raw: b'\x89)Ping from applink-canary-5467cdc868-g25hs'
2021-08-19 13:38:01.583121 [debug ] ++Rcv decoded: fin=1 opcode=9 data=b'Ping from applink-canary-5467cdc868-g25hs'
2021-08-19 13:38:01.583711 [debug ] ++Sent raw: b'\x8a\xa9Rt\xa6\x0f\x02\x1d\xc8hr\x12\xd4`?T\xc7\x7f"\x18\xcfa9Y\xc5n<\x15\xd4v\x7fA\x929e\x17\xc2ljB\x9e"5F\x93g!'
2021-08-19 13:38:01.584062 [debug ] ++Sent decoded: fin=1 opcode=10 data=b'Ping from applink-canary-5467cdc868-g25hs'
2021-08-19 13:38:01.585277 [error ] on_error invoked (error: WebSocketConnectionClosedException, message: Connection to remote host was lost.)
2021-08-19 13:38:01.935678 [error ] Traceback (most recent call last):
File "/Users/ian/10gen/mothra/foo.py", line 23, in bar
slack.chat_postMessage(
File "/Users/ian/code/10gen/mothra/.venv/lib/python3.9/site-packages/slack_sdk/web/client.py", line 1266, in chat_postMessage
return self.api_call("chat.postMessage", json=kwargs)
File "/Users/ian/code/10gen/mothra/.venv/lib/python3.9/site-packages/slack_sdk/web/base_client.py", line 135, in api_call
return self._sync_send(api_url=api_url, req_args=req_args)
File "/Users/ian/code/10gen/mothra/.venv/lib/python3.9/site-packages/slack_sdk/web/base_client.py", line 171, in _sync_send
return self._urllib_api_call(
File "/Users/ian/code/10gen/mothra/.venv/lib/python3.9/site-packages/slack_sdk/web/base_client.py", line 303, in _urllib_api_call
return SlackResponse(
File "/Users/ian/code/10gen/mothra/.venv/lib/python3.9/site-packages/slack_sdk/web/slack_response.py", line 204, in validate
raise e.SlackApiError(message=msg, response=self)
slack_sdk.errors.SlackApiError: The request to the Slack API failed.
The server responded with: {'ok': False, 'error': 'channel_not_found'}
--- request header ---
2021-08-19 13:38:02.520684 [debug ] --- request header ---
GET /link/?ticket=ef1b4d46-e81b-4268-8f32-a984d529ef7f&app_id=fa84b3a846b741569a6cbf95af546404fceb70f9deb34bb48e2e384da83faaa2 HTTP/1.1
Upgrade: websocket
Host: wss-primary.slack.com
Origin: http://wss-primary.slack.com
Sec-WebSocket-Key: gYz1neO156Ngg+45IVaCkQ==
Sec-WebSocket-Version: 13
Connection: Upgrade
2021-08-19 13:38:02.521086 [debug ] GET /link/?ticket=ef1b4d46-e81b-4268-8f32-a984d529ef7f&app_id=fa84b3a846b741569a6cbf95af546404fceb70f9deb34bb48e2e384da83faaa2 HTTP/1.1
Upgrade: websocket
Host: wss-primary.slack.com
Origin: http://wss-primary.slack.com
Sec-WebSocket-Key: gYz1neO156Ngg+45IVaCkQ==
Sec-WebSocket-Version: 13
Connection: Upgrade
2021-08-19 13:38:02.521480 [debug ] -----------------------
2021-08-19 13:38:02.521874 [debug ] --- response header ---
2021-08-19 13:38:02.541301 [debug ] HTTP/1.1 101 Switching Protocols
2021-08-19 13:38:02.541848 [debug ] upgrade: websocket
2021-08-19 13:38:02.542421 [debug ] connection: upgrade
2021-08-19 13:38:02.543265 [debug ] sec-websocket-accept: jBgAPGve7avK99CAimVrhq4qjkk=
2021-08-19 13:38:02.543854 [debug ] date: Thu, 19 Aug 2021 13:38:02 GMT
2021-08-19 13:38:02.544530 [debug ] server: envoy
2021-08-19 13:38:02.545074 [debug ] via: envoy-wss-iad-crpa
2021-08-19 13:38:02.545351 [debug ] -----------------------
2021-08-19 13:38:02.552737 [debug ] ++Rcv raw: b'\x81\x1cUNAUTHENTICATED: cache_error'
2021-08-19 13:38:02.553021 [debug ] ++Rcv decoded: fin=1 opcode=1 data=b'UNAUTHENTICATED: cache_error'
2021-08-19 13:38:07.569974 [error ] on_error invoked (error: WebSocketConnectionClosedException, message: Connection to remote host was lost.) Interestingly the logs will continue to spam the Is that at all helpful? edit: also of note is that I'm just running |
@IanWhalen Thanks for sharing this. I can check this tomorrow in my timezone. But one thing I'm wondering is that this may not be a reproducer of the issue that you mentioned in the description. My understanding of this is that, as you paused the app for a while, the underlying WebSocket client lost the connection and it automatically reconnected after the pause. Indeed, this is not a normal situation but the Socket Mode client has successfully recovered without restarting it. Also, I'm curious that what happens if you do the same experiment with the built-in Socket Mode client.
Can you elaborate this part in your description? Does this mean your production app failed to auto reconnect when the exception was thrown? Or, while you found the error logs, did the app continue working after that (=auto-reconnection still works)? |
Also, if the only reason that you don't use the built-in client is noisy |
Hi @IanWhalen, I'm not sure if you're still checking this issue but recently we've applied a few minor improvements to the websocket-client module based Socket Mode client. I don't think that those updates can eliminate the error you've faced but you may see some difference with the latest version (slack-sdk 3.11.2, slack-bolt 1.9.1 as of today). |
Thanks @seratch. I'm reading through the release notes now. Am I correct that https://github.com/slackapi/python-slack-sdk/pull/1114/files is where you made most of the improvements? And given the flow of this conversation, I think you can feel pretty good about resolving this issue for now and we'll try out the updated versions. |
@IanWhalen Sorry for my very late response here. I don't think the diff you've mentioned can eliminate your issue here but trying the latest versions of both slack-bolt and websocket-client may be worth trying. I hope the error will be resolved for you soon. |
@IanWhalen : Did upgrading the versions resolved your issue. I am facing a similar issue. When i am only running flask app on kubernetes pod, it runs fine but when i run bolt+flask app, it is throwing exception |
Running a bolt app using socket mode inside Flask inside kubernetes works initially but eventually always loses the connection and falls back to a
WebSocketConnectionClosedException
error.Given that auto_reconnect_enabled defaults to True, I would expect any failures to just result in the app reconnecting.
I'm opening this as a question as I'm highly doubtful its an actual bug, and instead just something I need to do differently/better in my own app code.
Reproducible in:
The
slack_bolt
versionslack-bolt = "1.6.0"
slack-sdk = "3.8.0"
websocket-client = "1.1.0"
Python runtime version
python3.7
OS info
Problem is seen running in a container.
Steps to reproduce:
I've tried to emulate the pattern in #255 for running bolt + slack, so a simplified version of my app looks like this:
I doubt the
BoltUnhandledRequestError
is causing this but included it in my example code just in case.Maybe of note is that i'm using
websocket_client
based on the suggestion in slackapi/python-slack-sdk#1024. We were seeing the sameBlockingIOError
logs.Also maybe of note is that in #255 you suggest using two threads for gunicorn and we are just currently running with:
Lastly of note is that I am unable to repro this problem locally, and I'm just seeing it inside of our kubernetes cluster. Unfortunately I'm not savvy enough to know how to debug whether the k8s infra is causing my problem (although I am simultaneous to filing this issue working with the people who maintain that infra to investigate from that side).
Expected result:
My slack connection doesn't die.
Actual result:
My app connects fine initially, but after some period of time disconnects from slack and the logs quickly degenerate into the following error every 5 seconds:
on_error invoked (error: WebSocketConnectionClosedException, message: Connection to remote host was lost.)
The text was updated successfully, but these errors were encountered: