Running into "[Errno 104] Connection reset by peer" error since upgraded from v.4.9.4 #25042

Open
luuu-xu opened this issue Jan 17, 2025 · 1 comment
Labels: kind/bug, network

luuu-xu commented Jan 17, 2025

Issue Description

Our full-stack app has a FastAPI backend that runs on a Linux server and connects to our MySQL database.
Ever since our IT team upgraded Podman from version 4.9.4 to 5.2.2, we have been seeing this "Lost connection" error when a single POST request from our frontend runs multiple queries (reads and writes):

[ERROR] pid:35 (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query ([Errno 104] Connection reset by peer)')
(Background on this error at: https://sqlalche.me/e/20/e3q8)

The error is intermittent. It seems to occur when a POST request that issues many queries takes longer than about 12 seconds (in that particular environment), although some long requests complete without hitting it.
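
To check whether long-lived connections are being reset at the network layer rather than by MySQL or SQLAlchemy, a plain TCP probe can help. The following is a minimal sketch, not something we have run in the affected environment. It assumes a small echo service can be started on the same network path as the database; `start_echo_server` and `probe_connection` are hypothetical helper names, and the probe cannot be pointed at the MySQL port itself because MySQL would reject the payload.

```python
import socket
import socketserver
import threading
import time


class EchoHandler(socketserver.BaseRequestHandler):
    """Send back whatever the client sends until the client disconnects."""

    def handle(self):
        while True:
            data = self.request.recv(4096)
            if not data:
                break
            self.request.sendall(data)


def start_echo_server(host="127.0.0.1", port=0):
    """Start a threaded echo server in the background (hypothetical helper)."""
    server = socketserver.ThreadingTCPServer((host, port), EchoHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe_connection(host, port, duration=30.0, interval=1.0, payload=b"ping\n"):
    """Keep ONE TCP connection busy for `duration` seconds.

    Returns (outcome, elapsed_seconds), where outcome is "ok", "closed",
    "reset" (the Errno 104 case), or "error: ...".
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=5.0) as sock:
            while time.monotonic() - start < duration:
                sock.sendall(payload)
                if not sock.recv(4096):
                    # Peer closed the connection cleanly.
                    return "closed", time.monotonic() - start
                time.sleep(interval)
    except ConnectionResetError:
        return "reset", time.monotonic() - start
    except OSError as exc:
        return f"error: {exc}", time.monotonic() - start
    return "ok", time.monotonic() - start
```

Running `probe_connection(host, port, duration=30)` from inside the backend container against an echo server placed next to the database, once under each Podman version, would show whether a reset appears around the 12-second mark without any database involved.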

We searched extensively and tried several approaches, none of which fixed the problem:

  1. changing MySQL parameters, such as increasing read_timeout, write_timeout, and max_allowed_packet.
  2. rebuilding and rebooting the containers.
  3. opening up the firewall to allow a direct connection from a laptop to the Linux server.
  4. changing SQLAlchemy's engine configuration, such as limiting the connection pool to 1 and turning on pool_pre_ping.

We then rolled Podman back to version 4.9.4, the version we ran before the errors started, and the errors stopped.
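
For reference, the SQLAlchemy engine changes from item 4 can be sketched roughly as follows. This is an illustration, not our exact configuration: `ENGINE_OPTIONS` and `make_engine` are hypothetical names, and `max_overflow=0` is an assumption added here so that the pool really is capped at a single connection.

```python
# Illustrative engine options; the exact values we used are not shown here.
ENGINE_OPTIONS = {
    "pool_size": 1,         # keep a single pooled connection
    "max_overflow": 0,      # assumption: forbid extra connections beyond the pool
    "pool_pre_ping": True,  # test the connection each time it is checked out
}


def make_engine(url):
    """Build an engine with the options above (hypothetical helper)."""
    # Imported lazily so this module loads even where SQLAlchemy is absent.
    from sqlalchemy import create_engine

    return create_engine(url, **ENGINE_OPTIONS)
```

Note that `pool_pre_ping` only tests a connection at checkout time. If the reset happens in the middle of a request that already holds the connection, the pre-ping would not be expected to catch it, which may be why this setting made no difference for us.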

Describe the results you received

[ERROR] pid:35 (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query ([Errno 104] Connection reset by peer)')
(Background on this error at: https://sqlalche.me/e/20/e3q8)

Describe the results you expected

The POST request triggers a series of SQL reads, computations, and writes on the backend. These queries should keep their connection and complete without errors.

podman info output

host:
  arch: amd64
  buildahVersion: 1.33.11
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: c0564282e9befb7804c3642230f8e94f1b2ba9f8'
  cpuUtilization:
    idlePercent: 99.94
    systemPercent: 0.01
    userPercent: 0.05
  cpus: 32
  databaseBackend: boltdb
  distribution:
    distribution: rhel
    version: "9.5"
  eventLogger: file
  freeLocks: 2046
  hostname: ********
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.14.0-503.21.1.el9_5.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 257668243456
  memTotal: 266766888960
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.2-1.el9_5.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.2
    package: netavark-1.12.2-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.16.1-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.16.1
      commit: afa829ca0122bd5e1d67f1f38e6cc348027e3c32
      rundir: /run/user/1002/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240806.gee36266-2.el9.x86_64
    version: |
      pasta 0^20240806.gee36266-2.el9.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1002/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 8h 53m 55.00s (Approximately 0.33 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/****/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 2
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/****/.local/share/containers/storage
  graphRootAllocated: 321375940608
  graphRootUsed: 20980580352
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 4
  runRoot: /run/user/1002/containers
  transientStore: false
  volumePath: /home/****/.local/share/containers/storage/volumes
version:
  APIVersion: 4.9.4-rhel
  Built: 1730450505
  BuiltTime: Fri Nov  1 04:41:45 2024
  GitCommit: ""
  GoVersion: go1.21.13 (Red Hat 1.21.13-5.el9_4)
  Os: linux
  OsArch: linux/amd64
  Version: 4.9.4-rhel

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

This version of podman works without errors.

Client:       Podman Engine
Version:      4.9.4-rhel
API Version:  4.9.4-rhel
Go Version:   go1.21.13 (Red Hat 1.21.13-5.el9_4)
Built:        Fri Nov  1 04:41:45 2024
OS/Arch:      linux/amd64


luuu-xu added the kind/bug label Jan 17, 2025
sbrivio-rh added the network label Jan 17, 2025