Socket hangup error #299

Closed
avmey opened this issue Jun 10, 2024 · 6 comments · Fixed by #300

avmey commented Jun 10, 2024

In both testing and deployment, since upgrading to the beta 7.0.x, I've started seeing socket hang up errors with larger datasets. It seems to affect both rasters and vectors.

In test:

Serialized Error: { errno: 'ECONNRESET', code: 'ECONNRESET', erroredSysCall: undefined }
This error originated in "src/functions/kelpSmoke.test.ts" test file. It doesn't mean the error was thrown inside the file itself, but while it was running.
Unhandled Rejection
FetchError: request to http://127.0.0.1:8080/kelp2016central_br.tif failed, reason: socket hang up
 ❯ ClientRequest.<anonymous> node_modules/node-fetch/src/index.js:108:11
 ❯ ClientRequest.emit node:events:518:28
 ❯ ClientRequest.emit node:domain:488:12
 ❯ Socket.socketOnEnd node:_http_client:524:9
 ❯ Socket.emit node:events:530:35
 ❯ Socket.emit node:domain:488:12
 ❯ endReadableNT node:internal/streams/readable:1696:12
 ❯ processTicksAndRejections node:internal/process/task_queues:82:21

In deployment:

Geoprocessing exception: FetchError: request to https://gp-california-reports-datasets.s3.us-west-1.amazonaws.com/rocky_shores.fgb failed, reason: socket hang up
 at ClientRequest.<anonymous> (file:///var/task/shoretypesHandler.mjs:953587:15)
 at ClientRequest.emit (node:events:519:28)
 at ClientRequest.emit (node:domain:488:12)
 at TLSSocket.socketOnEnd (node:_http_client:524:9)
 at TLSSocket.emit (node:events:531:35)
 at TLSSocket.emit (node:domain:488:12)
 at endReadableNT (node:internal/streams/readable:1696:12)
 at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

twelch commented Jun 10, 2024

I noticed that the issue only seems to show up when I run all the smoke tests at once. If I run, for example, npm run test:matching 'kelpMax', the test succeeds, but it fails in the bulk run. So it's possibly something about the number of similar requests getting fired off.

It's not clear to me that this is happening only for large datasources. For example, the kelp regional rasters aren't particularly big and I'm getting the socket hang up error for those.

This node-fetch issue seems to match, but I've already upgraded to the latest node-fetch version, which should have the fix -- node-fetch/node-fetch#1735.

I'm curious @avmey, do you see the errors in deployment (I'm guessing you mean in the AWS logs) at the same frequency and for the same datasources as in the smoke tests?


twelch commented Jun 10, 2024

node-fetch fixed its bug in 2.6.13 as well as 3.3.2.

I'm noticing that, for the California project at least, all of the socket hang ups except one are for raster files. I wonder if it's an upstream issue.

Digging into the package-lock.json, it looks like geoblaze uses an older version of cross-fetch, which in turn uses node-fetch 2.6.12. I'm not sure this is the root cause, but if it is, I'm not sure cross-fetch is actively maintained enough to fix it, since Node 22 removes the need for node-fetch and cross-fetch altogether, but Lambda can't use Node 22 until November 2024 - lquixada/cross-fetch#183.


twelch commented Jun 10, 2024

While cross-fetch may still specify node-fetch ^2.6.12, since that's a fuzzy match, what actually gets installed for georaster is node-fetch 2.7.0, which should already have the fix.


avmey commented Jun 11, 2024

Here's a network that is experiencing the error in deployment (the error is caught and displayed in the reports). On my end, the kelp forests (raster) and shoreline habitats (vector) reports are both experiencing it, and sometimes the rock islands (vector) report as well. It's frequent - when I run a network there are always a couple of errored reports.


twelch commented Jun 11, 2024

It looks like Node 19 and above now has keepalive set to true by default for the http module, so that it reuses TCP connections via operating system sockets. This can cause a socket hang up error if it tries to reuse a connection that has already timed out (default 5 seconds).

https://connectreport.com/blog/tuning-http-keep-alive-in-node-js/

I'm going to start by setting keepalive to false to see if that addresses it, and we can revisit how to add it back. I'm not sure geoblaze provides a way to pass fetch parameters, but I can start with the flatgeobuf client.

We can then look at bringing back keepalive, if it can be configured properly. Node may also have maxSockets set to Infinity by default.
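
To make the idea concrete, here's a rough sketch of what disabling keep-alive could look like where node-fetch is the client, by passing a custom agent. The node-fetch `agent` option is real; the `fetchFgb` wrapper name and the idea of plugging it into the flatgeobuf client are just illustrative, not the actual change.

```ts
// Sketch: force a fresh TCP connection per request by passing non-keepalive
// agents to node-fetch. `fetchFgb` is a hypothetical wrapper name.
import fetch from "node-fetch";
import http from "node:http";
import https from "node:https";

// keepAlive: false means sockets are not pooled/reused, so we never try to
// write to a connection the server has already closed.
const httpAgent = new http.Agent({ keepAlive: false });
const httpsAgent = new https.Agent({ keepAlive: false });

export async function fetchFgb(url: string): Promise<ArrayBuffer> {
  const res = await fetch(url, {
    // node-fetch accepts a function here so the right agent is picked per protocol
    agent: (parsedUrl) => (parsedUrl.protocol === "http:" ? httpAgent : httpsAgent),
  });
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }
  return res.arrayBuffer();
}
```

The trade-off is a new TCP (and TLS) handshake per request, which is slower but sidesteps the stale-socket reuse problem entirely.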


twelch commented Jun 11, 2024

Some additional info on configuring keepalive, such as the timeout, is here - https://betterstack.com/community/guides/scaling-nodejs/nodejs-errors/#2-econnreset
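
For reference, a hedged sketch of the other direction along the lines of that guide: keep keep-alive but bound the agent and retry once on a reset. The numbers and the `fetchWithRetry` helper are illustrative only, not what any PR actually implements.

```ts
// Sketch: tuned keep-alive agent plus a single retry on ECONNRESET /
// "socket hang up". Values are illustrative.
import fetch from "node-fetch";
import https from "node:https";

const keepAliveAgent = new https.Agent({
  keepAlive: true,   // reuse TCP connections across requests
  maxSockets: 20,    // cap concurrent sockets per host (Node default is Infinity)
  timeout: 30_000,   // inactivity timeout (ms) applied to sockets created by this agent
});

export async function fetchWithRetry(url: string, retries = 1): Promise<ArrayBuffer> {
  try {
    const res = await fetch(url, { agent: keepAliveAgent });
    if (!res.ok) throw new Error(`Request failed: ${res.status}`);
    return await res.arrayBuffer();
  } catch (err: any) {
    // FetchError carries the underlying system error code (e.g. ECONNRESET)
    const code = err?.code ?? err?.erroredSysCall;
    if (retries > 0 && (code === "ECONNRESET" || /socket hang up/.test(String(err)))) {
      return fetchWithRetry(url, retries - 1);
    }
    throw err;
  }
}
```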
