I have Kubernetes pods for one Node app, and each is crashing every 10 minutes or so. I'd like to understand why and stabilize it.
The pods: $ k get po | grep app
app-655fd5fcc9-4mtjr 0/1 CrashLoopBackOff 53 7h35m
app-655fd5fcc9-6kf82 1/1 Running 106 16h
app-655fd5fcc9-9tfbp 1/1 Running 87 16h
app-655fd5fcc9-g8x7q 1/1 Running 53 7h35m
app-655fd5fcc9-nvcc8 1/1 Running 102 16h
The logs right before crashing: $ k logs -p app-655fd5fcc9-4mtjr
node[25]: ../src/node_http2.cc:893:ssize_t node::http2::Http2Session::ConsumeHTTP2Data(): Assertion `(flags_ & SESSION_STATE_READING_STOPPED) != (0)' failed.
1: 0x8fa0c0 node::Abort() [node]
2: 0x8fa195 [node]
3: 0x959e02 node::http2::Http2Session::ConsumeHTTP2Data() [node]
4: 0x959f4f node::http2::Http2Session::OnStreamRead(long, uv_buf_t const&) [node]
5: 0xa2aad1 node::TLSWrap::ClearOut() [node]
6: 0xa2b343 node::TLSWrap::OnStreamRead(long, uv_buf_t const&) [node]
7: 0x9cf801 [node]
8: 0xa7ae09 [node]
9: 0xa7b430 [node]
10: 0xa80dd8 [node]
11: 0xa6fe6b uv_run [node]
12: 0x904725 node::Start(v8::Isolate*, node::IsolateData*, std::vector<std::string, std::allocator<std::string> > const&, std::vector<std::string, std::allocator<std::string> > const&) [node]
13: 0x90297f node::Start(int, char**) [node]
14: 0x7f1a8cbd02e1 __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]
15: 0x8bbe85 [node]
Aborted (core dumped)
npm ERR! code ELIFECYCLE
npm ERR! errno 134
npm ERR! app@1.0.1 start: `node --harmony ./entry-point.js "--max-old-space-size=7168"`
npm ERR! Exit status 134
npm ERR!
npm ERR! Failed at the app@1.0.1 start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! /root/.npm/_logs/2020-03-12T00_45_17_556Z-debug.log
I read through
$ k describe pods app-655fd5fcc9-4mtjr
but there didn't seem to be any relevant info at a glance. I think the issue is with the app anyway.
Where do I begin to debug and solve this? Should I run
node entry-point.js
directly locally for some time? It's production code, but sometimes you've got to run stuff locally.
Update: I ran
$ k exec -it app-655fd5fcc9-6kf82 top
as it went into CrashLoopBackOff state, and the resource usage seemed fine.
My app isn't using the Node stdlib http2 module directly. There might be some npm module using it, like the @google-cloud modules or one of the HTTP request clients.
$ ack http2 --js # no results
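Since that ack run only searched the app's own .js source, a dependency could still be pulling in http2 under the hood. A rough check over the installed modules (a sketch, not something from the original post) might look like:
$ grep -rlE "require\(.http2.\)" node_modules | head
This only catches require() calls, so it's just a first pass, but any gRPC-based client (which the @google-cloud modules use) would be a likely hit.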
Don't know if it helps someone, but for me: I was facing a similar issue on Node v10.16.3, and after moving to v12.14.1 it stopped popping up.
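If the app ships as a container image, the runtime bump can be as small as changing the base image (assuming a standard Dockerfile, which isn't shown here):
FROM node:12.14.1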
Not sure what exactly the cause was. But my application runs a loop over a very large array, so I had been manually running the garbage collector after processing a few chunks, and the above error was popping up right after my first collection pass.
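For reference, a minimal sketch of that chunked-loop-with-manual-GC pattern. global.gc is only available when Node is started with --expose-gc; CHUNK_SIZE and processChunk are hypothetical names, not from the original post:

// run with: node --expose-gc loop.js
const CHUNK_SIZE = 1000; // hypothetical chunk size

async function processAll(items) {
  for (let i = 0; i < items.length; i += CHUNK_SIZE) {
    const chunk = items.slice(i, i + CHUNK_SIZE);
    await processChunk(chunk); // hypothetical per-chunk work
    if (global.gc) global.gc(); // manual GC between chunks, as described above
  }
}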
The issue was with the app after all. We had old legacy code that ran this func with deeply nested callbacks and polling. It's been refactored to make the func async and do all the work in parallel with limited throughput, and the controller now just awaits each func call.
The pods are now crashing every 1-3 hours instead of every 10 minutes, so there's probably another issue with the app.
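For anyone curious, here is a sketch of the shape of that refactor (doWork and jobs are hypothetical names; the original code isn't shown). A small worker pool replaces the nested callbacks and polling, giving parallel work with limited throughput, and the controller just awaits the whole run:

// Run doWork(job) for every job, at most `concurrency` at a time.
async function runAll(jobs, concurrency = 5) {
  const results = []; // note: completion order, not input order
  let next = 0;
  async function worker() {
    while (next < jobs.length) {
      const job = jobs[next++]; // safe: no await between check and increment
      results.push(await doWork(job));
    }
  }
  await Promise.all(Array.from({ length: concurrency }, worker));
  return results;
}

// controller: just await the call instead of polling
// await runAll(pendingJobs, 5);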