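Summary: at midnight (00:00) today, CPU usage spiked on all 3 application machines and stayed high for several hours (and the applications did not return to normal on their own). Could this be related to logging? The application log directory had been emptied the day before.
Machine spec: 3 k8s node machines with 4 CPUs each.
Application scale: the 3 machines run about 50 egg k8s pods, 2 pods per application.
Log writes: if both pods of an application land on the same machine, they write to the same log files. (Does the same apply to the 4 egg workers inside a single pod instance?)
Startup: inside the pod, the app is started with egg-scripts start --daemon.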
Preliminary findings: the node processes with the CPU spike are all egg workers:
1111 ? Rl 0:13 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":35401}
1566 ? Rl 0:06 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":39980}
1610 ? Rl 0:10 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":42591}
1667 ? Sl 0:07 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":46809}
2222 ? Rl 0:06 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/agent_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":35401}
2349 ? Rl 0:04 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":46809}
2529 ? Rl 0:04 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/agent_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":42591}
9716 ? Rl 3:28 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":39980}
18599 ? Rl 5:36 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/agent_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":43367}
18768 ? Rl 5:31 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":43367}
18781 ? Rl 5:37 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":43367}
18783 ? Rl 7:42 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/agent_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":37730}
18871 ? Rl 7:04 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":37730}
18873 ? Rl 7:11 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":37730}
18878 ? Rl 7:08 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":37730}
18888 ? Rl 6:58 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":37730}
19292 ? Rl 6:04 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/agent_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":42728}
19324 ? Rl 5:05 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":42728}
19325 ? Rl 5:06 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":42728}
19331 ? Rl 5:04 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":42728}
19344 ? Rl 5:01 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":42728}
20639 ? Rl 4:58 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":39980}
21635 ? Rl 2:50 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":39980}
26413 ? Rl 5:29 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/agent_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":39980}
31557 ? Rl 6:56 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/agent_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":45792}
31580 ? Rl 6:27 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":45792}
31582 ? Rl 6:24 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":45792}
31584 ? Rl 6:22 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":45792}
31596 ? Rl 6:29 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":45792}
32705 ? Rl 0:18 /usr/local/bin/node --require /usr/local/lib/node_modules/egg-scripts/node_modules/source-map-support/register.js /app/node_modules/egg-cluster/lib/app_worker.js {"framework":"/app/node_modules/egg","baseDir":"/app","workers":4,"plugins":null,"https":false,"title":"*************","clusterPort":35401}
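A listing like this is easier to read once it is grouped per egg cluster. The following is only a minimal sketch, assuming the five-column `PID TTY STAT TIME COMMAND` layout shown above and assuming that `clusterPort` identifies one cluster on the node (pods have separate network namespaces, so two pods could in principle pick the same port). The names `ClusterSummary` and `summarizeByClusterPort` are hypothetical and not part of egg.

```typescript
// Hypothetical helper: summarise `ps` lines like the ones above per clusterPort.
interface ClusterSummary {
  agents: number; // agent_worker.js processes
  appWorkers: number; // app_worker.js processes
  running: number; // processes whose STAT column starts with "R"
}

function summarizeByClusterPort(psOutput: string): Map<number, ClusterSummary> {
  const summary = new Map<number, ClusterSummary>();
  for (const line of psOutput.split("\n")) {
    // Assumed columns: PID TTY STAT TIME COMMAND...
    const match = line.trim().match(/^(\d+)\s+\S+\s+(\S+)\s+\S+\s+(.*)$/);
    if (!match) continue;
    const stat = match[2] ?? "";
    const command = match[3] ?? "";
    // The cluster options are passed as a JSON argument; pull out clusterPort.
    const port = command.match(/"clusterPort":(\d+)/);
    if (!port) continue;
    const key = Number(port[1]);
    const entry = summary.get(key) ?? { agents: 0, appWorkers: 0, running: 0 };
    if (command.includes("agent_worker.js")) entry.agents += 1;
    else if (command.includes("app_worker.js")) entry.appWorkers += 1;
    if (stat.startsWith("R")) entry.running += 1;
    summary.set(key, entry);
  }
  return summary;
}
```

Feeding the saved `ps` output into `summarizeByClusterPort` gives one entry per cluster port, which makes it easier to see whether the busy processes belong to a few applications or are spread across all of them.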
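You will have to wait until it reproduces some day and capture a profile with alinode on the spot. It cannot be analyzed from the information given so far.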
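Also, if you run egg-scripts start --daemon inside docker, won't the container just exit as soon as the command returns? You should not add --daemon to run it in the background there.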
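@atian25 Thanks for the reply. You are right, the actual start command is egg-scripts start --title=$PROJECT; it was set up so long ago that I had forgotten. How can egg's multi-process mode be turned off? We already run multiple pods with autoscaling enabled, so the internal multi-process mode is not needed.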
There is https://github.com/eggjs/egg/issues/3180, but it is still in beta. Keeping multi-process mode on for now is not a big problem.
@atian25 OK, I will run with 1 worker for now as a stopgap:
env:
  - name: EGG_WORKERS
    value: "1"
Better to use 2. With only one, if the worker dies while the master stays alive (so k8s presumably will not restart the pod), your service is down until the new worker has started.
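To make the precedence behind that env setting concrete, here is a minimal sketch, assuming the order is: an explicit `--workers` value first, then `EGG_WORKERS`, then the CPU count. It only illustrates the idea and is not the actual egg-scripts or egg-cluster source; `resolveWorkerCount` and its parameters are hypothetical names.

```typescript
// Illustrative only: not the real egg-scripts / egg-cluster implementation.
// Assumed precedence: explicit CLI value > EGG_WORKERS env var > CPU count.
function resolveWorkerCount(
  cliWorkers: number | undefined,
  env: Record<string, string | undefined>,
  cpuCount: number,
): number {
  // An explicit, valid CLI value wins.
  if (cliWorkers !== undefined && Number.isInteger(cliWorkers) && cliWorkers > 0) {
    return cliWorkers;
  }
  // Env values are always strings, which is also why the k8s manifest
  // needs the value quoted.
  const fromEnv = Number.parseInt(env["EGG_WORKERS"] ?? "", 10);
  if (Number.isInteger(fromEnv) && fromEnv > 0) {
    return fromEnv;
  }
  // Fall back to one worker per CPU.
  return cpuCount;
}
```

Under these assumptions, a pod started with `EGG_WORKERS` set to `"2"` on a 4-CPU node would run 2 app workers instead of 4, so a single worker crash does not leave the pod with no worker serving requests.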
@atian25 OK, understood.
Marking this. Looking forward to a write-up of the root cause.
That must have been a real pain, haha
@zengming00 What?