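I'm having trouble running hadoop jobs on Ubuntu 16.04 in both pseudo-distributed and cluster modes.
When running a vanilla hadoop/hdfs installation, my hadoop user gets logged out, and all processes run by this user are killed.
I don't see anything in the logs (/var/log/systemd, journalctl or dmesg) that explains why the user is logged out.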
It seems I'm not the only one with this or a similar problem:
https://stackoverflow.com/questions/38288162/in-ubuntu-16-04-running-hadoop-jar-laptop-gets-rebooted
Note: creating a dedicated hadoop user did not actually solve the problem for me, but it did limit the logouts to that dedicated user.
https://askubuntu.com/questions/784591/ubuntu-16-04-kills-session-when-resource-usage-is-extremely-high
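Could the problem be related to the UserGroupInformation class (which causes a logout under some circumstances), and could some change to systemd in Ubuntu 16.04 be causing this behavior?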
The last lines of the hadoop log before the logout:
...
16/07/13 16:45:37 DEBUG ipc.ProtobufRpcEngine: Call: getJobReport took 4ms
16/07/13 16:45:37 DEBUG security.UserGroupInformation: PrivilegedAction as:hduser (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
16/07/13 16:45:37 DEBUG ipc.Client: IPC Client (1360814716) connection to laptop/127.0.1.1:37339 from hduser sending #375
16/07/13 16:45:37 DEBUG ipc.Client: IPC Client (1360814716) connection to laptop/127.0.1.1:37339 from hduser got value #375
16/07/13 16:45:37 DEBUG ipc.ProtobufRpcEngine: Call: getJobReport took 2ms
Terminated
hduser@laptop:~$16/07/13 16:45:37 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@4e7ab839
exit
journalctl:
Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 7.
Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 6.
Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 5.
Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 8.
syslog:
Jul 12 16:06:43 laptop systemd[4172]: Stopped target Default.
Jul 12 16:06:43 laptop systemd[4172]: Reached target Shutdown.
Jul 12 16:06:44 laptop systemd[4172]: Starting Exit the Session...
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Basic System.
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Sockets.
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Paths.
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Timers.
Jul 12 16:06:44 laptop systemd[4172]: Received SIGRTMIN+24 from PID 10101 (kill).
Jul 12 16:06:44 laptop systemd[1]: Stopped User Manager for UID 1001.
Jul 12 16:06:44 laptop systemd[1]: Removed slice User Slice of hduser.
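For anyone trying to spot the same pattern in their own logs, here is a rough sketch of a filter that pulls out the session-teardown lines. The function name `teardown_events` and the marker strings are my own choices, picked from the messages in the excerpts above. They are not an exhaustive or official list, and the regular expression assumes the classic syslog line layout shown here.

```python
import re

# Matches lines such as:
#   Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 7.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<unit>[\w.-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)

# Assumption: substrings taken from the log excerpts in this question.
TEARDOWN_MARKERS = (
    "Removed session",
    "Stopped User Manager",
    "Removed slice",
    "Received SIG",
)


def teardown_events(lines):
    """Return (timestamp, unit, message) for lines that look like a session teardown."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(marker in match.group("msg") for marker in TEARDOWN_MARKERS):
            events.append((match.group("ts"), match.group("unit"), match.group("msg")))
    return events


if __name__ == "__main__":
    sample = [
        "Jul 12 16:06:43 laptop systemd[4172]: Stopped target Default.",
        "Jul 12 16:06:44 laptop systemd[4172]: Received SIGRTMIN+24 from PID 10101 (kill).",
        "Jul 12 16:06:44 laptop systemd[1]: Stopped User Manager for UID 1001.",
    ]
    for event in teardown_events(sample):
        print(event)
```

Feeding it the saved output of journalctl or the contents of /var/log/syslog line by line should narrow things down to the moment the user session was torn down.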
I had the same problem. It took me a while, but I found the solution here:
https://unix.stackexchange.com/questions/293069/all-services-of-a-user-are-killed-when-running-multiple-services-under-this-user
Basically, some hadoop processes just stop, because why not. But when systemd sees a service process dying, it seems to kill all of that user's processes.
The fix is to add

[Login]
KillUserProcesses=no

to /etc/systemd/logind.conf and reboot.
I had several versions of Ubuntu available for debugging the problem, and the fix seems to apply only to Ubuntu 16.04.
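If you want to double-check that the setting actually landed in the right place, here is a minimal sketch that parses the text of a logind.conf-style file with Python's `configparser`. The helper name `kill_user_processes_setting` is hypothetical. The sketch only looks at the text you hand it, so it does not account for drop-in files under /etc/systemd/logind.conf.d/ or for the built-in default that applies when the key is absent.

```python
import configparser


def kill_user_processes_setting(conf_text):
    """Return True/False if [Login] sets KillUserProcesses explicitly, else None."""
    # strict=False tolerates repeated keys; interpolation=None leaves '%' alone.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(conf_text)
    # Section names are matched case-sensitively, so "[login]" is not "[Login]".
    if not parser.has_option("Login", "KillUserProcesses"):
        return None
    return parser.getboolean("Login", "KillUserProcesses")


if __name__ == "__main__":
    sample = "[Login]\n#NAutoVTs=6\nKillUserProcesses=no\n"
    print(kill_user_processes_setting(sample))
```

To run it against the real file, read /etc/systemd/logind.conf into a string and pass that in. A result of `None` means this file does not set the key explicitly, which is also what you get if the key is still commented out or sits under a misspelled section header.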