User gets logged out when running Hadoop on Ubuntu 16.04

I am having trouble running Hadoop jobs on Ubuntu 16.04, in both pseudo-distributed and cluster mode.

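When I run a vanilla Hadoop/HDFS installation, my hadoop user gets logged out and every process started by that user is killed.
I don't see anything in the logs (/var/log/systemd, journalctl or dmesg) that explains why the user is logged out.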

It seems I am not the only one with this or a similar problem:

https://stackoverflow.com/questions/38288162/in-ubuntu-16-04-running-hadoop-jar-laptop-gets-rebooted

Note: creating a dedicated hadoop user did not actually solve my problem, but it did confine the logout to that dedicated user.

https://askubuntu.com/questions/784591/ubuntu-16-04-kills-session-when-resource-usage-is-extremely-high

Could there be a problem around the UserGroupInformation class that causes a logout in some cases, and could some change to systemd in Ubuntu 16.04 be what triggers this behaviour?

The last lines of the Hadoop log that I get before the logout:

...
16/07/13 16:45:37 DEBUG ipc.ProtobufRpcEngine: Call: getJobReport took 4ms
16/07/13 16:45:37 DEBUG security.UserGroupInformation: PrivilegedAction as:hduser (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
16/07/13 16:45:37 DEBUG ipc.Client: IPC Client (1360814716) connection to laptop/127.0.1.1:37339 from hduser sending #375
16/07/13 16:45:37 DEBUG ipc.Client: IPC Client (1360814716) connection to laptop/127.0.1.1:37339 from hduser got value #375
16/07/13 16:45:37 DEBUG ipc.ProtobufRpcEngine: Call: getJobReport took 2ms
Terminated
hduser@laptop:~$ 16/07/13 16:45:37 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@4e7ab839
exit

journalctl:

Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 7.
Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 6.
Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 5.
Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 8.

syslog:

Jul 12 16:06:43 laptop systemd[4172]: Stopped target Default.
Jul 12 16:06:43 laptop systemd[4172]: Reached target Shutdown.
Jul 12 16:06:44 laptop systemd[4172]: Starting Exit the Session...
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Basic System.
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Sockets.
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Paths.
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Timers.
Jul 12 16:06:44 laptop systemd[4172]: Received SIGRTMIN+24 from PID 10101 (kill).
Jul 12 16:06:44 laptop systemd[1]: Stopped User Manager for UID 1001.
Jul 12 16:06:44 laptop systemd[1]: Removed slice User Slice of hduser.
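
The two excerpts above share a recognizable signature: systemd-logind removes several sessions at once, and systemd (PID 1) removes the user's slice. As a rough aid for spotting the same pattern in a longer log, here is a minimal Python sketch. The function name `summarize_teardown` and both regular expressions are illustrative assumptions derived only from the lines quoted above; they are not part of Hadoop or systemd.

```python
import re

# Patterns modelled on the log lines quoted in the question (assumed format).
REMOVED_SESSION = re.compile(r"systemd-logind\[\d+\]: Removed session (\S+)\.")
REMOVED_SLICE = re.compile(r"systemd\[1\]: Removed slice User Slice of (\S+)\.")


def summarize_teardown(lines):
    """Return (session_ids, users) for sessions and user slices removed in `lines`."""
    sessions, users = [], []
    for line in lines:
        match = REMOVED_SESSION.search(line)
        if match:
            sessions.append(match.group(1))
        match = REMOVED_SLICE.search(line)
        if match:
            users.append(match.group(1))
    return sessions, users


sample = [
    "Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 7.",
    "Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 6.",
    "Jul 12 16:06:44 laptop systemd[1]: Stopped User Manager for UID 1001.",
    "Jul 12 16:06:44 laptop systemd[1]: Removed slice User Slice of hduser.",
]
sessions, users = summarize_teardown(sample)
print(sessions, users)
```

Several sessions removed in the same second, followed by the removal of the user slice, suggests that the whole user was torn down rather than a single process crashing.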

I had the same problem. It took me some time, but I found the solution here:
https://unix.stackexchange.com/questions/293069/all-services-of-a-user-are-killed-when-running-multiple-services-under-this-user

Basically, some Hadoop processes just stop, because why not. But when systemd sees a service process die, it seems to kill all of that user's processes.

The fix is to add

[Login]
KillUserProcesses=no

to /etc/systemd/logind.conf and reboot.
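
To check that the setting is actually in place, the file can be parsed like an INI file. The sketch below is a hedged illustration, not an official tool: the helper name `kill_user_processes` is hypothetical, and it assumes that section names are matched case-sensitively, so that only `[Login]` (the section name documented in the logind.conf man page) counts and a commented-out line is ignored.

```python
import configparser


def kill_user_processes(conf_text):
    """Return the KillUserProcesses value from the [Login] section, or None."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    if not parser.has_section("Login"):
        return None
    return parser.get("Login", "KillUserProcesses", fallback=None)


# Lines starting with '#' are treated as comments, as in the stock file.
example = "[Login]\n#NAutoVTs=6\nKillUserProcesses=no\n"
print(kill_user_processes(example))
```

On a real machine the same function can be given the contents of /etc/systemd/logind.conf. A result of `None` means the option is unset or still commented out, in which case the distribution's default applies.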

I had several Ubuntu versions available to debug the problem, and the fix seems to apply only to Ubuntu 16.04.

dawei
