Monday 19 September 2011

Nagios Plugin 'check_procs' incorrectly finds 0 processes

When checking for running processes on remote Linux systems via NRPE, the Nagios plugin check_procs –C <process commandname> occasionally responds with unexpected results.

Example on a Zenworks 7 server

If we look for the tftpd daemon using ps:
ZEN03:/usr/local/nagios/libexec # ps -ef|grep tftpd
root     4103      1  0 Sep14 ?        00:01:31 /opt/novell/bin/novell-tftpd
root     20047 17950  0 13:06 pts/0    00:00:00 grep tftpd

Then we look for it with check_procs:
ZEN03:/usr/local/nagios/libexec # ./check_procs -C novell-tftpd
PROCS OK: 1 process with command name 'novell-tftpd'

The check_procs plugin correctly reports that one process has been found with this name

However, if we look for the proxy dhcp daemon using ps:
ZEN03:/usr/local/nagios/libexec # ps -ef |grep proxy
root  21171     1  0 Sep18 ?        00:00:00 /opt/novell/bin/novell-proxydhcpd
root  20137 17950  0 13:07 pts/0    00:00:00 grep proxy

And then with check_procs:
ZEN03:/usr/local/nagios/libexec # ./check_procs -C novell-proxydhcpd
PROCS OK: 0 processes with command name 'novell-proxydhcpd'

In this case,  the check_procs plugin has reported that 0 processes have been found, even though we can clearly see that this is not the case.

The trick in these situations is to ask check_procs for more information using the –vv switch:
ZEN03:/usr/local/nagios/libexec # ./check_procs -vv -C novell-proxydhcpd
CMD: /bin/ps axwo 'stat uid pid ppid vsz rss pcpu comm args'
PROCS OK: 0 processes with command name 'novell-proxydhcpd'

Here check_procs has told us what it is passing to ps to find out the information it is reporting back to us.

So let us use that parameter list for our own check:
ZEN03:/usr/local/nagios/libexec # /bin/ps axwo 'stat uid pid ppid vsz rss pcpu comm args'|grep proxy
S       0 21171     1   1412   396  0.0 novell-proxydhc /opt/novell/bin/novell-proxydhcpd
S+      0 20263 17950   1712   672  0.0 grep            grep proxy

And there it is – on this system (SLES9), ps is only reporting back the first 15 characters, so no match is being found.

So here, what we have to do is to ask check_procs to only look for the first 15 characters
ZEN03:/usr/local/nagios/libexec # ./check_procs -C novell-proxydhc
PROCS OK: 1 process with command name 'novell-proxydhc'

We can now correctly check for our proxy dhcp daemon in Nagios.

No comments:

Post a Comment