Just moozing

Before you can check your notes, you must make them…

Nagios and debugging notifications


I have been working on my Nagios Ansible role for some time now, and notifications have been an issue all along, so I decided to put some notes here.

Setting up Nagios is fairly straightforward, and for smallish setups it is easy to maintain a good overview of things. The most important part for me is that the system automatically tells me when stuff stops working. In Nagios this is called notifications.

Notifications can be simple or complex, but to Nagios it is just a matter of running a shell script. This gives you all the power, but with great power comes great responsibility: you can mess stuff up easily.

 

Are notifications enabled?

The first thing to check is the obvious “notifications disabled” icon next to the check in the tables on the web interface. It is also shown in the detailed view of the check.

Nagios puts notifications through a number of filters before they are actually triggered. You can read about them here.

 

Program wide filter

You can disable notifications completely in nagios.cfg using enable_notifications.


$ cat etc/nagios.cfg | grep notification | grep -v ^#
 log_notifications=1
 notification_timeout=30
 enable_notifications=1

 

This means that notifications are enabled, that notification scripts running for more than 30 seconds will be killed, and that notifications are actually logged.

 

Host and service

The next check relates to the host and service definitions. When defining hosts and services, there are options for notifications. The default installation puts these in templates.cfg, in templates with names like generic-host or generic-service.

Checks (in order)

  1. scheduled downtime
  2. flapping
  3. host and service definitions
  4. notification periods
  5. notification interval
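
To see which of these settings are actually in effect for a host or service, it can help to list the notification-related directives in the object configuration. The following is a minimal sketch: the function name is made up for this post, and the path in the example is an assumption about where your templates live.

```shell
# List notification-related directives (with line numbers) in an object
# config file. Commented-out lines are skipped because the pattern only
# allows whitespace in front of the directive name.
list_notification_directives() {
    grep -nE '^[[:space:]]*(notifications_enabled|notification_period|notification_interval|notification_options|first_notification_delay)[[:space:]]' "$1"
}

# Example (the path is an assumption, adjust it to your installation):
# list_notification_directives etc/objects/templates.cfg
```

Remember that a host or service can inherit these values from a template, so a directive missing from one file may still be set elsewhere.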

 

Contacts

Just like with hosts and services, there are values for notification periods and options when defining a contact, for both hosts and services. Again, the defaults are in templates.cfg.
There are also some exceptions related to timing and previous notifications.

 

Send a custom notification

I find that sending a custom notification is a good way of testing my custom notification scripts.

Quick howto:

  1. Go to page for e.g. host
  2. Click “send custom notification”
  3. Don’t “force”, nor “broadcast”
  4. Write some recognizable text
  5. Receive or not
  6. If not, do it again, this time “force”

Force will bypass the notification filters, which lets you check whether the actual notification scripts are working.
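
The same test can be done from the command line, since the web interface just submits an external command. Below is a rough sketch that builds a SEND_CUSTOM_HOST_NOTIFICATION line. The options field is a bitmask where 1 means broadcast and 2 means forced. The function name, the host web01, the author nagiosadmin and the command file path in the example are all assumptions, so adjust them to your setup.

```shell
# Build a SEND_CUSTOM_HOST_NOTIFICATION external command line.
# Arguments: host name, options bitmask, author, comment.
# Options: 0 = none, 1 = broadcast, 2 = forced.
custom_host_notification() {
    printf '[%s] SEND_CUSTOM_HOST_NOTIFICATION;%s;%s;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$4"
}

# Example (host, author and command file path are assumptions):
# custom_host_notification web01 0 nagiosadmin "test 1, not forced" > var/rw/nagios.cmd
# custom_host_notification web01 2 nagiosadmin "test 2, forced" > var/rw/nagios.cmd
```

Using a different comment for each attempt makes it easy to tell them apart in the logs afterwards.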

 

Checking the logs

Nagios keeps several logs in var. They contain a lot of information, most of it human-readable.

When doing a custom notification, it will show up as an “EXTERNAL COMMAND”. If it worked, you should see “HOST NOTIFICATION:” with the details of who is contacted.

The contacts listed should match the notification settings of the host groups, service groups and contact groups involved, as well as those of the individual hosts, services and contacts.
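
A quick way to pull just these entries out of the main log is a grep for both markers. This is only a sketch: the function name is made up, and var/nagios.log in the example is an assumption about the log file name.

```shell
# Print custom notification commands and the notifications that were
# actually sent, in the order they appear in the log.
notification_log_lines() {
    grep -E 'EXTERNAL COMMAND: SEND_CUSTOM_(HOST|SVC)_NOTIFICATION|(HOST|SERVICE) NOTIFICATION:' "$1"
}

# Example (the log file name is an assumption):
# notification_log_lines var/nagios.log | tail -n 20
```

An EXTERNAL COMMAND line without a following NOTIFICATION line suggests that the notification was stopped by one of the filters.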

 

Enabling debugging

In nagios.cfg there are some debug options.

  • debug_level, which when set to “-1” logs everything
  • debug_verbosity, which controls how much detail is dumped in the log.

Update: Setting debug_level to 32 will only show the notifications. Very useful.
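
The debug_level value is a bitmask, so categories can be combined by OR-ing them together. As a small sketch, this computes a value that covers notifications plus external commands and commands, which seems like a sensible combination when testing custom notifications. The category numbers are the ones listed in the comments of the sample nagios.cfg, and the variable names are just for illustration.

```shell
# Debug categories, as listed in the comments of the sample nagios.cfg.
NOTIFICATIONS=32
EXTERNAL_COMMANDS=128
COMMANDS=256

# Combine the categories into a single debug_level value.
debug_level=$(( NOTIFICATIONS | EXTERNAL_COMMANDS | COMMANDS ))
echo "debug_level=$debug_level"   # prints debug_level=416
```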

By default the debug output is logged to var/nagios.debug. This file contains a lot. With verbosity 2 in particular it is too much to read, and you need to grep your way through it.

Quick tricks:

  • grep using the host name
  • grep using “-i notification”
  • grep using “final output”

The last one shows the final command line that Nagios is actually executing. It can be copied and pasted to the command line, which is very useful for debugging, since it exposes errors that you would otherwise miss (like unmatched quotation marks).
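
The tricks above can be combined into a small helper. This is a sketch with a made-up function name. It keeps the lines that mention notifications or the final output, and an optional second argument narrows the result to a single host. The file name and host in the example are assumptions.

```shell
# Keep debug lines that mention notifications or the final output.
# An optional second argument limits the result to lines that also
# mention the given host name.
debug_filter() {
    if [ -n "${2:-}" ]; then
        grep -iE 'notification|final output' "$1" | grep -F -- "$2"
    else
        grep -iE 'notification|final output' "$1"
    fi
}

# Example (file name and host are assumptions):
# debug_filter var/nagios.debug web01 | less
```

Not every relevant debug line mentions the host name, so it is worth running it without the host filter as well.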

 

An example

My notification logic worked when forced, but not normally. Checking the logs gave me:

This host shouldn’t have notifications sent out at this time.
Notification viability test failed. No notification will be sent out.

This gave me something very specific to search and I found the relevant section in the Nagios source code here.

This specific message means that the check for the notification interval failed, so Nagios will not send messages right now. It turned out that I had been testing with the same check over and over, and that was the only one that was not working.


Written by moozing

October 15, 2017 at 12:00

Posted in Tech
