{"id":77,"date":"2011-01-19T07:23:32","date_gmt":"2011-01-18T18:23:32","guid":{"rendered":"http:\/\/www.templesoft.co.nz\/blog\/?p=77"},"modified":"2026-04-08T08:19:31","modified_gmt":"2026-04-07T20:19:31","slug":"monitoring-host-availability-using-nagios","status":"publish","type":"post","link":"https:\/\/templesoft.co.nz\/journal\/?p=77","title":{"rendered":"Monitoring host availability using Nagios."},"content":{"rendered":"<h2>I just want it to work!<\/h2>\n<p style=\"text-align: justify;\">It was with a due sense of urgency that I stumbled into my Christmas break this year, like a marathon runner barely making it over the finish line. Oh, I thought I was prepared, I had my remote access after all&#8230; what else would I need? Two days in, the warm blanket of pride was ripped from me as I attempted to check my mail.<\/p>\n<p style=\"text-align: justify;\">To my horror I discovered that I couldn&#8217;t access the mail service?! Surely my rock solid Linux mail server hadn&#8217;t crashed or, worse, developed a hardware error rendering it useless? I needn&#8217;t have worried: some cursory checks revealed that either the data circuit had gone down or there had been a power cut (we get a lot of them where we are) that the router may not have recovered from. That&#8217;s when I got the call from one of my users who was also having problems&#8230;<\/p>\n<p style=\"text-align: justify;\">After a prolonged period of moans and groans I finally went onsite to check things and discovered that the server was ok, but the ADSL line was dead &#8211; not even a dial tone!! After a chat with the Telco, and various checking of logs on their side, I discovered that the circuit had been down for almost 24 hours!!<\/p>\n<p style=\"text-align: justify;\">This was when the sickening feeling came over me that it had been down for so long and I had no idea. I had been blissful at home, smug that everything was ok, and it wasn&#8217;t. That was the moment. 
The moment I realised I needed the drop on my users. I needed to know there was a problem before they did&#8230; I needed some monitoring software.<\/p>\n<h2 style=\"text-align: justify;\">Nagios? How do you pronounce that?<\/h2>\n<p style=\"text-align: justify;\">I bang on about using Ubuntu for everything, and it&#8217;s not that I have shares in the company or anything, but because it&#8217;s a solid platform that is fairly scalable across a range of hardware configurations. There are so many flavours of Linux out there and I don&#8217;t claim that Ubuntu is the best of the bunch &#8211; but it works really well for everything I need to do, and since money is tight, I can reuse machines too slow to run Windows (plus I feel good that I&#8217;m not creating waste \ud83d\ude42 )<\/p>\n<p style=\"text-align: justify;\"><a href=\"http:\/\/www.nagios.org\/\"><strong>Nagios<\/strong> <\/a>&#8211; an acronym for <strong>N<\/strong>agios <strong>A<\/strong>in&#8217;t <strong>G<\/strong>onna <strong>I<\/strong>nsist <strong>O<\/strong>n <strong>S<\/strong>ainthood after the name <em>NetSaint<\/em> couldn&#8217;t be used &#8211; is a powerful monitoring system that enables you to identify and resolve IT infrastructure problems before they affect critical business processes. It sounded perfect for what I needed&#8230;<\/p>\n<h3 style=\"text-align: justify;\">Package installation<\/h3>\n<p style=\"text-align: justify;\"><em>You know what they say about assumptions&#8230;<\/em><\/p>\n<p style=\"text-align: justify;\">For this exercise the <em>assumption<\/em> is that you&#8217;re installing onto a clean Ubuntu 10.04+ system with no other software installed. 
Nagios depends on Apache and Postfix, so if they aren&#8217;t already installed they will be pulled in as part of the Nagios installation.<\/p>\n<p style=\"text-align: justify;\">I won&#8217;t go into configuring Apache or Postfix in this journal entry; I&#8217;ll leave that up to you, the reader&#8230;<\/p>\n<p style=\"text-align: justify;\">Install the core packages.<\/p>\n<pre style=\"text-align: justify;\">sudo aptitude update\r\nsudo aptitude install nagios3<\/pre>\n<p style=\"text-align: justify;\">You&#8217;ll notice a number of dependencies will be listed as part of the install. Answer <strong>Y<\/strong> to continue.<\/p>\n<p style=\"text-align: justify;\">The order of the prompts during installation may vary, but when prompted by Postfix I selected &#8220;<strong>Satellite System<\/strong>&#8221; and provided the name of my main internal mail server when asked for a &#8220;<strong>relay server<\/strong>&#8221;. This could just as easily be your ISP&#8217;s server but might require some additional tweaking of the <strong>main.cf<\/strong> file.<\/p>\n<p style=\"text-align: justify;\">Next you&#8217;ll be prompted for a password for the &#8220;<strong>nagiosadmin<\/strong>&#8221; account. Select something suitable, retype it to confirm, then allow the installation to continue.<\/p>\n<p style=\"text-align: justify;\">Once installed, if everything has gone ok you should be able to hit the ground running. To access Nagios, point a browser to <strong>http:\/\/<\/strong><em>your.server.name<strong>\/<\/strong><\/em><strong>nagios3<\/strong>. You will then be prompted for the <em>nagiosadmin<\/em> username and password.<\/p>\n<p style=\"text-align: justify;\">In the left-hand column you&#8217;ll see a number of categories. 
If you click on &#8220;<strong>Service Detail<\/strong>&#8221; you will see that Nagios has created an entry for <em>this<\/em> server as well as detecting the default gateway for this server. The status for the very first login will probably be &#8220;<em>Pending<\/em>&#8221; but if you check the &#8220;<strong>Status Information<\/strong>&#8221; column you&#8217;ll see when the check is scheduled to take place.<\/p>\n<p style=\"text-align: justify;\">After a matter of minutes, the status for the two hosts as well as the services should turn green and be listed as &#8220;<em>OK<\/em>&#8221;. The default method for checking &#8220;<em>host-alive<\/em>&#8221; is to ping the host, so if you have ICMP turned off Nagios will show the host as &#8220;<em>Down<\/em>&#8221; until you change the &#8220;<em>host-alive<\/em>&#8221; method to something different.<\/p>\n<h2>Adding additional hosts<\/h2>\n<p style=\"text-align: justify;\">From what I can see, Nagios is a fairly modular system. Configuration files for each host are created in the <strong>\/etc\/nagios3\/conf.d<\/strong> directory and are loaded when the service starts (or is restarted!)<\/p>\n<p style=\"text-align: justify;\">You can add hosts to groups (<strong>hostgroups_nagios2.cfg<\/strong>), and then define service &#8220;checks&#8221; for members of those groups (<strong>services_nagios2.cfg<\/strong> and <strong>generic-service_nagios2.cfg<\/strong>). This helps to avoid repeating commands across configuration files. The same concept applies to the default host settings (<strong>generic-host_nagios2.cfg<\/strong>), which a host definition pulls in much the same way an &#8220;include&#8221; command would.<\/p>\n<p style=\"text-align: justify;\">OK, let&#8217;s start by adding a host to monitor. 
At this stage I&#8217;ll assume that you&#8217;ve logged into Nagios using the web interface and that the NRPE (<strong>N<\/strong>agios-<strong>R<\/strong>emote-<strong>P<\/strong>lugin-<strong>E<\/strong>xecutor) client hasn&#8217;t been installed on anything yet (except localhost!). It is worth noting that, depending on your requirements, NRPE is not required in a simple setup that only uses basic keep-alive checks such as ping.<\/p>\n<p style=\"text-align: justify;\">First of all have a look at <strong>\/etc\/nagios3\/conf.d\/hostgroups_nagios2.cfg<\/strong>.<\/p>\n<blockquote>\n<p style=\"text-align: justify;\"><span style=\"color: #ff6600;\"><strong>NOTE<\/strong>:<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #ff6600;\">We&#8217;re only going to look at some of the files. There should be no need to alter them at this stage.<\/span><\/p>\n<\/blockquote>\n<pre style=\"text-align: justify;\">sudo vi \/etc\/nagios3\/conf.d\/hostgroups_nagios2.cfg<\/pre>\n<p style=\"text-align: justify;\">You&#8217;ll notice a list of host groups, each with a name, alias and member list. In this first configuration example we&#8217;ll only look at adding something simple that uses PING as its &#8220;host-alive&#8221; and service check. 
In particular, note the entry for the host group titled &#8220;ping-servers&#8221;.<\/p>\n<pre style=\"text-align: justify;\"><span style=\"color: #339966;\">define hostgroup {<\/span>\r\n<span style=\"color: #339966;\">    hostgroup_name   ping-servers<\/span>\r\n<span style=\"color: #339966;\">    alias            Pingable servers<\/span>\r\n<span style=\"color: #339966;\">    members <span style=\"color: #c0c0c0;\">         your default gateway is probably the only thing in here<\/span><\/span>\r\n<span style=\"color: #339966;\">}<\/span><\/pre>\n<p style=\"text-align: justify;\">Next let&#8217;s have a look at <strong>\/etc\/nagios3\/conf.d\/services_nagios2.cfg<\/strong>.<\/p>\n<pre style=\"text-align: justify;\">sudo vi \/etc\/nagios3\/conf.d\/services_nagios2.cfg\r\n\r\n<span style=\"color: #339966;\">define service {<\/span>\r\n<span style=\"color: #339966;\">    hostgroup_name         ping-servers<\/span>\r\n<span style=\"color: #339966;\">    service_description    PING<\/span>\r\n<span style=\"color: #339966;\">    check_command          check_ping!100.0,20%!500.0,60%<\/span>\r\n<span style=\"color: #339966;\">    use                    generic-service<\/span>\r\n<span style=\"color: #339966;\">    notification_interval  0 <span style=\"color: #c0c0c0;\">; set &gt; 0 if you want to be renotified<\/span><\/span>\r\n<span style=\"color: #339966;\">}<\/span><\/pre>\n<p style=\"text-align: justify;\">You can see that any member of &#8220;ping-servers&#8221; in the host-groups file will have the service &#8220;PING&#8221; added to it. Using these files means that we don&#8217;t need to add specific service entries to each host definition, which simplifies deployment.<\/p>\n<p style=\"text-align: justify;\">Lastly, have a look at <strong>\/etc\/nagios3\/conf.d\/generic-host_nagios2.cfg<\/strong>. The important thing to note in this file is the definition for <em>check_command<\/em>. 
The command <em>check-host-alive<\/em> basically uses ping to check if a host is there and responding. If your hosts have ICMP Ping disabled then this command will result in Nagios reporting that a host is <strong>down<\/strong>. Think about the various ways you can check that a host is alive without using ping for those &#8220;special&#8221; hosts, as you&#8217;ll need to define something here (which is preferred) or in the host definition itself.<\/p>\n<p style=\"text-align: justify;\">For now, our host is &#8220;ping-able&#8221; so it&#8217;s time to create a host definition file. I use the FQDN as the file name or, when monitoring hosts on the local network, the host&#8217;s name in the internal domain.<\/p>\n<pre style=\"text-align: justify;\">sudo vi \/etc\/nagios3\/conf.d\/hostname1.local.cfg\r\n\r\n<span style=\"color: #339966;\">define host {<\/span>\r\n<span style=\"color: #339966;\">    host_name    <em><strong>hostname1.local<\/strong><\/em><\/span>\r\n<span style=\"color: #339966;\">    alias        <em><strong>Core Switch<\/strong><\/em> <span style=\"color: #c0c0c0;\">; meaningful description here<\/span><\/span>\r\n<span style=\"color: #339966;\">    address      <em><strong>192.168.1.250<\/strong><\/em> <span style=\"color: #c0c0c0;\">; the IP address of the device\/host<\/span><\/span>\r\n<span style=\"color: #339966;\">    use          <em><strong>generic-host<\/strong><\/em> <span style=\"color: #c0c0c0;\">; includes settings in generic-host_nagios2.cfg<\/span><\/span>\r\n<span style=\"color: #339966;\">}<\/span><\/pre>\n<blockquote><p><span style=\"color: #ff6600;\"><strong>Note<\/strong>:<\/span><\/p>\n<p><span style=\"color: #ff6600;\">You can also have a specific icon image appear next to the host&#8217;s entry in the Nagios monitor. To do this, add the <em>icon_image<\/em> directive and refer to an image in (or relative to, as this is the root of the image&#8217;s location) <strong>\/usr\/share\/nagios3\/htdocs\/images\/logos<\/strong>. 
There are also more images in <strong>\/usr\/share\/nagios3\/htdocs\/images\/logos\/base<\/strong>, e.g.<\/span><\/p>\n<pre><span style=\"color: #339966;\">icon_image    base\/win40.png<\/span><\/pre>\n<\/blockquote>\n<p>Now let&#8217;s add this host into the &#8220;ping-servers&#8221; group&#8230;<\/p>\n<pre>sudo vi \/etc\/nagios3\/conf.d\/hostgroups_nagios2.cfg\r\n\r\n<span style=\"color: #339966;\">define hostgroup {<\/span>\r\n<span style=\"color: #339966;\">    hostgroup_name    ping-servers<\/span>\r\n<span style=\"color: #339966;\">    alias             Pingable servers<\/span>\r\n<span style=\"color: #339966;\">    members           <em><strong><span style=\"color: #c0c0c0;\">...<\/span>,hostname1.local<\/strong><\/em><\/span>\r\n<span style=\"color: #339966;\">}<\/span><\/pre>\n<blockquote><p><span style=\"color: #ff6600;\"><strong>NOTE<\/strong>:<\/span><\/p>\n<p><span style=\"color: #ff6600;\">For the <strong>members<\/strong> directive, use the name of the host exactly as you defined it with the <strong>host_name<\/strong> directive in its definition file. Separate multiple entries with a comma.<\/span><\/p><\/blockquote>\n<p>Save the file and restart Nagios.<\/p>\n<pre>sudo \/etc\/init.d\/nagios3 restart<\/pre>\n<p>Once the Nagios service has restarted, go back to the web interface (refresh if required) and you should see an entry for your host under the &#8220;Host Detail&#8221; section. It may not have a status at this stage as it depends on Nagios&#8217; polling cycle, but it normally doesn&#8217;t take any longer than 5 minutes for the status to update. 
The &#8220;Status Information&#8221; column should give some hints on when the next check is scheduled.<\/p>\n<p>If you now click on &#8220;Service Detail&#8221; you should see your host with the name &#8220;PING&#8221; in the service column.<\/p>\n<p>If you click on &#8220;View Config&#8221;, and then choose &#8220;Commands&#8221; as the object type, you can get an idea of the vast array of checks that can be done with Nagios. Remember: some of them are great simple remote checks, but a few require NRPE in order to extract machine-specific information (disk usage, CPU usage, processes, etc&#8230;).<\/p>\n<blockquote><p><span style=\"color: #ff6600;\"><strong>NOTE<\/strong>:<\/span><\/p>\n<p><span style=\"color: #ff6600;\">Be aware that checking ports directly can cause problems. Case in point (and a lesson learned from experience): I used the <strong>check_tcp<\/strong> command to make sure my VNC port was open &#8211; the theory being that if the port was open, then the &#8220;host&#8221; must be alive. The problem was that VNC interpreted that as a scanning attack (duh) and promptly shut the service down with events stating an invalid login.<\/span><\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>I just want it to work! It was with a due sense of urgency that I stumbled into my Christmas break this year, like a marathon runner barely making it over the finish line. Oh, I thought I was prepared, I had my remote access after all&#8230; what else would I need? 
Two days in,&#8230;  <a class=\"excerpt-read-more\" href=\"https:\/\/templesoft.co.nz\/journal\/?p=77\" title=\"Read Monitoring host availability using Nagios.\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[60,61,59,19],"class_list":["post-77","post","type-post","status-publish","format-standard","hentry","category-technical-resource","tag-availability","tag-monitoring","tag-nagios","tag-ubuntu"],"_links":{"self":[{"href":"https:\/\/templesoft.co.nz\/journal\/index.php?rest_route=\/wp\/v2\/posts\/77","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/templesoft.co.nz\/journal\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/templesoft.co.nz\/journal\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/templesoft.co.nz\/journal\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/templesoft.co.nz\/journal\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=77"}],"version-history":[{"count":4,"href":"https:\/\/templesoft.co.nz\/journal\/index.php?rest_route=\/wp\/v2\/posts\/77\/revisions"}],"predecessor-version":[{"id":173,"href":"https:\/\/templesoft.co.nz\/journal\/index.php?rest_route=\/wp\/v2\/posts\/77\/revisions\/173"}],"wp:attachment":[{"href":"https:\/\/templesoft.co.nz\/journal\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=77"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/templesoft.co.nz\/journal\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=77"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/templesoft.co.nz\/journal\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=77"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}