> -----Original Message-----
> From: nagios-devel-bounces@... [mailto:nagios-devel-
> bounces@...] On Behalf Of Thomas Guyot-Sionnest
> Sent: Tuesday, October 31, 2006 10:49 AM
> To: nagios-devel@...
> Subject: [Nagios-devel] Feature request for host scheduled downtimes
>
> When scheduling downtime for a host, it should be possible to schedule the
> downtime for all services on the host as well.
>
> This is because it often happens that during host maintenance the host can
> still ping but most / all services are down, so to avoid waking up everyone
> we must disable all notifications for the host. It should be possible to use
> downtime scheduling for these situations.
Nagios does this already and it's documented at
http://nagios.sourceforge.net/docs/2_0/notifications.html (Service and
Host Filters) --
"The first filter for host or service notifications is a check to see if
the host or service is in a period of scheduled downtime. If it is in a
scheduled downtime, no one gets notified. If it isn't in a period of
downtime, it gets passed on to the next filter. As a side note,
notifications for services are suppressed if the host they're associated
with is in a period of scheduled downtime."
Are you really seeing notifications for services on hosts in scheduled
downtime or just thinking that you will?
--
Marc

Currently, when I schedule downtime for a host, I notice that my folks do
not get notifications of service outages either. Actually, I use a
custom PHP and shell script to schedule the outages, and it only
sends SCHEDULE_HOST_DOWNTIME to the nagios.cmd pipe. We use this to
schedule downtime when doing anything with a server so that any service
or host outages do not send out notifications. Is the behavior you are
seeing different?
If you would still like to schedule service outages, I can provide you
the set of PHP scripts I used to do it. In fact, I used to also
schedule svc downtime with the same script via
SCHEDULE_HOST_SVC_DOWNTIME, but stopped because cancelling the downtime
early was a pain: I had to click to delete each of those entries, vs.
just clicking once to remove the scheduled host downtime.
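For reference, the line that ends up in the nagios.cmd pipe can be sketched in C as below. This is a minimal sketch, not the PHP script mentioned above: the function name format_host_downtime and the host, author and comment values are made up for illustration, and the field order (host, start, end, fixed, trigger id, duration, author, comment) is the one given in the Nagios external command documentation, so please check it against your version.

```c
#include <stdio.h>
#include <time.h>

/*
 * Format a SCHEDULE_HOST_DOWNTIME external command into buf.
 * Hypothetical helper: fixed downtime (fixed=1), no trigger (trigger_id=0),
 * duration equal to end - start.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_host_downtime(char *buf, size_t len, time_t now,
                                const char *host, time_t start, time_t end,
                                const char *author, const char *comment)
{
    int n = snprintf(buf, len,
                     "[%lu] SCHEDULE_HOST_DOWNTIME;%s;%lu;%lu;1;0;%lu;%s;%s\n",
                     (unsigned long)now, host,
                     (unsigned long)start, (unsigned long)end,
                     (unsigned long)(end - start), author, comment);

    /* snprintf reports the length it wanted; treat truncation as an error. */
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting line would then be written to the command file named by command_file in nagios.cfg; the path differs between installations.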
-Anthony
On 10/31/2006 8:48 AM, Thomas Guyot-Sionnest had said:
> When scheduling downtime for a host, it should be possible to schedule the
> downtime for all services on the host as well.
>
> This is because it often happens that during host maintenance the host can
> still ping but most / all services are down, so to avoid waking up everyone
> we must disable all notifications for the host. It should be possible to use
> downtime scheduling for these situations.
>
> If nobody is working on this I might try to implement it, but I have only basic C
> knowledge. Any chance it gets into the 2.x series if my code looks good and it
> doesn't break current command syntax?
>
> Thanks,
>
> Thomas
>

When scheduling downtime for a host, it should be possible to schedule the
downtime for all services on the host as well.
This is because it often happens that during host maintenance the host can
still ping but most / all services are down, so to avoid waking up everyone
we must disable all notifications for the host. It should be possible to use
downtime scheduling for these situations.
If nobody is working on this I might try to implement it, but I have only basic C
knowledge. Any chance it gets into the 2.x series if my code looks good and it
doesn't break current command syntax?
Thanks,
Thomas

Joerg Linge wrote:
> On Monday, 30 October 2006 18:10, Hendrik Baecker wrote:
>
>> Hi,
>>
>> I have applied the last patch to current CVS head... it works so far.
>>
>> Hope this doesn't break anything...
>>
>
> Hi Hendrik,
> have you compiled with ePN support?
>
> The latest CVS code has run with a production config (2000 services) for 7 hours without problems.
> The new host check logic works well. The check latency is minimal.
>
> Jörg
>
>
I didn't test that deeply, but I compiled with ePN. The compile looks good so far.
Off-topic:
I am going to merge my split 6000 config into one single config, and
we'll test the new logic.
I think that once Ethan completes the new docs, we will all feel
the new force ;)
Hendrik

On Monday, 30 October 2006 18:10, Hendrik Baecker wrote:
> Hi,
>
> I have applied the last patch to current CVS head... it works so far.
>
> Hope this doesn't break anything...
Hi Hendrik,
have you compiled with ePN support?
The latest CVS code has run with a production config (2000 services) for 7 hours without problems.
The new host check logic works well. The check latency is minimal.
Jörg

That's wonderful, but what about capturing stderr for diagnostics?
This is where a good portion of the problems appear when something goes
wrong and needs to be debugged.
I think it would be a grand idea to make the parent process provide stderr
diagnostics if stdout has nothing to say.
As an aside, is there a way I can get involved in the 3.x series
discussions/development? If nothing else, I'd sure like to see what's
coming around the corner and perhaps help out...
Bob
> Thomas Sluyter wrote:
>> On 25 Oct, 2006, at 0:08, bobi@... wrote:
>>
>>> But being a lazy programmer, I am thinking, why not just have Nagios be a
>>> little more forgiving and inquisitive and keep searching stdout until it
>>> finds the first non-empty line? Is that so bad? Or is it a feature?
>>
>> You'll like Nagios 3 when it comes out. It'll allow you to read
>> multiple lines of stdout. So be patient, little grasshopper.
>>
>> Cheers! o/
>>
>>
>> Thomas
>>
>
> As Thomas mentioned, Nagios 3 will support multiple lines of output from
> plugins, so this patch would break that future feature. :-) In Nagios 3
> any output after the first line will get thrown into either the
> $LONGSERVICEOUTPUT$ or $LONGHOSTOUTPUT$ macros, depending on what type
> of check was performed. There is also a 4K or 8K limit on the output
> size to prevent runaway plugins from returning too much data.
>
>
> Ethan Galstad,
> Nagios Developer
> ---
> Email: nagios@...
> Website: http://www.nagios.org
>

daniel.tuecks@... wrote:
> Hello,
>
> I am trying to develop a patch that helps me to hide certain hostgroups from the
> status cgi. I thought of something like adding a "shouldnotseeme-" prefix to the
> hostgroup alias:
>
> define hostgroup {
> hostgroup_name shown
> alias This Host Group will be displayed
> members host1,host2,host3
> }
>
> define hostgroup {
> hostgroup_name not-shown
> alias shouldnotseeme-This Host Group will not be displayed
> members host3,host4,host5
> }
>
> First of all I must admit I am no C programmer.
> I had a look at the status.c source file. Around line 3090 I found a procedure
> show_hostgroup_overview. Now I'd like to simply check there if hstgrp->alias
> contains the string "shouldnotseeme-"
> Unfortunately I do not know how to do this.
> With PHP I think it would look like this:
>
> if (strstr($hstgrp->alias, "shouldnotseeme-") != FALSE)
> {
> // don't show this one
> ..
> }
>
> But I have no idea how to write this in C :( Can someone point me in the right
> direction. Do you think this will work at all?
>
Perhaps. Fortunately for you, PHP borrows much of its style from C, so
the above would work in C as well, although it has the side-effect (both
in PHP and C) that "shouldnotseeme-" doesn't have to be a prefix, but can
instead appear anywhere in the string.
To find it only if it is the prefix (and also make the code run a little
bit faster), you should instead use
strncmp(hostgroup->alias, "shouldnotseeme-", 15);
which returns 0 when the first 15 characters match.
Some may argue that memcmp() would be faster in C. On most
architectures, this is true, but it isn't guaranteed to stop
at a NUL char, so it could read past the end of a short alias and crash the program.
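Spelled out as a small helper, the prefix test could look like the sketch below. The names HIDE_PREFIX and alias_is_hidden are made up for this example and are not part of status.c; the only library call used is strncmp() from <string.h>.

```c
#include <string.h>

#define HIDE_PREFIX "shouldnotseeme-"

/*
 * Return 1 if alias starts with HIDE_PREFIX, 0 otherwise.
 * A NULL alias is treated as "not hidden".
 */
static int alias_is_hidden(const char *alias)
{
    if (alias == NULL)
        return 0;
    /* strncmp() returns 0 on a match and stops at the first NUL byte,
     * so an alias shorter than the prefix is handled safely. */
    return strncmp(alias, HIDE_PREFIX, sizeof(HIDE_PREFIX) - 1) == 0;
}
```

Using sizeof(HIDE_PREFIX) - 1 instead of a literal 15 keeps the length in step with the prefix if it is ever renamed. Inside the hostgroup overview code the call would then be something like alias_is_hidden(hstgrp->alias), assuming hstgrp is the hostgroup pointer mentioned in the question.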
--
Andreas Ericsson andreas.ericsson@...
OP5 AB http://www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231


Thomas Sluyter wrote:
> On 25 Oct, 2006, at 0:08, bobi@... wrote:
>
>> But being a lazy programmer, I am thinking, why not just have Nagios be a
>> little more forgiving and inquisitive and keep searching stdout until it
>> finds the first non-empty line? Is that so bad? Or is it a feature?
>
> You'll like Nagios 3 when it comes out. It'll allow you to read
> multiple lines of stdout. So be patient, little grasshopper.
>
> Cheers! o/
>
>
> Thomas
>
As Thomas mentioned, Nagios 3 will support multiple lines of output from
plugins, so this patch would break that future feature. :-) In Nagios 3
any output after the first line will get thrown into either the
$LONGSERVICEOUTPUT$ or $LONGHOSTOUTPUT$ macros, depending on what type
of check was performed. There is also a 4K or 8K limit on the output
size to prevent runaway plugins from returning too much data.
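To illustrate the split described above, here is a rough sketch in C. It is not the Nagios 3 code: the function name split_plugin_output and the 4096-byte cap are assumptions chosen for the example (the exact limit is left open above as "4K or 8K").

```c
#include <stdio.h>
#include <string.h>

#define MAX_PLUGIN_OUTPUT 4096  /* assumed cap, for illustration only */

/*
 * Copy the first line of raw into shortout and everything after the first
 * newline into longout. Input beyond MAX_PLUGIN_OUTPUT bytes is ignored.
 * Both outputs are NUL-terminated and truncated to their buffer sizes.
 */
static void split_plugin_output(const char *raw,
                                char *shortout, size_t shortlen,
                                char *longout, size_t longlen)
{
    size_t total = strlen(raw);
    if (total > MAX_PLUGIN_OUTPUT)
        total = MAX_PLUGIN_OUTPUT;

    const char *nl = memchr(raw, '\n', total);
    size_t first = nl ? (size_t)(nl - raw) : total;
    size_t rest = nl ? total - first - 1 : 0;

    snprintf(shortout, shortlen, "%.*s", (int)first, raw);
    snprintf(longout, longlen, "%.*s", (int)rest, nl ? nl + 1 : "");
}
```

In this sketch, shortout plays the role of the usual one-line output, and longout is what would end up in $LONGSERVICEOUTPUT$ or $LONGHOSTOUTPUT$.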
Ethan Galstad,
Nagios Developer
---
Email: nagios@...
Website: http://www.nagios.org

Wow. We got a lot of responses.
We are going to take a few days to write up some documentation and we will
post that to the list.
Just one thing we noticed in the e-mails is that there is some confusion
between this and NDO. DNX does not replace NDO. NDO is still required for
multiple data centers.
Bob
> This is a very nice patch indeed. It doesn't break anything that's
> working now, but lets module-authors get more power over how nagios
> executes checks. It's also relatively small and non-intrusive and, as a
> side-effect, it makes it possible to write plugins as modules. Overall,
> I like it.
>
> Some questions though, inlined below. Oh, and I would very much like to
> see the module. :)
>
>
> bobi@... wrote:
>> Attached is a patch-set I would like some feedback on.
>>
>> The purpose of this patch is to allow Nagios the ability to delegate the
>> execution of service checks to a NEB module.
>>
>> Why would we want to do this? I'm glad you asked...
>>
>> The point is to allow Nagios to scale efficiently in large-scale
>> environments by delegating service checks to multi-node "check" clusters.
>> That is, it facilitates the creation of a Nagios Service Check Cluster (or
>> multiple independent clusters,) that can be deployed in either one
>> location or multiple locations.
>>
>> The benefits are:
>>
>> 1. It de-couples Service Check execution from Scheduling on the same box.
>> Sure, you can do this by setting up multiple Nagios instances that report
>> their results passively back up to the "master" Nagios box, but that
>> requires manually splitting up your configuration among multiple Nagios
>> instances, setting up all of the passive result reporting, etc.
>>
>> In this scenario, you can keep your centrally-located master configuration
>> file and have the service checks distributed to light-weight,
>> geographically-dispersed service check clusters.
>>
>
> How does the module determine which node checks what?
> How is configuration distributed?
>
>> 2. Scalability. You can support more simultaneous service checks by
>> adding more light-weight service check nodes incrementally.
>>
>
> Do you have to restart the "master" nagios in order for this to work, or
> will they be picked up as one goes along?
> If "picked up as one goes along", how does handshake and authentication
> work?
>
>> You can start with zero external nodes (i.e., all checks still executed by
>> Nagios internally.) Then add one node as your service check count
>> increases. Then gradually (or quickly,) increase the node count, locally
>> or remotely, as your service check count grows, and the system will scale
>> appropriately.
>>
>> Anyway, it's not the ultimate, end-all, be-all, but we have found it helps
>> us scale and manage Nagios efficiently in our large-scale,
>> multi-datacenter environment. The hope is that this will be considered as
>> a potential part of the new Nagios architecture some day.
>>
>> For those who want to know how Nagios actually delegates service check
>> execution to an external cluster via a NEB module, here are the high-level
>> details:
>>
>> We have written a multi-threaded NEB module that registers a
>> NEBCALLBACK_SERVICE_CHECK_DATA callback and watches for the
>> NEBTYPE_SERVICECHECK_INITIATE event.
>>
>> It then takes each service check and distributes it across the network to
>> multiple "worker" nodes in a cluster (via XML-RPC). It also takes care of
>> processing the check results, posting them to the internal Nagios result
>> queue, plugin timeout conditions, etc.
>>
>
> Does this go through the FIFO pipe? If so, I'm afraid it doesn't solve
> the biggest issue in scaling Nagios to large networks.
>
> --
> Andreas Ericsson andreas.ericsson@...
> OP5 AB http://www.op5.se
> Tel: +46 8-230225 Fax: +46 8-230231
>
>
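The quoted description does not say how the module decides which node runs which check, so the following is only a toy model of the "start with zero nodes, then add more" behaviour. Round-robin selection, the struct dispatcher type and the pick_node function are all assumptions invented for this sketch; none of it is taken from the actual module.

```c
#include <stddef.h>

#define RUN_LOCALLY (-1)

/* Hypothetical dispatcher state: a pool of worker nodes and a cursor. */
struct dispatcher {
    size_t node_count;  /* 0 means no external nodes are registered */
    size_t next;        /* index of the node that gets the next check */
};

/*
 * Pick the node for the next service check.
 * Returns RUN_LOCALLY when the pool is empty (Nagios runs the check
 * itself), otherwise the index of the chosen node, cycling through
 * the pool in round-robin order.
 */
static int pick_node(struct dispatcher *d)
{
    if (d->node_count == 0)
        return RUN_LOCALLY;

    /* The modulo keeps the cursor valid even if the pool shrank. */
    size_t chosen = d->next % d->node_count;
    d->next = (chosen + 1) % d->node_count;
    return (int)chosen;
}
```

A real module would also need to handle unreachable nodes, timeouts and the handshake questions raised in the replies; those are left out here on purpose.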

> -----Original Message-----
> From: nagios-devel-bounces@... [mailto:nagios-devel-
> bounces@...] On Behalf Of Rob Brown
> Sent: Wednesday, October 25, 2006 16:05
> To: nagios-devel@...
> Subject: [Nagios-devel] macro referencing in host/serviceextinfo
> notes directive
>
> I have been experimenting with using the host/serviceextinfo
> definitions to add some external links to information about
> hosts/services. One thing I noticed while messing about is that the
> "notes_url" and "action_url" directives seem to be able to do macro
> replacement, while the "notes" directive does not. It would be quite
> helpful to be able to use certain macros (like $HOSTNAME$,
> $HOSTALIAS$, etc) in the notes. For example:
> define hostextinfo{
> hostgroup_name hostgroup1
> notes <a href="http://webserver.localhost.localdomain/hostinfo.pl?host=$HOSTNAME$">Click
> Here for some external information about $HOSTNAME$</a><br><a
> href="http://webserver2.localhost.localdomain/hostinfo2.pl?host=$HOSTNAME$">Click
> Here for some other information about $HOSTNAME$</a>
> }
> define serviceextinfo{
> hostgroup_name hostgroup1
> service_description someservice
> notes <a href="http://webserver.localhost.localdomain/serviceinfo.pl?host=$HOSTNAME$&service=$SERVICEDESC$">Click
> Here for some external information about $SERVICEDESC$ on
> $HOSTNAME$</a><br><a
> href="http://webserver2.localhost.localdomain/serviceinfo.pl?host=$HOSTNAME$&service=$SERVICEDESC$">Click
> Here for some other information about $SERVICEDESC$ on $HOSTNAME$</a>
> }
Yep, that would be very useful. I personally point action_url to the commands
screen so I can send it in emails (plain-text). With the feature you suggest
I could send HTML mails (I don't especially like them, but Windows guys love
them) and include a notes macro just like the one you described.
Thomas

I have been experimenting with using the host/serviceextinfo
definitions to add some external links to information about
hosts/services. One thing I noticed while messing about is that the
"notes_url" and "action_url" directives seem to be able to do macro
replacement, while the "notes" directive does not. It would be quite
helpful to be able to use certain macros (like $HOSTNAME$,
$HOSTALIAS$, etc) in the notes. For example:
define hostextinfo{
hostgroup_name hostgroup1
notes <a href="http://webserver.localhost.localdomain/hostinfo.pl?host=$HOSTNAME$">Click
Here for some external information about $HOSTNAME$</a><br><a
href="http://webserver2.localhost.localdomain/hostinfo2.pl?host=$HOSTNAME$">Click
Here for some other information about $HOSTNAME$</a>
}
define serviceextinfo{
hostgroup_name hostgroup1
service_description someservice
notes <a href="http://webserver.localhost.localdomain/serviceinfo.pl?host=$HOSTNAME$&service=$SERVICEDESC$">Click
Here for some external information about $SERVICEDESC$ on
$HOSTNAME$</a><br><a
href="http://webserver2.localhost.localdomain/serviceinfo.pl?host=$HOSTNAME$&service=$SERVICEDESC$">Click
Here for some other information about $SERVICEDESC$ on $HOSTNAME$</a>
}
You might say: "Well why not just use the "notes_url" or "action_url"
directives?"
My answer is a bit petty: I just don't like the way they are
displayed: way off to the side with the generic "extra host notes"
text. I would rather have more control by being able to put in the
html myself and have multiple links as in the examples above.
It would be nice just to have the option of using macros here if you choose.
I'm not sure how this would affect the $HOSTNOTES$ and $SERVICENOTES$
macros, as obviously any macros contained within would need to be
expanded first. I'll leave that up to one of you sharp developers to
figure out.
Perhaps it would make more sense to re-think the way the "notes_url"
or "action_url" directives are defined and allow a little more
flexibility there. This is all in an effort to make nagios a central
admin tool and pull in (or at least link to) info from other sources.
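To make the request concrete, here is a minimal sketch of what expanding $NAME$ macros in a notes string could look like. It is not how the Nagios CGIs process macros: struct macro, expand_macros and the lookup table are hypothetical names for this example, and special cases such as a literal "$$" are ignored.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical macro table entry: name without the surrounding '$'. */
struct macro { const char *name; const char *value; };

/*
 * Expand $NAME$ tokens in src using the macros table. Unknown tokens and
 * unmatched '$' characters are copied through unchanged. The result is
 * always NUL-terminated and is truncated if dst is too small.
 */
static void expand_macros(const char *src, const struct macro *macros,
                          size_t nmacros, char *dst, size_t dstlen)
{
    size_t out = 0;

    if (dstlen == 0)
        return;
    while (*src && out + 1 < dstlen) {
        const char *end = (*src == '$') ? strchr(src + 1, '$') : NULL;
        const char *value = NULL;

        if (end) {
            size_t len = (size_t)(end - src - 1);
            for (size_t i = 0; i < nmacros; i++) {
                if (strlen(macros[i].name) == len &&
                    strncmp(macros[i].name, src + 1, len) == 0) {
                    value = macros[i].value;
                    break;
                }
            }
        }
        if (value) {
            /* Append the value; clamp the cursor if it did not fit. */
            int n = snprintf(dst + out, dstlen - out, "%s", value);
            out += ((size_t)n < dstlen - out) ? (size_t)n : dstlen - out - 1;
            src = end + 1;
        } else {
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
}
```

With a table mapping HOSTNAME and SERVICEDESC to the current host and service, the notes examples above would come out with the links filled in per object.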

> -----Original Message-----
> From: nagios-devel-bounces@...
> [mailto:nagios-devel-bounces@...] On Behalf
> Of Thibault Genessay
> Sent: October 25, 2006 4:35
> Cc: nagios-devel@...
> Subject: Re: [Nagios-devel] Patch for Plugin "No Output"
>
> Hi
>
> bobi@... wrote:
> > Hi All,
> >
> > Like everyone else, we've had our fair share of mysterious plugin "(No
> > output!)" messages.
> >
> > Sometimes this error is due to the plugin failing and only writing
> > diagnostics to stderr (and nothing is written to stdout.)
> >
> > Also, as everyone knows, Nagios only reads the first line of
> > newline-terminated output from a plugin and throws the rest away. But,
> > what if the first line is just a new-line and the good stuff is on a
> > subsequent line?
>
> Then it's not good stuff; good stuff is on the first line of stdout *only*.
>
> > Yes, I know - fix your plugin to output only to the first line.
> >
> Exactly :)
>
> > But being a lazy programmer, I am thinking, why not just have Nagios be a
> > little more forgiving and inquisitive and keep searching stdout until it
> > finds the first non-empty line? Is that so bad? Or is it a feature?
> >
>
> Small scripts, written once, executed once, can be lazily coded. However,
> applications running for a long time (e.g. daemons) should be very
> carefully coded.
>
> > Well, you can be the judge.
> >
> > Anyway, I put together this patch for checks.c - it modifies the plugin
> > output handling logic in the following manner:
> >
> > 1. As usual, it reads plugin output from stdout.
> >
> > However, if the first line is empty, it keeps reading until it gets a
> > non-empty line or EOF.
> >
> > If it gets a non-empty line, then that first non-empty line becomes the
> > plugin output.
> >
> > 2. However, if it really gets no output from stdout (i.e., nothing or all
> > empty lines,) then it reads the plugin's stderr and returns the first
> > non-empty line it finds.
> >
> > 3. If it gets nothing from stderr as well, then it finally returns
> > everyone's favorite diagnostic: "(No output!)"
> >
>
> Taking your approach, this is certainly a good algorithm.
>
> > You know, we should really change that diagnostic to: "(No output! Have a
> > nice day!)"
> >
>
> This would make our favorite admins' days easier sometimes :)
>
> >
> > Anyway, I'd be very interested in any alternate suggestions, good
> > comments, insightful observations or even witty repartee.
> >
> > BTW, in order to provide the ability to read both stdout and stderr from a
> > plugin sub-process, I've written my own version of the standard C popen(3)
> > function called pfopen(). I did this because this is the problem with the
> > standard popen(3) function - it only returns stdout to the parent process,
> > which may only give you half the story since it ignores all potentially
> > useful diagnostic info from the stderr of the child process.
> >
> That's an interesting piece of code. I'll keep it somewhere, just in case.
>
I fully agree with your position: Nagios should respect strict standards and
it's the plug-ins that should be fixed.
I'm particularly worried about the case where a nasty bug or unexpected exception
happens during the execution of a plug-in, and it ends up printing empty lines
in an infinite loop (you talk about lazy programming; well, lazy programming
CAN do that). If you must do this work in Nagios, then at least enforce a
maximum number of empty lines.
I'd personally prefer the solution of a wrapper plug-in that takes some
options followed by the full check command line, runs the check and then
returns to Nagios one line of stdout and a valid return code. This has the
advantage of not modifying Nagios while offering a great deal of flexibility.
For example, you could have a parameter to say: take only line x of STDOUT,
look only at STDERR, or return some text in place of a "No Output!" message.
Another option for such a wrapper plug-in could be to limit the number of
characters, which could be very useful for pager alerts when you don't
want a big message to be split up into multiple SMS.
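The two limits suggested above (a cap on empty lines and a cap on message length) could be sketched like this. It only covers the line-picking part of such a wrapper, not running the check itself; first_nonempty_line and its parameters are hypothetical names for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read lines from in until a non-empty one is found, tolerating at most
 * max_empty empty lines before giving up. The line is stored in out with
 * its line ending stripped and truncated to max_chars characters.
 * Returns 1 if a line was found, 0 otherwise (out is then empty).
 */
static int first_nonempty_line(FILE *in, int max_empty, size_t max_chars,
                               char *out, size_t outlen)
{
    char line[1024];
    int empty = 0;

    out[0] = '\0';
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';  /* strip the line ending */
        if (line[0] == '\0') {
            if (++empty > max_empty)
                break;                       /* runaway plug-in: stop here */
            continue;
        }
        snprintf(out, outlen, "%.*s", (int)max_chars, line);
        return 1;
    }
    return 0;
}
```

A wrapper built around this would pass the check's stdout as in and pick max_chars to fit the pager or SMS limit in use.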
Thomas

On 25 Oct, 2006, at 0:08, bobi@... wrote:
> But being a lazy programmer, I am thinking, why not just have Nagios be a
> little more forgiving and inquisitive and keep searching stdout until it
> finds the first non-empty line? Is that so bad? Or is it a feature?
You'll like Nagios 3 when it comes out. It'll allow you to read
multiple lines of stdout. So be patient, little grasshopper.
Cheers! o/
Thomas

bobi@... wrote:
> Hi All,
>
> Like everyone else, we've had our fair share of mysterious plugin "(No
> output!)" messages.
>
> Sometimes this error is due to the plugin failing and only writing
> diagnostics to stderr (and nothing is written to stdout.)
>
> Also, as everyone knows, Nagios only reads the first line of
> newline-terminated output from a plugin and throws the rest away. But,
> what if the first line is just a new-line and the good stuff is on a
> subsequent line?
>
> Yes, I know - fix your plugin to output only to the first line.
>
> But being a lazy programmer, I am thinking, why not just have Nagios be a
> little more forgiving and inquisitive and keep searching stdout until it
> finds the first non-empty line? Is that so bad? Or is it a feature?
>
It's a feature. Nice patch, btw, but I've got a couple of issues with
it, detailed below.
> Well, you can be the judge.
>
> Anyway, I put together this patch for checks.c - it modifies the plugin
> output handling logic in the following manner:
>
> 1. As usual, it reads plugin output from stdout.
>
> However, if the first line is empty, it keeps reading until it gets a
> non-empty line or EOF.
>
> If it gets a non-empty line, then that first non-empty line becomes the
> plugin output.
>
> 2. However, if it really gets no output from stdout (i.e., nothing or all
> empty lines,) then it reads the plugin's stderr and returns the first
> non-empty line it finds.
>
It would be nice if it could tell that this output comes from stderr
instead of just printing it out. That's a minor point though and I'm
sure it doesn't make any real difference anywhere.
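As an illustration of steps 1 to 3 with the stderr origin made visible, here is a rough sketch working on already-captured text. It is not the checks.c patch: first_line_of, choose_output and the "(stderr)" tag are assumptions for this example.

```c
#include <stdio.h>
#include <string.h>

/* Copy the first non-empty line of text into out; return 1 if one exists. */
static int first_line_of(const char *text, char *out, size_t outlen)
{
    while (*text == '\n' || *text == '\r')
        text++;                              /* skip leading empty lines */

    size_t len = strcspn(text, "\r\n");
    if (len == 0)
        return 0;
    snprintf(out, outlen, "%.*s", (int)len, text);
    return 1;
}

/*
 * Choose the plugin output: the first non-empty stdout line if there is
 * one, otherwise the first non-empty stderr line (tagged so the reader
 * can tell where it came from), otherwise "(No output!)".
 */
static void choose_output(const char *out_text, const char *err_text,
                          char *out, size_t outlen)
{
    char line[512];

    if (first_line_of(out_text, out, outlen))
        return;
    if (first_line_of(err_text, line, sizeof line))
        snprintf(out, outlen, "(stderr) %s", line);
    else
        snprintf(out, outlen, "(No output!)");
}
```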
>
> Anyway, I'd be very interested in any alternate suggestions, good
> comments, insightful observations or even witty repartee.
>
> BTW, in order to provide the ability to read both stdout and stderr from a
> plugin sub-process, I've written my own version of the standard C popen(3)
> function called pfopen(). I did this because this is the problem with the
> standard popen(3) function - it only returns stdout to the parent process,
> which may only give you half the story since it ignores all potentially
> useful diagnostic info from the stderr of the child process.
>
True that. The plugins have something similar, called runcmd which
executes and fetches all output on both stderr and stdout of the command
being run.
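For readers who have not seen the technique, capturing both streams of a child process can be sketched with plain POSIX calls as below. This is neither the pfopen() from the patch nor the plugins' runcmd; run_capture is a hypothetical name, and the sketch assumes a POSIX system with /bin/sh. It uses poll() to drain both pipes together so the child cannot block on a full stderr pipe while the parent is waiting on stdout.

```c
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run cmd through /bin/sh and capture stdout and stderr separately.
 * Both buffers are NUL-terminated; output beyond their size is discarded.
 * Returns the child's exit status, or -1 on failure.
 */
static int run_capture(const char *cmd, char *out, size_t outlen,
                       char *err, size_t errlen)
{
    int po[2], pe[2], status;

    if (outlen == 0 || errlen == 0 || pipe(po) < 0)
        return -1;
    if (pipe(pe) < 0) {
        close(po[0]); close(po[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(po[0]); close(po[1]); close(pe[0]); close(pe[1]);
        return -1;
    }
    if (pid == 0) {                          /* child: rewire stdout/stderr */
        dup2(po[1], STDOUT_FILENO);
        dup2(pe[1], STDERR_FILENO);
        close(po[0]); close(po[1]); close(pe[0]); close(pe[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                          /* exec failed */
    }
    close(po[1]);                            /* parent keeps the read ends */
    close(pe[1]);

    struct pollfd fds[2] = {
        { .fd = po[0], .events = POLLIN },
        { .fd = pe[0], .events = POLLIN },
    };
    char *bufs[2] = { out, err };
    size_t caps[2] = { outlen - 1, errlen - 1 };
    size_t used[2] = { 0, 0 };
    int open_fds = 2;

    while (open_fds > 0 && poll(fds, 2, -1) > 0) {
        for (int i = 0; i < 2; i++) {
            char chunk[512];
            if (fds[i].fd < 0 || fds[i].revents == 0)
                continue;
            ssize_t n = read(fds[i].fd, chunk, sizeof chunk);
            if (n <= 0) {                    /* EOF or error on this pipe */
                close(fds[i].fd);
                fds[i].fd = -1;              /* poll() ignores negative fds */
                open_fds--;
                continue;
            }
            size_t room = caps[i] - used[i];
            size_t take = (size_t)n < room ? (size_t)n : room;
            memcpy(bufs[i] + used[i], chunk, take);
            used[i] += take;
        }
    }
    for (int i = 0; i < 2; i++)              /* clean up if poll() failed */
        if (fds[i].fd >= 0)
            close(fds[i].fd);
    out[used[0]] = '\0';
    err[used[1]] = '\0';

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The sketch has no timeout handling, which a real check executor would need for hung plugins.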
All in all, nice patch :)
--
Andreas Ericsson andreas.ericsson@...
OP5 AB http://www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Hi
bobi@... wrote:
> Hi All,
>
> Like everyone else, we've had our fair share of mysterious plugin "(No
> output!)" messages.
>
> Sometimes this error is due to the plugin failing and only writing
> diagnostics to stderr (and nothing is written to stdout.)
>
> Also, as everyone knows, Nagios only reads the first line of
> newline-terminated output from a plugin and throws the rest away. But,
> what if the first line is just a new-line and the good stuff is on a
> subsequent line?
Then it's not good stuff; good stuff is on the first line of stdout *only*.
> Yes, I know - fix your plugin to output only to the first line.
>
Exactly :)
> But being a lazy programmer, I am thinking, why not just have Nagios be a
> little more forgiving and inquisitive and keep searching stdout until it
> finds the first non-empty line? Is that so bad? Or is it a feature?
>
Small scripts, written once, executed once, can be lazily coded. However,
applications running for a long time (e.g. daemons) should be very
carefully coded.
> Well, you can be the judge.
>
> Anyway, I put together this patch for checks.c - it modifies the plugin
> output handling logic in the following manner:
>
> 1. As usual, it reads plugin output from stdout.
>
> However, if the first line is empty, it keeps reading until it gets a
> non-empty line or EOF.
>
> If it gets a non-empty line, then that first non-empty line becomes the
> plugin output.
>
> 2. However, if it really gets no output from stdout (i.e., nothing or all
> empty lines,) then it reads the plugin's stderr and returns the first
> non-empty line it finds.
>
> 3. If it gets nothing from stderr as well, then it finally returns
> everyone's favorite diagnostic: "(No output!)"
>
Taking your approach, this is certainly a good algorithm.
> You know, we should really change that diagnostic to: "(No output! Have a
> nice day!)"
>
This would make our favorite admins' days easier sometimes :)
>
> Anyway, I'd be very interested in any alternate suggestions, good
> comments, insightful observations or even witty repartee.
>
> BTW, in order to provide the ability to read both stdout and stderr from a
> plugin sub-process, I've written my own version of the standard C popen(3)
> function called pfopen(). I did this because this is the problem with the
> standard popen(3) function - it only returns stdout to the parent process,
> which may only give you half the story since it ignores all potentially
> useful diagnostic info from the stderr of the child process.
>
That's an interesting piece of code. I'll keep it somewhere, just in case.
Your patch would certainly work but I would not like it to be part of
the tree as is. The basic idea (to help with plugin debugging) is good
but the solution is, imho, not the best.
There is a specification. This specification states that the output of
the plugin is the first line and only the first line of the standard
output. This, and the return code of 0,1,2 or 3 is the only thing that a
plugin should respect.
Any plugin that does not respect that is flawed, that's it. Fix it or
throw it away, but it is not Nagios' responsibility to take care of the
well-formedness of its output.
Programs and standards are like fluids in balloons. Grow the balloon and
the fluid will expand (so you can't shrink it back). The entropy always
grows.
See what happened with the web "standards". Browsers tolerate
incorrect pages, so designers don't worry, they feel free not to close
tags and such -- and because the browser accepts it, they sometimes don't
even know they are doing something wrong.
Now the web is the biggest headache producer among developers, because
getting a div to look the same under Firefox, IE and Opera takes 2
hours where it should have taken 2 minutes if the W3C standards were
respected.
If the standard is bad, change it. Otherwise rewrite the faulty plugin.
Otoh, it *is* useful to have the output of the following lines and that
of stderr. This is where your patch should be patched. It should put a
horrible warning, maybe even return code 3, and then display the nth
line of stdout/stderr that it has found, like:
THINGY UNKNOWN - (Invalid output!) - blah blah
where blah blah is the said output.
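A minimal sketch of that idea in C, for illustration only. The function name flag_invalid_output() and its parameters are made up here and are not part of any posted patch; the only facts taken from the thread are the message layout and return code 3.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STATE_UNKNOWN 3   /* plugin return code 3, as suggested above */

/* Hypothetical helper: build the deliberately ugly status line from the
 * first line of stray output found on a later stdout line or on stderr.
 * Returns the state that would be reported for the check. */
int flag_invalid_output(const char *label, const char *stray,
                        char *buf, size_t buflen)
{
    assert(label != NULL && stray != NULL && buf != NULL && buflen > 0);

    /* Only show the first line of the stray output. */
    int len = (int)strcspn(stray, "\n");

    snprintf(buf, buflen, "%s UNKNOWN - (Invalid output!) - %.*s",
             label, len, stray);
    return STATE_UNKNOWN;
}
```

Calling flag_invalid_output("THINGY", "blah blah", buf, sizeof buf) fills buf with the line shown above and returns 3, so the stray output stays visible for debugging while the check is still flagged as broken.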
This way, one can debug the plugin, but it is so awful in the interface
that one is forced to fix the actual source of the problem. Everybody is
happy, standards are enforced and smiles appear on all IT staff's faces.
--
Thibault GENESSAY
ALIADIS
http://www.aliadis.fr
Tel. +33 4 72 13 90 40
Fax +33 4 74 22 00 09

This is a very nice patch indeed. It doesn't break anything that's
working now, but lets module-authors get more power over how nagios
executes checks. It's also relatively small and non-intrusive and, as a
side-effect, it makes it possible to write plugins as modules. Overall,
I like it.
Some questions though, inlined below. Oh, and I would very much like to
see the module. :)
bobi@... wrote:
> Attached is a patch-set I would like some feedback on.
>
> The purpose of this patch is to allow Nagios the ability to delegate the
> execution of service checks to a NEB module.
>
> Why would we want to do this? I'm glad you asked...
>
> The point is to allow Nagios to scale efficiently in large-scale
> environments by delegating service checks to multi-node "check" clusters.
> That is, it facilitates the creation of a Nagios Service Check Cluster (or
> multiple independent clusters,) that can be deployed in either one
> location or multiple locations.
>
> The benefits are:
>
> 1. It de-couples Service Check execution from Scheduling on the same box.
> Sure, you can do this by setting up multiple Nagios instances that report
> their results passively back up to the "master" Nagios box, but that
> requires manually splitting up your configuration among multiple Nagios
> instances, setting up all of the passive result reporting, etc.
>
> In this scenario, you can keep your centrally-located master configuration
> file and have the service checks distributed to light-weight,
> geographically-dispersed service check clusters.
>
How does the module determine which node checks what?
How is configuration distributed?
> 2. Scalability. You can support more simultaneous service checks by
> adding more light-weight service check nodes incrementally.
>
Do you have to restart the "master" nagios in order for this to work, or
will they be picked up as one goes along?
If "picked up as one goes along", how does handshake and authentication
work?
> You can start with zero external nodes (i.e., all checks still executed by
> Nagios internally.) Then add one node as your service check count
> increases. Then gradually (or quickly,) increase the node count, locally
> or remotely, as your service check count grows, and the system will scale
> appropriately.
>
> Anyway, it's not the ultimate, end-all, be-all, but we have found it helps
> us scale and manage Nagios efficiently in our large-scale,
> multi-datacenter environment. The hope is that this will be considered as
> a potential part of the new Nagios architecture some day.
>
> For those who want to know how Nagios actually delegates service check
> execution to an external cluster via a NEB module, here are the high-level
> details:
>
> We have written a multi-threaded NEB module that registers a
> NEBCALLBACK_SERVICE_CHECK_DATA callback and watches for the
> NEBTYPE_SERVICECHECK_INITIATE event.
>
> It then takes each service check and distributes it across the network to
> multiple "worker" nodes in a cluster (via XML-RPC). It also takes care of
> processing the check results, posting them to the internal Nagios result
> queue, plugin timeout conditions, etc.
>
Does this go through the FIFO pipe? If so, I'm afraid it doesn't solve
the biggest issue in scaling Nagios to large networks.
--
Andreas Ericsson andreas.ericsson@...
OP5 AB http://www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Hi All,
Like everyone else, we've had our fair share of mysterious plugin "(No
output!)" messages.
Sometimes this error is due to the plugin failing and only writing
diagnostics to stderr (and nothing is written to stdout.)
Also, as everyone knows, Nagios only reads the first line of
newline-terminated output from a plugin and throws the rest away. But,
what if the first line is just a new-line and the good stuff is on a
subsequent line?
Yes, I know - fix your plugin to output only to the first line.
But being a lazy programmer, I am thinking, why not just have Nagios be a
little more forgiving and inquisitive and keep searching stdout until it
finds the first non-empty line? Is that so bad? Or is it a feature?
Well, you can be the judge.
Anyway, I put together this patch for checks.c - it modifies the plugin
output handling logic in the following manner:
1. As usual, it reads plugin output from stdout.
However, if the first line is empty, it keeps reading until it gets a
non-empty line or EOF.
If it gets a non-empty line, then that first non-empty line becomes the
plugin output.
2. However, if it really gets no output from stdout (i.e., nothing or all
empty lines,) then it reads the plugin's stderr and returns the first
non-empty line it finds.
3. If it gets nothing from stderr as well, then it finally returns
everyone's favorite diagnostic: "(No output!)"
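The three steps above can be sketched in C roughly as follows. This is not the code from the checks.c patch: the function names are hypothetical, and for simplicity the sketch works on output that has already been captured into strings rather than on live pipes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the first non-empty line of `text` into `buf`, without its newline.
 * "Empty" is taken literally here: a line with only spaces counts as
 * non-empty. Returns 1 if a line was found, 0 otherwise. */
int first_nonempty_line(const char *text, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';
    if (text == NULL)
        return 0;
    while (*text != '\0') {
        size_t len = strcspn(text, "\n");   /* length of the current line */
        if (len > 0) {
            if (len >= buflen)
                len = buflen - 1;           /* truncate overly long lines */
            memcpy(buf, text, len);
            buf[len] = '\0';
            return 1;
        }
        text++;                             /* empty line: skip its '\n' */
    }
    return 0;
}

/* Step 1: first non-empty stdout line. Step 2: otherwise the first
 * non-empty stderr line. Step 3: otherwise "(No output!)". */
const char *select_plugin_output(const char *out_text, const char *err_text,
                                 char *buf, size_t buflen)
{
    if (first_nonempty_line(out_text, buf, buflen))
        return buf;
    if (first_nonempty_line(err_text, buf, buflen))
        return buf;
    strncpy(buf, "(No output!)", buflen - 1);
    buf[buflen - 1] = '\0';
    return buf;
}
```

With stdout "\n\nOK - fine\nmore" the result is "OK - fine"; with only blank lines on stdout and "\nboom\n" on stderr it is "boom".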
You know, we should really change that diagnostic to: "(No output! Have a
nice day!)"
Anyway, I'd be very interested in any alternate suggestions, good
comments, insightful observations or even witty repartee.
BTW, in order to provide the ability to read both stdout and stderr from a
plugin sub-process, I've written my own version of the standard C popen(3)
function called pfopen(). I did this because of a limitation in the
standard popen(3) function: it only returns stdout to the parent process,
which may give you only half the story, since it ignores all potentially
useful diagnostic info from the stderr of the child process.
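For readers without the attachment, here is one way to capture both streams of a child process with plain POSIX calls. This is a sketch under assumptions, not the pfopen() from the patch: the name run_capture() and its buffer-based interface are invented for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `cmd` via /bin/sh and capture its stdout and
 * stderr into separate NUL-terminated buffers (excess output is dropped).
 * Returns the child's exit status, or -1 on failure. */
int run_capture(const char *cmd, char *out, size_t outlen,
                char *err, size_t errlen)
{
    int po[2], pe[2];

    assert(cmd && out && err && outlen > 0 && errlen > 0);
    out[0] = err[0] = '\0';
    if (pipe(po) < 0)
        return -1;
    if (pipe(pe) < 0) {
        close(po[0]); close(po[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(po[0]); close(po[1]); close(pe[0]); close(pe[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: point stdout/stderr at the pipe write ends, then exec. */
        dup2(po[1], STDOUT_FILENO);
        dup2(pe[1], STDERR_FILENO);
        close(po[0]); close(po[1]); close(pe[0]); close(pe[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }
    /* Parent: keep only the read ends. */
    close(po[1]); close(pe[1]);

    struct pollfd fds[2] = {
        { .fd = po[0], .events = POLLIN },
        { .fd = pe[0], .events = POLLIN },
    };
    char *bufs[2] = { out, err };
    size_t caps[2] = { outlen - 1, errlen - 1 };
    size_t used[2] = { 0, 0 };
    int open_fds = 2;

    /* Drain both pipes together so a chatty stderr cannot block the child
     * while we are waiting on stdout (and vice versa). */
    while (open_fds > 0 && poll(fds, 2, -1) >= 0) {
        for (int i = 0; i < 2; i++) {
            char tmp[512];
            if (fds[i].fd < 0 || fds[i].revents == 0)
                continue;
            ssize_t n = read(fds[i].fd, tmp, sizeof tmp);
            if (n <= 0) {                       /* EOF or error */
                close(fds[i].fd);
                fds[i].fd = -1;                 /* poll() ignores fd < 0 */
                open_fds--;
                continue;
            }
            size_t room = caps[i] - used[i];
            size_t take = (size_t)n < room ? (size_t)n : room;
            memcpy(bufs[i] + used[i], tmp, take);
            used[i] += take;
            bufs[i][used[i]] = '\0';
        }
    }
    for (int i = 0; i < 2; i++)                 /* only after a poll() error */
        if (fds[i].fd >= 0)
            close(fds[i].fd);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running "echo hello; echo oops >&2; exit 2" through run_capture() leaves "hello\n" in the stdout buffer and "oops\n" in the stderr buffer, and returns 2. A real implementation would also want a timeout, which this sketch leaves out.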
If you use the patch, then pfopen.c should be added to the "base"
directory and pfopen.h should be added to the "include" directory. Both
patch files (for "Makefile.in" and "checks.c",) are applied to those files
in the "base" directory.
Regards,
Bob Ingraham

Attached is a patch-set I would like some feedback on.
The purpose of this patch is to allow Nagios the ability to delegate the
execution of service checks to a NEB module.
Why would we want to do this? I'm glad you asked...
The point is to allow Nagios to scale efficiently in large-scale
environments by delegating service checks to multi-node "check" clusters.
That is, it facilitates the creation of a Nagios Service Check Cluster (or
multiple independent clusters,) that can be deployed in either one
location or multiple locations.
The benefits are:
1. It de-couples Service Check execution from Scheduling on the same box.
Sure, you can do this by setting up multiple Nagios instances that report
their results passively back up to the "master" Nagios box, but that
requires manually splitting up your configuration among multiple Nagios
instances, setting up all of the passive result reporting, etc.
In this scenario, you can keep your centrally-located master configuration
file and have the service checks distributed to light-weight,
geographically-dispersed service check clusters.
2. Scalability. You can support more simultaneous service checks by
adding more light-weight service check nodes incrementally.
You can start with zero external nodes (i.e., all checks still executed by
Nagios internally.) Then add one node as your service check count
increases. Then gradually (or quickly,) increase the node count, locally
or remotely, as your service check count grows, and the system will scale
appropriately.
Anyway, it's not the ultimate, end-all, be-all, but we have found it helps
us scale and manage Nagios efficiently in our large-scale,
multi-datacenter environment. The hope is that this will be considered as
a potential part of the new Nagios architecture some day.
For those who want to know how Nagios actually delegates service check
execution to an external cluster via a NEB module, here are the high-level
details:
We have written a multi-threaded NEB module that registers a
NEBCALLBACK_SERVICE_CHECK_DATA callback and watches for the
NEBTYPE_SERVICECHECK_INITIATE event.
It then takes each service check and distributes it across the network to
multiple "worker" nodes in a cluster (via XML-RPC). It also takes care of
processing the check results, posting them to the internal Nagios result
queue, plugin timeout conditions, etc.
The way this works is that Nagios now checks the return code from NEB
modules that are registered for the NEBCALLBACK_SERVICE_CHECK_DATA event.
If the NEB module returns the "new" NEBERROR_CALLBACKOVERRIDE result code,
Nagios "delegates" execution of the service check to the NEB module.
Otherwise, Nagios continues to execute the service check itself, as it
normally does.
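To illustrate the control flow only, here is a self-contained C mock of that decision. It does not use the real Nagios headers: the SKETCH_* constants, the callback type and all function names are stand-ins invented for this example, and the numeric value chosen for the override code is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the real result codes; the actual values are defined by
 * the Nagios headers and the patch, not here. */
enum { SKETCH_OK = 0, SKETCH_CALLBACKOVERRIDE = 1 };

/* Simplified callback shape for this sketch. */
typedef int (*sketch_callback)(void *check_data);

/* Ask every registered callback about a check that is about to start.
 * Returns 1 if some module took the check over, or 0 if the core should
 * execute the check itself, as it normally does. */
int sketch_delegated(sketch_callback *cbs, size_t n, void *check_data)
{
    assert(cbs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cbs[i] != NULL && cbs[i](check_data) == SKETCH_CALLBACKOVERRIDE)
            return 1;
    }
    return 0;
}

/* A module that merely observes the event. */
int sketch_observer(void *check_data)
{
    (void)check_data;
    return SKETCH_OK;
}

/* A module that claims the check, e.g. to ship it to a worker node. */
int sketch_cluster_module(void *check_data)
{
    (void)check_data;
    return SKETCH_CALLBACKOVERRIDE;
}
```

With only sketch_observer registered, sketch_delegated() returns 0 and the core runs the check; once sketch_cluster_module is registered as well, it returns 1 and the check is left to the module.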
So, the attached patch files enable this functionality.
Note that this patch set does not include our multi-threaded NEB module
(if you're interested in that, just e-mail me - it's meant to be open
source.) It just includes the patches to allow a NEB module to override
service check execution.
This should be a pretty straightforward patch, and doesn't modify any
functionality in the absence of the broker. We just need it to expand the
flexibility of what a NEB module can do.
Thanks,
Bob

Hi again....
With the risk of getting off topic:
I don't want to maintain Cacti too. If I have to check the interface for
status, I might as well do the performance stuff too, in the same plugin.
One other post suggested memcache or a database. I will stick with my
quick&dirty /tmp files.... ;)
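For illustration, a temp-file cache of the interface index could look roughly like this in C. The file layout, the function names and any path passed in are assumptions made for this sketch, not taken from an existing plugin; the plugin would still have to confirm over SNMP that the cached index carries the expected ifDescr.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Store "index<TAB>description" in a state file. Returns 0 on success. */
int save_ifindex(const char *path, const char *descr, int ifindex)
{
    assert(path != NULL && descr != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d\t%s\n", ifindex, descr) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Return the cached index if the file holds an entry for `descr`,
 * or -1 if there is no usable entry (the caller then walks ifDescr). */
int load_ifindex(const char *path, const char *descr)
{
    assert(path != NULL && descr != NULL);
    char line[256];
    int ifindex = -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) != NULL) {
        int idx;
        line[strcspn(line, "\n")] = '\0';       /* strip the newline */
        char *tab = strchr(line, '\t');
        if (tab != NULL && strcmp(tab + 1, descr) == 0 &&
            sscanf(line, "%d", &idx) == 1)
            ifindex = idx;
    }
    fclose(f);
    return ifindex;
}
```

One state file per host and service keeps the lookup trivial; a missing or mismatching file simply falls back to the full table walk.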
Eli Stair wrote:
>
> I haven't messed with the ndo at all (which may be able to store
> arbitrary data for a host in a table?), but if you use cacti to do
> your snmp polling, you not only get nice trending data visually, but
> can use db accesses for the host if you want to look up further data...
> Or just pull the values you want out of the rrd for that host:if
>
> /eli
>
>
> -----Original Message-----
> From: nagios-devel-bounces@...
> <nagios-devel-bounces@...>
> To: nagios-devel@...
> <nagios-devel@...>
> Sent: Thu Oct 19 11:30:28 2006
> Subject: [Nagios-devel] Message passing between host/service check plugin.
>
> Hi Group,
>
> Am I the only person in here needing a way to pass something to a
> check plugin from its last invocation?
>
> I don't like using SNMP OID values for my plugin executions, so I
> use the interface name to poll the status via SNMP. To do this I have
> to walk the ifDescr SNMP table. Once I find the interface index, I can
> poll status, in/out octets, etc.
>
> The next time the same service gets checked, it would be nice to
> pass in the last determined interface index and only test whether the
> index still has the correct ifDescr. To do this I rely on temporary
> files today, which is a mess.
>
> I could pass the index in the $SERVICEOUTPUT$ output, but that could
> potentially clutter up the information displayed as the check message.
> I could also use $SERVICEPERFDATA$, but that is used for performance
> data and gets passed to external performance collectors.
>
> I would like to have the possibility of a 3rd field in the check
> output, stored in a $SERVICETEMPDATA$ macro, which could be used for
> just about anything, taking care that the string is limited in size.
>
> Kind Regards,
> Peter
