Re: configuration files

On Fri, 12 Dec 2003 13:34:29 -0800 (PST)
Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx> wrote:
> :I'm not sure that the implication was that RCNG should be scrapped,
> :only that it should [eventually] do what runit and daemontools do,
> :namely, service monitoring.
> :
> :-Chris
>
> Service monitoring is an interesting problem. I wrote a program
> long ago which is available at:
>
> http://apollo.backplane.com/FreeSrc/runmaint.c
>
> It was originally written to maintain sendmail and named but with
> a little cleanup and additional work to add a periodic testing
> function (e.g. to run a test script once every so often), a better
> 'stop' interface, and a unix-domain socket for communicating
> requests, we could integrate this into RCNG to provide automatic
> service monitoring, restart, stop, and reporting features for all
> non-inetd system-started daemons.
>
> It would be very easy to integrate with the RCNG varsym variables
> that are already being set.

It is kind of an interesting problem - and it doesn't seem too
difficult, at least from my perspective.

RCNG already has the ability to start and stop services by name, and
determine their status.
To monitor them, all it needs to do is check each service's status
every second or so. If a service is supposed to be running but isn't,
assume it died and restart it. That's all, really.
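
For illustration, here is a rough Python sketch of that polling pass.
The Service class and check_once() are names I made up for this example,
not RCNG interfaces. A real version would presumably call the RCNG
start/stop/status scripts rather than hold Popen handles.

```python
import subprocess
import time


class Service:
    """A hypothetical service record; not an RCNG data structure."""

    def __init__(self, name, argv):
        self.name = name
        self.argv = argv
        self.wanted = False   # should this service be running?
        self.proc = None      # subprocess.Popen handle once started

    def start(self):
        self.wanted = True
        self.proc = subprocess.Popen(self.argv)

    def stop(self):
        self.wanted = False
        if self.proc is not None and self.proc.poll() is None:
            self.proc.terminate()
            self.proc.wait()

    def is_running(self):
        # Popen.poll() returns None while the child is still alive.
        return self.proc is not None and self.proc.poll() is None


def check_once(services):
    """One monitoring pass: restart anything wanted but not running."""
    restarted = []
    for svc in services:
        if svc.wanted and not svc.is_running():
            svc.start()
            restarted.append(svc.name)
    return restarted


def monitor(services, interval=1.0):
    """Poll forever, roughly once per interval seconds."""
    while True:
        check_once(services)
        time.sleep(interval)
```

A service that was stopped on purpose has wanted set to False, so the
pass leaves it alone.
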
(Of course, poll-hater that I am, I have to wonder if DragonFly can't
eventually make it so that each process started by RCNG is somehow set
up to send a message to the process that started it when it dies. Then
the monitor doesn't have to check every second, it can just wait for
'exit' messages to come in.)
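
For what it's worth, plain Unix already gives a parent something close
to this for its own direct children: wait() blocks until one of them
exits. Below is a rough Python sketch, assuming the monitor itself
starts the services. spawn() and wait_for_exit() are hypothetical helper
names. It would not see daemons that fork into the background and
detach, so it is not the general message mechanism described above.

```python
import os


def spawn(argv):
    """Start a child process without waiting; return its pid (Unix).

    argv[0] must be a full path, since os.spawnv does not search PATH.
    """
    return os.spawnv(os.P_NOWAIT, argv[0], argv)


def wait_for_exit(children):
    """Block until one child exits; return (name, exit_code).

    children maps pid -> service name. os.wait() sleeps in the kernel
    until some child of this process terminates, so nothing is polled.
    A child killed by a signal is reported as the negated signal number.
    """
    pid, status = os.wait()
    name = children.pop(pid, None)
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
    else:
        code = -os.WTERMSIG(status)
    return name, code
```
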
Things like restart limiting would be nice - if a service has been
restarted, say, 50 times in the last 50 seconds, assume there's something
wrong with its configuration/start-script, and don't try to start it
again. (Erlang can do this, daemontools can't, not sure about runit.)
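
The restart limit could be kept as a sliding window of recent restart
times. Here is a minimal Python sketch. RestartLimiter is a made-up
name, and the defaults are just the 50-in-50-seconds figures from the
example above. The clock is a parameter only so the logic can be tested
without sleeping.

```python
import time
from collections import deque


class RestartLimiter:
    """Allow at most max_restarts restarts within any window seconds."""

    def __init__(self, max_restarts=50, window=50.0, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window = window
        self.clock = clock
        self.stamps = deque()  # times of recent allowed restarts

    def allow(self):
        """Return True and record the restart if it is within the limit."""
        now = self.clock()
        # Forget restarts that have fallen out of the window.
        while self.stamps and now - self.stamps[0] > self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.max_restarts:
            return False
        self.stamps.append(now)
        return True
```

The monitor would call allow() before each restart and give up on the
service (and presumably log it) when it returns False.
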
But that's gravy - once the basic mechanism for restarting services that
have died is in place, any number of useful features (reliable logging,
concurrency limiting, restart limiting, etc) could be added.
-Chris