Matti,
I have been seeing this problem for a *long* time, but since it happens
rarely, it did not bother me much until recently.
Sometimes an smtp transport process hangs and stays in the process list for
a long time (days). This effectively stops the queue for the domain it
is contacting. The process uses zero CPU and leaves *no* traces in
the smtp log. Netstat does not show any open connection on the receiving
end (and I cannot check if there is a connection associated with this
process on the originating end).
Today I caught such a process and killed it with signal 11 (with -QUIT, it
leaves no core file?). Zmailer 2.99.47 + all patches, SPARC Solaris 2.5.1.
This is what gdb shows:
Core was generated by `/usr/zmailer/bin/ta/smtp -s8H -l /var/log/zmailer/smtp'.
Program terminated with signal 11, Segmentation fault.
procfs (find_procinfo): Couldn't locate pid 0
#0 0xef636d58 in _end ()
(gdb) bt
#0 0xef636d58 in _end ()
#1 0xef6732cc in _end ()
#2 0xef6475f8 in _end ()
#3 0xef6f2158 in _end ()
#4 0xef6d77c8 in _end ()
#5 0xef6f1bb8 in _end ()
#6 0x1fec0 in stachmyaddress (host=0x44fc7 "koi.smtp.online.ru")
at selfaddrs.c:296
#7 0x2011c in stachmyaddresses (host=0x44fd9 "") at selfaddrs.c:420
#8 0x16604 in smtpconn (SS=0xeffff998, host=0x454b0 "office.sob.tulane.edu",
noMX=0) at smtp.c:1758
#9 0x163b0 in smtpopen (SS=0xeffff998, host=0x454b0 "office.sob.tulane.edu",
noMX=0) at smtp.c:1689
#10 0x13e28 in main (argc=0, argv=0x454b0) at smtp.c:669
(gdb)
"koi.smtp.online.ru" is one of the IP aliases of the local machine.
selfaddrs.c:296 is a gethostbyname() call. I *can* believe that this is
an error in Solaris... though this happened in 2.4 as well.
Also, I *can* write a wrapper around gethostbyname() that uses alarm(),
but it sounds ugly.
Any better ideas?
Maybe the scheduler could kill lethargic children? Something else?
Eugene