[SOLVED] Logger problems after upgrade 5.0.9 to 6.0.6

As a Zimbra partner, we ran into the very same problem during a customer upgrade, under similar circumstances:

ZCS 5.0.16 running on SLES 10 SP2 x86_64 in a Xen DomU, upgraded to ZCS 6.0.4. Note: we didn't want to go to ZCS 6.0.6 directly, because that would have required a host OS upgrade, for which no downtime was allowed.

During the upgrade, we found no noticeable errors on standard output, in /tmp/zmsetup.log, or in /var/log/messages. Note: as a best practice, we always run tail -f /var/log/messages on a separate terminal during installs and upgrades. We find it often reveals information that never reaches stdout and gives you a head start on finding the root cause of unusual problems.

The upgrade completed fine and all ZCS services were functional, except that the Server Status page of the Admin web interface showed red X's for every service. Meanwhile, running zmcontrol status as the zimbra user (su - zimbra) showed every service as "Running".

Furthermore, clicking on any of the Server Statistics in the Admin web interface resulted in an error. We eventually traced the root cause to two things: the sqlite logger DB had not been created during the upgrade (the /opt/zimbra/logger/db directory was empty), and /var/log/zimbra-stats.log remained at 0 bytes.
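A quick way to confirm both symptoms from a root shell is sketched below. The function name check_logger_state is made up for illustration; the two default paths are the ones mentioned above, and you can pass different ones as arguments.

```shell
# Hypothetical helper: report whether the logger DB directory has any files
# and whether the stats log has received any data.
check_logger_state() {
    db_dir="${1:-/opt/zimbra/logger/db}"
    stats_log="${2:-/var/log/zimbra-stats.log}"

    # ls -A lists everything except . and .., so empty output means an empty dir
    if [ -d "$db_dir" ] && [ -n "$(ls -A "$db_dir" 2>/dev/null)" ]; then
        echo "db: present"
    else
        echo "db: empty or missing"
    fi

    # -s is true only if the file exists and is larger than 0 bytes
    if [ -s "$stats_log" ]; then
        echo "stats log: has data"
    else
        echo "stats log: empty or missing"
    fi
}
```

Calling check_logger_state with no arguments checks the default locations. On a system hit by this problem, both lines should report "empty or missing".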

Note: normally you would not start zmlogger manually like this. Instead, you would use the zmloggerctl command. But in this case, zmloggerctl might fail because we haven't yet fixed the problem of syslog not being able to log to the /var/log/zimbra-stats.log file (we'll do that next). So, this is just a temporary workaround to create the sqlite logger DB files we need.

There should be no zmlogger process running. If any are still running, run: killall zmlogger

3. Fix the syslog problem that prevents logging to /var/log/zimbra-stats.log.

Here, there are two variants of the problem. The first affects systems that use rsyslog, which often require only a restart of rsyslogd. This issue is amply documented in several threads of the Zimbra forums, so I won't repeat those conversations here.

Our case was the second variant, involving systems that use syslog-ng as the logging daemon. It turned out that the following file: /etc/syslog-ng/syslog-ng.conf.in

was modified by the Zimbra installer in a way that prevented the final system file: /etc/syslog-ng/syslog-ng.conf

from being rebuilt correctly. Specifically, the directives that involve logging to the /var/log/zimbra-stats.log file didn't make it to the final system file.

This is verified by running, as root, the following: grep "# zimbra" syslog-ng.conf.in

From the above, we see that several lines from the syslog-ng.conf.in didn't make it to the syslog-ng.conf file.
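To list exactly which tagged lines are missing, the two files can be compared line by line. This is only a sketch: the function name missing_zimbra_lines is hypothetical, and it assumes that every Zimbra directive carries the "# zimbra" tag and should appear unchanged in the built file.

```shell
# Hypothetical helper: print each "# zimbra"-tagged line that is present in
# the template (.conf.in) but absent from the built syslog-ng.conf.
missing_zimbra_lines() {
    template="${1:-/etc/syslog-ng/syslog-ng.conf.in}"
    built="${2:-/etc/syslog-ng/syslog-ng.conf}"

    grep '# zimbra' "$template" | while IFS= read -r line; do
        # -x: whole-line match, -F: fixed string (no regex), -q: quiet
        grep -qxF -e "$line" "$built" || printf '%s\n' "$line"
    done
}
```

Run as root with no arguments, it prints nothing when the built file already contains every tagged line from the template.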

After a few unsuccessful attempts to fix syslog-ng.conf.in and re-run zmsyslogsetup, which repeatedly caused the system to complain about the following line in syslog-ng.conf.in: filter zimbra_auth { facility(auth); }; # zimbra

we simply chose to fix syslog-ng.conf directly. To do so, we stopped the syslog daemon. As root, run: /etc/init.d/syslog stop

Then we edited syslog-ng.conf and made the last section look like this:

If you get this far, it should take less than an hour for enough statistical data to accumulate for some of the graphs to show up in the Server Statistics page of the admin web interface, and for those red X's to be replaced by nice blue checkmarks on the Server Status page.
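Rather than waiting for the graphs, you can check much sooner whether syslog has started writing to the stats log. The helper below is an illustrative sketch; the name wait_for_data and the default timeout and polling interval are assumptions, not Zimbra settings.

```shell
# Hypothetical helper: poll a file until it is non-empty or a timeout expires.
# Arguments: file, timeout in seconds, polling interval in seconds.
wait_for_data() {
    file="${1:-/var/log/zimbra-stats.log}"
    timeout="${2:-300}"
    interval="${3:-10}"
    waited=0

    while [ "$waited" -lt "$timeout" ]; do
        if [ -s "$file" ]; then
            echo "receiving data"
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done

    # One last check once the timeout has been reached
    if [ -s "$file" ]; then
        echo "receiving data"
        return 0
    fi
    echo "still empty"
    return 1
}
```

If it reports "still empty" after several minutes, the syslog configuration is probably still not routing Zimbra's messages to the file.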

I should mention that the issues and findings I had were exactly
what you described; however, I was at a loss as to where to begin
to fix them. Your solution was spot on. To others: if you have
problems with zmstat after the above procedure, do
a zmstatctl stop, wait a minute and then run zmstatctl start and
wait a few minutes. Then do a zmcontrol status and see if
it is running. The stats process was a little slow to start on my
server after the repairs.
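The restart sequence described in this reply can be wrapped up as follows, to be run as the zimbra user. The function name restart_zmstat is made up, and the 60 and 180 second defaults are just one reading of "a minute" and "a few minutes".

```shell
# Hypothetical wrapper around the stop / wait / start / wait / status sequence.
# Arguments: seconds to wait after stop, seconds to wait after start.
restart_zmstat() {
    stop_wait="${1:-60}"
    start_wait="${2:-180}"

    zmstatctl stop
    sleep "$stop_wait"
    zmstatctl start
    sleep "$start_wait"
    # The stats service should now be listed as running
    zmcontrol status
}
```

If the stats service still shows as stopped, try a longer second wait, for example restart_zmstat 60 600.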

Note to sysklogd users: sysklogd does not support RFC 3164 format, which is the default forwarding template in rsyslog. As such, you will experience duplicate hostnames if rsyslog is the sender and sysklogd is the receiver. The fix is simple: you need to use a different template. Use this one: