Zimbra on OpenVZ (stability issues)

Dear forum,

I'm using Zimbra on OpenVZ and have stability problems with it. OpenVZ does not show any errors, and I'm running several other services in VEs on it without a problem (e.g. Apache, Tomcat, Jetty, PostgreSQL, MySQL).

The box is a dual-core Athlon with 4 GB of memory.
The load average is: 0.10, 0.13, 0.09

The physical server is running Debian with a 2.6.18 OpenVZ kernel. The filesystem is reiserfs.

The log below shows the most frequent problem I have with Zimbra. It also corrupted the virus database once, along with other things, but maybe somebody has an idea where to look for the cause of this one, and fixing it may solve the other issues too.

Thanks

Dani

Code:

2007-04-04 00:01:17,313 INFO [LmtpServer-637] [name=foo@bar.com;] FileBlobStore - deleting blob 900 in mailbox 7
2007-04-04 00:01:17,313 INFO [LmtpServer-637] [name=foo@bar.com;] ZimbraLmtpBackend - try again for message foo@bar.com: exception occurred
com.zimbra.cs.service.ServiceException: system failure: indexMessage caught IOException
at com.zimbra.cs.service.ServiceException.FAILURE(ServiceException.java:174)
at com.zimbra.cs.index.Indexer.indexMessage(Indexer.java:161)
at com.zimbra.cs.mailbox.Message.reindex(Message.java:421)
at com.zimbra.cs.mailbox.Mailbox.endTransaction(Mailbox.java:4371)
at com.zimbra.cs.mailbox.Mailbox.addMessageInternal(Mailbox.java:3318)
at com.zimbra.cs.mailbox.Mailbox.addMessage(Mailbox.java:3062)
at com.zimbra.cs.mailbox.Mailbox.addMessage(Mailbox.java:3046)
at com.zimbra.cs.filter.ZimbraMailAdapter.addMessage(ZimbraMailAdapter.java:327)
at com.zimbra.cs.filter.ZimbraMailAdapter.doDefaultFiling(ZimbraMailAdapter.java:321)
at com.zimbra.cs.filter.RuleManager.applyRules(RuleManager.java:182)
at com.zimbra.cs.lmtpserver.ZimbraLmtpBackend.deliverMessageToLocalMailboxes(ZimbraLmtpBackend.java:308)
at com.zimbra.cs.lmtpserver.ZimbraLmtpBackend.deliver(ZimbraLmtpBackend.java:122)
at com.zimbra.cs.lmtpserver.LmtpHandler.doDATA(LmtpHandler.java:420)
at com.zimbra.cs.lmtpserver.LmtpHandler.processCommand(LmtpHandler.java:197)
at com.zimbra.cs.tcpserver.ProtocolHandler.processConnection(ProtocolHandler.java:231)
at com.zimbra.cs.tcpserver.ProtocolHandler.run(ProtocolHandler.java:198)
at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.io.IOException: Could not create index /opt/zimbra/index/0/7/index/0 (directory already exists)
at com.zimbra.cs.index.MailboxIndex.openIndexWriter(MailboxIndex.java:906)
at com.zimbra.cs.index.MailboxIndex.addDocument(MailboxIndex.java:349)
at com.zimbra.cs.index.Indexer.addDocument(Indexer.java:354)
at com.zimbra.cs.index.Indexer.addDocument(Indexer.java:324)
at com.zimbra.cs.index.Indexer.indexMessage(Indexer.java:156)
... 16 more
Caused by: java.io.IOException: Lock obtain timed out: Lock@/opt/zimbra/apache-tomcat-5.5.15/temp/lucene-f3106ce454e35a1768d130f89afe4090-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:58)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:223)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)
at com.zimbra.cs.index.MailboxIndex.openIndexWriter(MailboxIndex.java:887)
... 20 more

Code:

# Copyright (C) 2000-2006 SWsoft. All rights reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
VERSION="2"
ONBOOT="yes"
# UBC parameters (in form of barrier:limit)
# Primary parameters
#AVNUMPROC="40:40"
AVNUMPROC="300:300"
#NUMPROC="65:65"
#NUMPROC="9223372036854775807:9223372036854775807"
NUMPROC="500:500"
NUMTCPSOCK="400:400"
NUMOTHERSOCK="1024:1024"
#VMGUARPAGES="6144:9223372036854775807"
VMGUARPAGES="2050000:2097152"
# Secondary parameters
#KMEMSIZE="5242880:5592405"
KMEMSIZE="185794560:189289810"
#KMEMSIZE="9223372036854775807:9223372036854775807"
TCPSNDBUF="3194880:8314880"
TCPRCVBUF="3194880:8314880"
OTHERSOCKBUF="1320960:11560960"
#OTHERSOCKBUF="9223372036854775807:9223372036854775807"
DGRAMRCVBUF="8314880:8314880"
#OOMGUARPAGES="6144:9223372036854775807"
OOMGUARPAGES="2060000:2060000"
# Auxiliary parameters
LOCKEDPAGES="32:32"
SHMPAGES="8192:8192"
#PRIVVMPAGES="500000:750000"
#PRIVVMPAGES="4000000000:4000000000"
PRIVVMPAGES="2050000:2097152"
#NUMFILE="2048:2048"
#NUMFILE="9223372036854775807:9223372036854775807"
NUMFILE="16384:16384"
NUMFLOCK="200:220"
NUMPTY="16:16"
NUMSIGINFO="256:256"
DCACHESIZE="1048576:6291456"
#PHYSPAGES="0:9223372036854775807"
PHYSPAGES="0:393216"
NUMIPTENT="128:128"
# Disk quota parameters (in form of softlimit:hardlimit)
DISKSPACE="10485760:11534340"
DISKINODES="200000:220000"
QUOTATIME="0"
# CPU fair sheduler parameter
CPUUNITS="8000"
CPULIMIT="99"
OFFLINE_MANAGEMENT="yes"
VE_ROOT="/var/lib/vz/root/$VEID"
VE_PRIVATE="/var/lib/vz/private/$VEID"
OSTEMPLATE="debian-3.1-i386-minimal"
ORIGIN_SAMPLE="vps.basic"
HOSTNAME="foo.bar.com"
IP_ADDRESS="192.168.3.11"
NAMESERVER="192.168.3.254 129.132.98.12 129.132.250.2 129.132.250.220"
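
For anyone reading the memory values in this config: the *PAGES parameters are counted in pages, not bytes. Here is a minimal sketch for converting them, assuming 4 KiB pages (usual on i386/x86_64, but an assumption here). The function names are made up for illustration and are not part of any OpenVZ tool.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_ubc(text):
    """Parse KEY="barrier:limit" lines into {KEY: (barrier, limit)}."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments (including commented-out values) and non-assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = value.strip('"').split(":")
        # Keep only numeric barrier:limit pairs; HOSTNAME etc. are ignored.
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            params[key] = (int(parts[0]), int(parts[1]))
    return params


def pages_to_mib(pages):
    """Convert a page count to MiB under the PAGE_SIZE assumption."""
    return pages * PAGE_SIZE / (1024 * 1024)


if __name__ == "__main__":
    sample = 'PRIVVMPAGES="2050000:2097152"\nPHYSPAGES="0:393216"\n'
    for key, (barrier, limit) in parse_ubc(sample).items():
        print(key, "limit:", pages_to_mib(limit), "MiB")
```

Under that assumption, the PRIVVMPAGES limit of 2097152 pages works out to 8192 MiB, which is more than the 4 GB of physical memory in the box.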

We don't recommend running Zimbra in a VM but, having said that, you shouldn't be experiencing these problems. Some people do use Zimbra in a VM without any trouble at all. I don't know what the specific problem is here, but we can start with the obvious questions.

How much memory does this Zimbra server have allocated to it? There is a Debian kernel bug that causes problems with file ownership. I don't know if it's been fixed in Debian, but you could search the forums for details. Have you tried re-indexing the mailbox when this error occurs? Are any of the Zimbra files on a NAS/NFS device? Apart from the piece of log you've posted here, is there anything else in the logs? Anything in the system logs that may indicate a hardware problem?

I'd like you to upgrade to the most recent release and see if the problems continue. Is that possible?

I can update Zimbra to the latest version if really required, but would prefer to work this out some more.

/opt/zimbra is on reiserfs. The VEs are also running on reiserfs (without problems). I have noticed that files sometimes exist under /opt/zimbra/index/.. that belong to root and not to zimbra (I will check for that Debian bug).

With the Zimbra VE running, I sometimes get the messages below, but /proc/user_beancounters is always fine (the last column is failcnt, which is always 0, so no errors):

TCP: too many of orphaned sockets
printk: 16 messages suppressed.
Out of socket memory
Out of socket memory
Out of socket memory
Out of socket memory
Out of socket memory
Out of socket memory
Out of socket memory
Out of socket memory
Out of socket memory
Out of socket memory
warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
rip default_idle+0x29/0x50
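
A minimal sketch for checking the failcnt column mechanically rather than by eye. It assumes the usual /proc/user_beancounters layout (uid, resource, held, maxheld, barrier, limit, failcnt, with the uid shown only on the first line of each VE); the function names are hypothetical.

```python
import os


def parse_beancounters(text):
    """Return (uid, resource, failcnt) tuples from beancounter text."""
    rows = []
    uid = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Version:" line and the column header.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # The first resource line of each VE is prefixed with "<uid>:".
        if fields[0].endswith(":"):
            uid = int(fields[0][:-1])
            fields = fields[1:]
        if len(fields) != 6:
            continue
        rows.append((uid, fields[0], int(fields[5])))
    return rows


def failed_resources(text):
    """Return only the rows whose failcnt is greater than zero."""
    return [row for row in parse_beancounters(text) if row[2] > 0]


if __name__ == "__main__":
    path = "/proc/user_beancounters"
    if os.path.exists(path):
        with open(path) as handle:
            for uid, name, failcnt in failed_resources(handle.read()):
                print(uid, name, failcnt)
```

An empty result matches the "failcnt is always 0" observation. Any printed row names a resource that hit its limit at some point since the counters were last reset.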

Certainly, if you have those files owned by root then I believe the Debian bug is still there. It ends up with incorrect permissions when there's an unclean shutdown. That may not, of course, be the underlying problem. As I asked earlier: how much memory is allocated, and are any of these files on a NAS/NFS device?
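
To see how widespread the root-owned files are, something like the following could list every entry under the index directory that is not owned by the zimbra user. This is a rough sketch: the path is taken from the log above, the user name "zimbra" is assumed, and the helper name is made up.

```python
import os


def find_foreign_entries(root, expected_uid):
    """Return sorted paths under root whose owner is not expected_uid."""
    foreign = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symlinks are checked themselves, not their targets.
            if os.lstat(path).st_uid != expected_uid:
                foreign.append(path)
    return sorted(foreign)


if __name__ == "__main__":
    import pwd

    try:
        zimbra_uid = pwd.getpwnam("zimbra").pw_uid  # assumed user name
    except KeyError:
        zimbra_uid = None
    if zimbra_uid is not None:
        for path in find_foreign_entries("/opt/zimbra/index", zimbra_uid):
            print(path)
```

It only reports. Whether to chown the listed files back to zimbra is a separate decision, best taken with Zimbra stopped.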

Reiserfs is being deprecated, and it does sound like you are hitting that kernel bug. Reiserfs (or any logging file system) also doesn't work as well for applications that have their own independent write/redo logging (as Zimbra and MySQL do).

ext3 is a metadata-only logging system, and is probably the best way to go here (it certainly is the most tested disk format for Zimbra). I don't recommend ext2 because the fsck time is so bad after a reboot.

The error you posted looks like an independent issue. Orphaned TCP sockets sound like VZ is not working the way it should.
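
One way to watch the orphaned-socket count while the problem occurs is to read /proc/net/sockstat and compare the TCP "orphan" value with /proc/sys/net/ipv4/tcp_max_orphans. The parser below is a minimal sketch that assumes the usual "NAME: key value key value ..." line format. It is run on the host or inside the VE as appropriate, and it says nothing about any per-VE accounting OpenVZ may apply on top.

```python
import os


def parse_sockstat(text):
    """Parse sockstat-style lines into {protocol: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        proto, _, rest = line.partition(":")
        tokens = rest.split()
        if not tokens:
            continue
        # Tokens alternate between counter name and integer value.
        stats[proto.strip()] = {
            key: int(value) for key, value in zip(tokens[0::2], tokens[1::2])
        }
    return stats


if __name__ == "__main__":
    stat_path = "/proc/net/sockstat"
    limit_path = "/proc/sys/net/ipv4/tcp_max_orphans"
    if os.path.exists(stat_path) and os.path.exists(limit_path):
        with open(stat_path) as handle:
            tcp = parse_sockstat(handle.read()).get("TCP", {})
        with open(limit_path) as handle:
            max_orphans = int(handle.read().strip())
        print("orphans:", tcp.get("orphan"), "of", max_orphans)
```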

I 100% agree with lostknight. Those of you who've been around a while (2 years) will remember the time I wiped my live e-mail system, then almost got sent to jail for destroying subpoenaed e-mails. Oh boy, those were the days.

Anyway, ext2 makes recovery with Midnight Commander easier.

If you have an admin who won't wipe your system, then ext3 will work excellently.

Although I personally dropped SLES a while ago due to Novell's unwillingness to fix kernel bugs, I'm surprised to hear this response from Zimbra. Reiserfs has been the default fs in SUSE for years. SLES has quite a large userbase, as well as various versions being supported NE platforms, so Zimbra should be running on many reiserfs installs. Reiserfs3 is a very mature fs that has been run on vast numbers of computers; I doubt you're throwing anything new at it. It's unfortunate that the few problems in fs3 that were being addressed in fs4 are unlikely to be fixed while Reiser remains banged up on murder charges.

I would absolutely agree with lostknight, this sounds like an OpenVZ deficiency.