I am running two Zimbra 8.8.5 servers: one is an upgrade and one is a new installation. Both seem to have a high CPU load, and then the webmail login screen slows down and eventually times out, producing this error:

upstream server is blocked by a firewall
upstream server is failing to send back the response in time
upstream server is down
Please contact your ZCS administrator to fix the problem.

Powered by Nginx-Zimbra

When I restart all the services, the zimbra webapp process takes a long time to shut down, and then everything continues as normal. I don't see any errors in the logs and I am at a loss. Customers are not happy and I wish I could roll back to 8.7. Let me know what information would help with troubleshooting.

Last edited by verticon on Thu Jan 04, 2018 6:30 pm, edited 1 time in total.

The very nature of the issue you describe makes it hard to pin down the root cause. If I understand you correctly, these are two single-server installations? That makes network issues between nginx and mailboxd less likely, though there might still be an issue with local port exhaustion or DNS. Also, you say that the login screen starts to fail, from which I infer that existing webmail sessions keep working?

Things I'd look into are:

Which process is causing the high CPU load?

Is the system swapping?

Is the system maybe underpowered and you need a RAM or CPU upgrade?

If you just restart nginx (zmproxyctl restart), mailboxd (zmmailboxdctl restart) or even LDAP (ldap restart) when this happens, does any one of these restarts mitigate the symptoms as well?

Are there really no fishy-looking entries in the relevant log files, such as nginx.log, mailbox.log, zmmailboxd.out, myslow.log, mysql_error.log and gc.log?

How many TCP connections do you see?

Judging by nginx.access.log, access_log, sync.log and ews.log, do you maybe have a client which is running amok?

If you try to connect to the mailboxd backend directly, does that work?

Do you see different behaviour when you try to curl the HTTP endpoints from the host nginx is running on vs. your client?

Are there any network issues such as DNS, and is /etc/hosts set up properly?

Does the OS complain about any issues like a broken disk or a resyncing RAID?

That's just how I'd start; once I saw something interesting I'd drill down from there :-)
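For the TCP connection question in the checklist above, a rough sketch like the following could tally connections by state. It assumes a Linux host with the iproute2 `ss` tool and simply counts the first column of `ss -tan` output; the function names are made up for illustration and are not part of Zimbra.

```python
import subprocess
from collections import Counter


def count_tcp_states(ss_output: str) -> Counter:
    """Tally the first column (State) of `ss -tan` output, skipping the header."""
    states = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row that starts with "State".
        if not fields or fields[0] == "State":
            continue
        states[fields[0]] += 1
    return states


def current_tcp_states() -> Counter:
    """Run `ss -tan` locally (assumption: iproute2's ss is installed)."""
    result = subprocess.run(
        ["ss", "-tan"], capture_output=True, text=True, check=True
    )
    return count_tcp_states(result.stdout)
```

Calling `current_tcp_states().most_common()` during a slowdown and again after a restart gives two snapshots to compare; a large pile of connections in one state would be a hint worth following up, not proof of a cause.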

Does it occur on all accounts or just some of them? Check /opt/zimbra/log/mailbox.log for Java exceptions and mailbox locks (not account locks). I have had this problem since updating from the latest 8.7 release to 8.8.5. It happens to accounts that make heavy use of IMAP (lots of folders to sync) and have a lot of CalDAV task shares (DAVdroid and the K-9 Android mail client). Check the performance tuning guide here: https://wiki.zimbra.com/wiki/Performanc ... eployments

I think I may have tracked it down to a single mailbox by reading zmmailboxd.out. The mailbox uses IMAP, and it threw the error below, which seemed to trigger other errors related to locking.

2018-01-03 01:19:27,793 INFO [ImapSSLServer-1] [name=sales@xxx.ca;mid=6;ip=192.99.xxx.xxx;oip=192.99.xxx.xxx3;via=192.99.xxx.xxx(nginx/1.7.1);ua=ZimbraImapDataSource/8.8.5_GA_1894;] imap - selected folder INBOX/Stxxxx
    at com.zimbra.soap.SoapServlet.doPost(SoapServlet.java:213)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at com.zimbra.cs.servlet.ZimbraServlet.service(ZimbraServlet.java:206)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:821)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1685)
    at com.zimbra.cs.servlet.CsrfFilter.doFilter(CsrfFilter.java:169)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
    at com.zimbra.cs.servlet.RequestStringFilter.doFilter(RequestStringFilter.java:54)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
    at com.zimbra.cs.servlet.SetHeaderFilter.doFilter(SetHeaderFilter.java:59)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
    at com.zimbra.cs.servlet.ETagHeaderFilter.doFilter(ETagHeaderFilter.java:47)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
    at com.zimbra.cs.servlet.ContextPathBasedThreadPoolBalancerFilter.doFilter(ContextPathBasedThreadPoolBalancerFilter.java:107)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
    at com.zimbra.cs.servlet.ZimbraQoSFilter.doFilter(ZimbraQoSFilter.java:116)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
    at com.zimbra.cs.servlet.ZimbraInvalidLoginFilter.doFilter(ZimbraInvalidLoginFilter.java:117)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
    at org.eclipse.jetty.servlets.DoSFilter.doFilterChain(DoSFilter.java:473)
    at org.eclipse.jetty.servlets.DoSFilter.doFilter(DoSFilter.java:318)
    at org.eclipse.jetty.servlets.DoSFilter.doFilter(DoSFilter.java:288)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1158)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1090)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:318)
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:437)
    at org.eclipse.jetty.server.handler.DebugHandler.handle(DebugHandler.java:84)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
    at org.eclipse.jetty.server.Server.handle(Server.java:517)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:306)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:261)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:192)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:261)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:75)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
    at java.lang.Thread.run(Thread.java:748)
Lock Waiter - qtp998351292-4769:https:https://mail.xxx.com/service/soap/SearchRequest prio=5 id=4769 state=TIMED_WAITING
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
    at com.zimbra.cs.mailbox.MailboxLock.tryLockWithTimeout(MailboxLock.java:116)
    at com.zimbra.cs.mailbox.MailboxLock.lock(MailboxLock.java:194)
    at com.zimbra.cs.mailbox.Mailbox.beginTransaction(Mailbox.java:1759)
    at com.zimbra.cs.mailbox.Mailbox.beginReadTransaction(Mailbox.java:1735)
    at com.zimbra.cs.mailbox.Mailbox.getItemById(Mailbox.java:2864)
    at com.zimbra.cs.mailbox.Mailbox.getItemById(Mailbox.java:2856)
    at com.zimbra.cs.mailbox.Mailbox.getFolderById(Mailbox.java:4092)
    at com.zimbra.cs.index.query.InQuery.create(InQuery.java:68)
    at com.zimbra.cs.index.query.parser.QueryParser.createQuery(QueryParser.java:577)
    at com.zimbra.cs.index.query.parser.QueryParser.toTerm(QueryParser.java:427)
    at com.zimbra.cs.index.query.parser.QueryParser.toTextClause(QueryParser.java:399)
    at com.zimbra.cs.index.query.parser.QueryParser.toClause(QueryParser.java:363)
    at com.zimbra.cs.index.query.parser.QueryParser.toQuery(QueryParser.java:342)
    at com.zimbra.cs.index.query.parser.QueryParser.parse(QueryParser.java:305)
    at com.zimbra.cs.index.ZimbraQuery.<init>(ZimbraQuery.java:378)
    at com.zimbra.cs.mailbox.MailboxIndex.search(MailboxIndex.java:166)
    at com.zimbra.cs.service.mail.Search.handle(Search.java:111)
    at com.zimbra.soap.SoapEngine.dispatchRequest(SoapEngine.java:607)
    at com.zimbra.soap.SoapEngine.dispatch(SoapEngine.java:460)
    at com.zimbra.soap.SoapEngine.dispatch(SoapEngine.java:273)
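To get an overview of how often such lock waits occur and which accounts show up in the log context, a small script along these lines might help. It is only a sketch: the two regular expressions are guessed from the single excerpt above (the "Lock Waiter - ... prio=... id=... state=..." line and the "[name=...;" context), so other log lines may well look different, and the function name is made up.

```python
import re
from collections import Counter

# Patterns inferred from the one excerpt posted above; treat them as assumptions.
LOCK_WAITER = re.compile(
    r"Lock Waiter - (?P<thread>\S+) prio=\d+ id=(?P<id>\d+) state=(?P<state>\w+)"
)
ACCOUNT = re.compile(r"\[name=(?P<name>[^;\]]+);")


def summarize_lock_waiters(text: str):
    """Return (waiters, accounts) tallies for a chunk of log text.

    waiters counts (request, thread state) pairs, where request is the last
    path segment of the waiting thread's name; accounts counts every
    name=... value found in a log context.
    """
    waiters = Counter()
    accounts = Counter()
    for match in LOCK_WAITER.finditer(text):
        request = match.group("thread").rsplit("/", 1)[-1]
        waiters[(request, match.group("state"))] += 1
    for match in ACCOUNT.finditer(text):
        accounts[match.group("name")] += 1
    return waiters, accounts
```

Feeding it the contents of zmmailboxd.out (for example via `open(path).read()`) and printing `waiters.most_common(10)` would show which requests pile up behind the mailbox lock. The account tally covers every context line in the file, so it only hints at which mailboxes are busy, not which one holds the lock.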

Hmm… I've already seen locking issues on some largish 8.7.x installations (multi-server with about 1000 mailboxes per server) with workloads similar to what nikonaum described, especially when Apple macOS products were involved. I haven't been able to track down the root cause yet, though. There is an undocumented localconfig option to increase the possible number of parallel locks, but that just shifted the issue somewhere else. I forgot the details over the New Year vacation and would have to look them up. I *think* these locking issues shouldn't be able to bring down a whole server, only a single mailbox, but I might be wrong. Has anybody else seen something similar by any chance?

It doesn't sound like you have more than a few hundred mailboxes, so as a rough ballpark your server specs sound fine so far. Mixing in some SSD storage often helps with performance issues, but I don't really think yours is a performance issue per se.

One question, prompted by some oddities in the stack trace you posted: do you use (i.e. did you install) the new separate IMAPD service?

Last edited by msquadrat on Wed Jan 03, 2018 8:47 pm, edited 1 time in total.

I have about 60 mailboxes across approximately 10 domains. I have also increased the IMAP threads on the server. Only one user uses IMAP; the rest use ActiveSync from Zextras.

I am not using the beta IMAP service.

I have also set the virtualhostname and virtualipaddress for each domain to match the server, thinking that might be the cause as well.

nikonaum wrote: Does it occur on all accounts or just some of them? Check /opt/zimbra/log/mailbox.log for Java exceptions and mailbox locks (not account locks). I have had this problem since updating from the latest 8.7 release to 8.8.5. It happens to accounts that make heavy use of IMAP (lots of folders to sync) and have a lot of CalDAV task shares (DAVdroid and the K-9 Android mail client). Check the performance tuning guide here: https://wiki.zimbra.com/wiki/Performanc ... eployments

I increased the IMAP threads. It affects all accounts, as the HTTPS page doesn't load. I am able to use the Jetty service on port 8443 as a workaround, but would rather use the nginx server.
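Since the Jetty port still answers while the nginx front end does not, timing the same request against both could show when the proxy starts to lag. The following is a minimal sketch using only the Python standard library; the function name and the `verify_tls` switch are illustrative choices, and any hostnames or URLs passed in are placeholders.

```python
import ssl
import time
import urllib.error
import urllib.request


def time_get(url: str, timeout: float = 10.0, verify_tls: bool = True):
    """Fetch url once and return (status, seconds).

    status is the HTTP status code, or None if no HTTP response arrived
    (connection refused, timeout, TLS failure and so on).
    """
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for quick tests against a host with a self-signed certificate.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            resp.read()
            return resp.status, time.monotonic() - start
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as a gateway error.
        return exc.code, time.monotonic() - start
    except (urllib.error.URLError, OSError):
        return None, time.monotonic() - start
```

Running `time_get("https://mail.example.com/")` and `time_get("https://mail.example.com:8443/")` side by side (with the real hostname in place of the placeholder) every minute or so would show whether only the proxied path degrades over time.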

As written in the Zimbra upgrade notes, the new IMAPD is beta, i.e. it is not enabled after install. Nevertheless, I chose not to install it at all. My server is a Supermicro with 4 AMD processors with 12 threads each, a Samsung EVO Pro SSD and 64 GB RAM, and I had no such issues on 8.7.11. My DAVdroid and K-9 mail clients are always up to date, and two weeks ago I had no such Java exception messages in my logs. They all appeared after the upgrade. I even increased the maximum number of IMAP sessions, which helped at first, but then all the errors came back. The interesting thing is that the server is locking my own account and that of a colleague with whom I shared 10 task lists. Maybe it has something to do with IMAP and CalDAV shares and the number of HTTP sessions; I can't quite figure it out. I added these configs and it was OK for 2 days, but then the lock happened again: