We've been testing hMailServer for a few weeks and it was running great so we put it into production and started loading users onto the system last Friday night (Feb 27).

Yesterday (March 1) we noticed, by chance, that the queue (Status --> Delivery Queue) was growing. After further testing we realized that hMailServer was still accepting and queuing inbound SMTP messages but was not delivering any of them (locally or out to the internet), so the Delivery Queue kept growing.

After restarting the service the queue was immediately flushed and things went back to normal. All emails were delivered (locally and externally).

Today (March 2) the same situation happened. This time we just paused the service (Status --> Server --> Pause) and then unpaused it and the queue was flushed and everything went back to normal. All emails were delivered (locally and externally).

I should note that before we paused/unpaused the service during today's incident we randomly chose a few messages in the Delivery Queue and tried to force delivery (Right Click) but that did not work. It's almost as though the SMTPDeliveryManager just disappeared.

- The server has under 10 domains and about 20 accounts total (across all domains).
- We have some aliases, 1 SMTP route, and a few distribution lists set up too.
- We have a few IP ranges defined for free-relaying (mainly for trusted local webserver apps)
- No grey or whitelists set up
- We have both DNS blacklists enabled as well as SURBL
- No antivirus configured (yet) except for attachment blocking
- CPU/RAM usage is negligible as the server is not very taxed.

After today's incident we enabled debug logging in the hopes that if this happens again we might get a further clue. It is quite concerning however.

If anybody can please provide some guidance / assistance as to what else to look for or what might cause this we would appreciate it very much.

No error logs created?
Was there any mini-dump created in the log folder?

Did you have any logging enabled?
When you right click - send now, did that generate any logging?

What are your performance >> Threading settings?
In the status page of the admin GUI, did you notice how many open sessions there were at the time?

What AV is used on the system?
How do your users connect?

My GUESS is that the server was unable to open any more connections, and that a restart (or a pause and start) ended connections that hadn't terminated on their own. IMAP IDLE connections are a likely candidate, as are connections to an AV scanner during processing. Are you ONLY using the default group of settings in your hMailserver.ini? (You are not using any of Bill's ALPHA settings that were included quite a while back?)

That is certainly NOT a lot of mail accounts. There are reports of hMailserver handling millions of messages per day, over 10 000's of domains and 100 000's of accounts. The specs you detail should handle 100 000's of messages per day very easily.

Just 'cause I link to a page and say little else doesn't mean I am not being nice.
https://www.hmailserver.com/documentation

1.a) No error logs were created at the time of the error. There was an error log created during service restart, but I suspect that's because we probably interrupted some operation with the reset. The two lines below showed up during the service reset and are the only 2 entries in the error log at this time.
"ERROR" 7416 "2015-03-02 11:49:52.426" "Severity: 3 (Medium), Code: HM4227, Source: File::ReadFile, Description: An unknown error occurred while reading file from disk."
"ERROR" 7416 "2015-03-02 11:49:52.426" "Severity: 2 (High), Code: HM5017, Source: ScriptServer::LoadScripts, Description: An exception was thrown when loading scripts."

1.b) Not sure what a mini-dump is, but the only files in the log folder are the usual .log files.

2.a) Initially we just had Application, SMTP, TCP/IP logging enabled. After the second time this happened, we added Debug logging. It hasn't happened since Debug logging was enabled so no further info. Anything specific we should look for in App/SMTP/TCPIP logs?

2.b) When we tried the Send Now from the queue nothing happened, and nothing was logged. It's as though part of the system just 'died'.

3.b) I did not specifically notice, however, if there was a large number I'm sure I would have caught it as I was looking through those tabs while the issue was ongoing. We only have ~20 users right now so even if all of them were connected at the same time I couldn't see that being a big issue? As you said, this system should handle it.

4.a) We're not using any AV at the moment (not for the server, nor for email configured via hMail). We only have attachment blocking configured in the AV area. We do have the delete attachment & notify options set in the AV config page, but no AV is actually configured. I don't suppose that would be a problem?

4.b) Users connect via POP and IMAP. SMTP too of course.

5.a) Regarding your theory - if I recall correctly, local deliveries were not working either - and I assume that's just a write / file move on disk? I'm pretty sure I would have noticed a large connection count on the status tab. Nothing stood out when I was clicking through. Even if that were the case - how would that scenario arise and how do we fix/prevent it? I would assume hmail has some timers or force close ability? The first time this happened the queue was probably filling up for over 2h. Today we caught it right away because we happened to be on the server.

The issue just happened again and I have much more valuable data. I hope this is enough to troubleshoot.

1) The queue started growing again so I checked the debug logs (snippet below).
2) I looked at the earliest timestamp in the queue and then scrolled to that timeframe in the log file.
3) Around 18:14:52 in the log file below there is a "Stopping working queue SMTP delivery queue.".
4) It seems as though a bunch of worker threads for delivery all terminate; after that there are no more SMTPC lines, and thus the queue started to grow.
5) On Status --> Server I paused/unpaused and then everything went back to normal. SMTPC log entries reappear at 19:28:56, which is when I paused/unpaused the server. This means the queue was growing for well over 1h.
6) At the time of the reset there were only 4 IMAP connections and nothing else (according to the Status --> Status page).
7) Nothing in the error log.
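For anyone who wants to look for the same pattern in their own logs, here is a rough Python sketch of the manual search described in the steps above. It is not part of hMailServer. It assumes each log line starts with a quoted type such as "SMTPC" or "DEBUG" and contains a quoted timestamp in the same shape as the error-log lines quoted earlier in this thread; the function names and the 10 minute threshold are made up for illustration. It reports every long silence between two SMTPC lines and notes whether a "Stopping working queue SMTP delivery queue" message was logged inside that silence.

```python
import re
import sys
from datetime import datetime, timedelta

# Assumed line shape (taken from the error-log lines quoted in this thread):
#   "TYPE" <other fields> "YYYY-MM-DD HH:MM:SS.mmm" <other fields> "message"
TYPE_RE = re.compile(r'^"(\w+)"')
TIME_RE = re.compile(r'"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"')
STOP_MARKER = "Stopping working queue SMTP delivery queue"


def parse_line(line):
    """Return (log_type, timestamp), or None if the line has no type or timestamp."""
    type_match = TYPE_RE.match(line)
    time_match = TIME_RE.search(line)
    if not (type_match and time_match):
        return None
    stamp = datetime.strptime(time_match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return type_match.group(1), stamp


def find_smtpc_gaps(lines, min_gap=timedelta(minutes=10)):
    """Yield (gap_start, gap_end, stop_time) for each long silence between SMTPC lines.

    stop_time is the timestamp of the first stop marker seen inside the gap,
    or None if the gap contains no stop marker (for example a quiet period).
    A silence that runs to the end of the log is not reported.
    """
    last_smtpc = None
    stop_time = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        log_type, stamp = parsed
        if STOP_MARKER in line and stop_time is None:
            stop_time = stamp
        if log_type == "SMTPC":
            if last_smtpc is not None and stamp - last_smtpc >= min_gap:
                yield last_smtpc, stamp, stop_time
            last_smtpc = stamp
            stop_time = None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_gaps.py <path to a hMailServer log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for start, end, stopped in find_smtpc_gaps(handle):
            print(f"No SMTPC activity from {start} to {end}; stop marker at: {stopped}")
```

A gap with a stop marker inside it matches what I saw here; a gap without one may just mean there was no mail to deliver.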

What would cause the shutdown of SMTPC/Delivery threads?

Here is the log with Application/SMTP/TCPIP/DEBUG logging enabled. Note that I replaced our IP and hostname with XXXXX:

Is this mailserver used as a relay or as a spam appliance? Does it have either of those in front of it?

I don't know what Eventhandlers.vbs is. Never changed it so it would be whatever is default with the install. I will get it and post it shortly if you need it.

This mail server is dedicated to hmail and as a dns server (Simple DNS Plus). That's it. Both under minimal load. The sql server installed on it is only for hmail.

It is a relay but not an open relay. It relays for a few web apps and of course for the 20 or so customers using it. Also IMAP/POP for the same users. Was hoping to transfer over more domains but now we are worried. Everything was previously handled by a Server 2003 machine running an old Merak mail version with 1GB RAM and 2 vCPUs, so we thought this should handle way more with extra RAM, a new OS, and more vCPUs running hmail.

There is no other relay or spam appliance in front of it. Why do you ask?

Again load is not very high. Few thousand emails per day. Mostly spam.

I have both spam filters and surbl enabled.

Why would the delivery threads just exit all of a sudden? What would trigger that?

What do u suggest I bump the threads up to? But my concern is that's a bandaid fix. If it's a load issue then wouldn't history repeat itself when the load on this server increases? Or if there is ever a spike? I understood things to be more robust. Why do u think the threads just exit? It's all logged gracefully so something must be triggering it.... Any ideas?

This error indicates that the script didn't load. Scripts are in eventhandler.vbs.

By default the subs are all commented out.

That error only seems to happen when I restart the service. Since I didn't modify eventhandler.vbs can it be ignored?

mattg wrote:

bitman wrote: There is no other relay or spam appliance in front of it. Why do you ask?

just checking, sometimes these things can capture threads and hold them...

Along the same line of thought, what hardware makes and models are in this setup, i.e. are there any Cisco routers or firewalls?

The firewall (SonicWall) in front of this server is the same physical firewall that's in front of the old server. In fact, they even use the same rulesets. The old server never had any issues. I'd be surprised if it's a firewall related issue.

Can you please help me understand something. From my understanding of what I read in the log files, it seems as though the system gracefully exits the delivery worker threads. I also see seconds later requests for SMTPDeliveryManager to start delivery "Requesting SMTPDeliveryManager to start message delivery" but nothing happens for over 1h until I manually pause/resume.
1) Under what conditions does the system tell the delivery worker threads to exit? "Stopping working queue SMTP delivery queue." and "Worker exited in work queue SMTP delivery queue"
2) What would cause the SMTPDeliveryManager not to start/spawn up new threads to start delivery when asked to do so "Requesting SMTPDeliveryManager to start message delivery"?

Maybe if we look at the conditions that trigger the shutdown of the delivery worker threads that might help understand if any of those conditions apply to the environment/state of the server at the time?

Really driving me nuts - I keep checking the server emailing myself making sure the queue is being processed... haha.

bitman wrote: Why would the delivery threads just exit all of a sudden? What would trigger that?

restarting the server is what I had assumed had caused that.

bitman wrote:What do u suggest I bump the threads up to? But my concern is that's a bandaid fix. If it's a load issue then wouldn't history repeat itself when the load on this server increases? Or if there is ever a spike? I understood things to be more robust. Why do u think the threads just exit? It's all logged gracefully so something must be triggering it.... Any ideas?

I think the thread count is too low and you are running out of threads because something in your system isn't releasing them correctly.
I figured that more threads should give you longer between re-boots, if I am correct, and then it will be a matter of tracking down what is 'holding' the threads.

bitman wrote: Can you please help me understand something. From my understanding of what I read in the log files, it seems as though the system gracefully exits the delivery worker threads. I also see seconds later requests for SMTPDeliveryManager to start delivery "Requesting SMTPDeliveryManager to start message delivery" but nothing happens for over 1h until I manually pause/resume.
1) Under what conditions does the system tell the delivery worker threads to exit? "Stopping working queue SMTP delivery queue." and "Worker exited in work queue SMTP delivery queue"
2) What would cause the SMTPDeliveryManager not to start/spawn up new threads to start delivery when asked to do so "Requesting SMTPDeliveryManager to start message delivery"?

Starting SMTPC and then not continuing is consistent with something doing packet inspection (firewall), or perhaps DNS resolution issues.

This definitely seems like something in your setup somewhere. hMailserver should handle that load very easily.

bitman wrote: Why would the delivery threads just exit all of a sudden? What would trigger that?

restarting the server is what I had assumed had caused that.

That's the thing - I wasn't on the server nor did I restart anything when the queue started growing. If you look at the log snippet that I posted, around 18:14:52 you'll see that for some reason the system just started telling the delivery worker threads to exit. This tells me the system chose to shut down the threads and that's why nothing was being delivered anymore? Maybe I'm not understanding things correctly. I need to understand why it would tell the threads to just shut down all of a sudden - certainly not related to any actions we took.

mattg wrote:
I think the thread count is too low and you are running out of threads because something in your system isn't releasing them correctly.
I figured that more threads should give you longer between re-boots, if I am correct, and then it will be a matter of tracking down what is 'holding' the threads.

Fair enough, I can certainly increase it if the goal is to increase the time between failures. However, I'd be surprised if the firewall is the cause of the issue because the old mail server was running for years without issue behind the same firewall with the same rules. For my own edification, what in the log files leads you to believe something is 'holding' the threads? They seem to terminate just fine all on their own around 18:14:52 for no apparent reason?

mattg wrote:
Starting SMTPC and then not continuing is consistent with something doing packet inspection (firewall), or perhaps DNS resolution issues.
This definitely seems like something in your setup somewhere. hMailserver should handle that load very easily.

Do we know if the threads actually started/attempted to start? I only see a log entry for a start request "Requesting SMTPDeliveryManager to start message delivery". Again, maybe I'm misunderstanding the log entries.

What should I check on the firewall? If this happens again should I look for open connections from firewall to mailserver?

Just an update on this issue in case anybody else runs into it. After speaking with Martin and providing the necessary information, he found a bug. Apparently when the queue is cleared, delivery stops working.

I had the same problem.
It is possible that someone deleted emails from the Data directory.

I resolved the problem as follows:
1 back up the database
2 back up the Data folder
3 run DataDirectorySynchronizer.exe from the hmailserver\Addons folder
4 choose the option "Deletes message which are not in the database from disk"
5 restart the server (not sure if it is really necessary, but ...)
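If you want to see what is out of sync before letting a tool delete anything, a dry run along these lines might help. This is only a sketch and not how DataDirectorySynchronizer works internally. It assumes messages are stored as .eml files somewhere under the Data folder, and that you have already exported the list of message file names from the database yourself; `compare_data_dir` and both of its parameters are hypothetical names. It only reads the disk and never deletes anything.

```python
from pathlib import Path, PureWindowsPath


def compare_data_dir(data_dir, db_filenames):
    """Compare .eml files on disk with file names exported from the database.

    Returns (orphaned, missing):
      orphaned - paths of files on disk that the database does not list
      missing  - lower-cased names the database lists but the disk lacks
    Names are compared case-insensitively and by file name only, so the
    exported list may hold bare names or full Windows paths.
    """
    on_disk = {path.name.lower(): path for path in Path(data_dir).rglob("*.eml")}
    in_db = {PureWindowsPath(name).name.lower() for name in db_filenames}
    orphaned = sorted(str(path) for name, path in on_disk.items() if name not in in_db)
    missing = sorted(in_db - set(on_disk))
    return orphaned, missing
```

The "orphaned" list is roughly what the option in step 4 would be expected to remove, while the "missing" list corresponds to the case where someone deleted emails from the Data directory by hand.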