I am wondering if the "Discard emails for users who have exceeded their quota instead of keeping them in the queue." option in the WHM really works. Below is part of a message that I noticed that was in one of our server's mail queue. I've excluded some of the content, but left the message IDs so that they can be referenced later on:

Code:

1FLcIR-0001rz-GY-D
A message that you sent could not be delivered to one or more of its
recipients. This is a permanent error. The following address(es) failed:
username@server.tld
(generated from emailaccount@domain.tld)
retry timeout exceeded
------ This is a copy of the message, including all the headers. ------
.
.
.
Received: id 1FLcIN-0006Dv-9N
From: <invalidemail@anotherdomain.tld>

The ID of the message that is in the queue is 1FLcIR-0001rz-GY. The 1FLcIN-0006Dv-9N ID refers to a message that was originally sent to one of our users on our server. When the message was sent, this account was over its quota. When I look through the mail logs, first for 1FLcIN-0006Dv-9N, I see:

This seems to show that the 1FLcIN-0006Dv-9N message was rejected by the server that hosts anotherdomain.tld (the message probably did not originate from there), but remains on our server's queue.

I'm not doubting that this message is a spam message, and it was likely sent from an invalid e-mail account which is causing the return message to be rejected. My question is, why is our server sending the return message? Shouldn't the message just be rejected once it is determined that the username account is over its quota?

Maybe this is just the way this feature is supposed to work. I'm not really complaining; I'm just interested in understanding what's going on. I suppose the option is working, because the message to emailaccount@domain.tld is not staying in the queue; it's the return message that is staying in the queue. I just really don't know why the return message is even being generated. I had thought this feature would work more like the ":fail:" option, in that if an account is over its quota, our SMTP server would send a DENY back to the sending server, forcing the sending server to handle the return e-mail. I suppose the message has to be accepted before it can be determined that the account is over its quota.

Like I said, I'm not really complaining, just looking to see if there is some type of explanation for this or if there is anything else that can be done.

It can't send a deny because it needs to receive the message before it can tell how big it really is, at which stage (i.e. after DATA) it's too late to reject it in the SMTP protocol. That's why you get the overquota bounce in the queue for emails that cannot be returned. The reason they're in the queue is that the SMTP protocol persists in trying to find someone to deliver an error to. Unfortunately, this is almost always spam, since genuine senders would accept the return.

That's sort of what I figured. Would it be possible for exim to look up the quota of the account after the RCPT TO stage and, if the account is at or over its quota, deny the message? Even if it is possible, it would probably need to be something that is changed within the default main exim configuration. I'm just wondering if something like this would be feasible and whether it has any drawbacks. It looks to me like it would prevent at least some queue build-up on the server.

Are there any other suggestions for dealing with these queued bounce messages? Would lowering the timeout_frozen_after setting affect this?
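For the queued bounces specifically, Exim's main configuration has two options that control how long frozen messages survive. This is a sketch, not a recommendation; the values below are illustrative only:

```
# main Exim configuration (not ACL section); values are examples only
ignore_bounce_errors_after = 2d    # give up on frozen bounce messages after 2 days
timeout_frozen_after       = 7d    # cancel any message frozen longer than this
```

Lowering these would clear the stuck bounces out of the queue sooner, at the cost of discarding any genuine bounce that is still retrying.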

Again, not really complaining about any of this, just wondering if anyone else had noticed a build-up of these types of messages and if there was anything that could be done to help with this.

Just looked through the exim docs and it doesn't seem possible using inbuilt exim commands because all quota related commands are part of the appendfile transport which is one of the last things to happen in local mail delivery. It would almost certainly be possible by checking using a perl script in the ACL section, though.

I've just spent a couple of hours investigating this. I've tried implementing an ACL at the RCPT stage for quota checking, but it won't work: at this stage, the exim process doesn't have the required privileges to ascertain the size of the user's mailbox. So, even though I can establish the correct mailbox directory and the quota, I cannot establish the current size of the mailbox. Only during the appendfile transport can exim change its context so that it can establish the size of the user's mailbox.

Shame really, as the perl script I wrote to do this works a treat, except that, of course, it doesn't, because it cannot compare the mailbox size to the quota.

Hey chirpy, I really appreciate your efforts on this. I have no idea when it comes to configuring exim and writing ACLs, and you seem to have a really good grasp of all of this.

I'm wondering if there is some confusion in terms of the quota that I was referring to. I was referring to the overall account's quota and not the individual mailbox quota. If an account's overall quota (covering e-mail, hosting files, stats, etc.) is at or over its limit, would it then be possible to reject incoming e-mail before the DATA command in the e-mail transaction?

This may be what you are referring to in your response, but you mention that you are unable to determine the size of the mailbox at this stage, so I'm not sure if you are referring to the overall quota or just the mailbox quota.

Again, I appreciate you looking into this. And it should be noted that there really isn't any urgency on this. I am content to leave the configuration the way it is now. I just really didn't know if this was an issue that other people had noticed and whether or not there was enough interest to either look for a solution or log an enhancement request with cPanel (again, if it is something that could be fixed).

Chirpy, I can write something to find the size of a mailbox from exim if that's useful? (yes, taking into account exim's lack of permission - a small well-tied-down setuid C program should do the job nicely). Let me know if you want to take this further!

I've been a bit busy working on a few things and I haven't been able to keep up with this thread that much (not that it would really matter; I'm not sure I can really contribute anything, as writing Exim ACLs and Exim configuration is a bit above me). But I did just want to say that I appreciate everyone's input on this. I don't want anyone to feel like this is something that they have to work on or have to resolve. I do think that if it can be resolved, it would be beneficial for everyone.

Again, I appreciate the effort that everyone is taking on this. I know sometimes you get so involved with a particular project that you just have to stay with it to see it through.

I have hacked together something to reject mail to over or at quota accounts at RCPT time. It's very simple and inelegant. A shell script, run from cron every 5 minutes, puts a list of domains that are at quota in the file /etc/exim_deny_quotalimit. Exim then reads that file in an ACL and rejects messages to those domains. No doubt there are much better ways of doing this.
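For reference, the Exim side of this might look something like the fragment below. This is a guess at the approach, not the actual configuration from the script's author; in an Exim domain list, an item starting with "/" is read as a file whose lines become list entries:

```
# hypothetical fragment for the RCPT ACL (acl_smtp_rcpt)
deny    message = Mailbox for $domain is over quota, please try again later
        domains = /etc/exim_deny_quotalimit
```

Using `defer` instead of `deny` would return a temporary 4xx response rather than a permanent 5xx, letting legitimate senders retry once the account is back under quota.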

In this case, however, I have a small improvement: you need to anchor the match with $name or you'll match accounts with the same prefix, i.e. southc and southcor would both be denied e-mail if southc was full. Likewise, any domain name on the server with "southc" in it would also be denied e-mail.
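The prefix problem is easy to demonstrate with grep; `-x` (match the whole line) is one way to anchor it:

```shell
# unanchored: "southc" is a substring of both lines, so both match
printf 'southc\nsouthcor\n' | grep 'southc'

# anchored to the whole line with -x: only the exact name matches
printf 'southc\nsouthcor\n' | grep -x 'southc'
```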

This is very interesting. I have to say this isn't really how I had imagined a fix or solution to this problem, but this would certainly work.

To make another suggestion, have you considered creating a lock file within the shell script to prevent it from running twice? Suppose the script runs at 5 past the hour but, for some reason or another, gets hung up and does not finish; at 10 past the hour it runs again, meaning there are two of these processes running.

If you had it set up to create a lock file when it begins, then if it detects that lock file when it runs again, it stops. An even better solution might be for the script to store its PID in the lock file and if it detects the lock file, it kills the last process and starts again. This would seem to prevent any issues where the lock file is created and is never properly removed, meaning that the script would never run again.

If the lock file stuff was done within the script, then the additions to your crontab would not be necessary.

I don't know; this is just a suggestion. I am not very good with shell scripting. I might be able to do this with Perl, but shell scripting is probably better. If I get time, I'll see if I can come up with a fix, unless someone sees a reason why this would not be a good idea.

Overall though, I must applaud your efforts, this looks very promising.

I did change this script around to do the locking mechanism within the script. Bash scripting really is not an area I am very strong in, so I'm not entirely sure whether this is efficient or free of problems; I thought I'd let you look at it and see what you think.

If any other user comes across this, I would proceed with caution with using this modified script, I would wait to see if it gets approval by nxds since he wrote the original script and knows more about shell scripting than I do.

This checks to see if /tmp/quotacheck.lock exists and if it does, it reads the contents of the file which contains the PID of the previous running quotacheck script and kills it. Then it creates a new lock file and performs the same quota check as was done before, finally ending by removing the lock file. If for some reason, the task of finding accounts over quota gets hung up or does not complete before the next 5 minute interval, that process is killed and a new process is begun.

I suppose another way to do this would be to check whether the lock file is present and, if it is, just exit and let the old process complete. The concern there is that if /tmp/quotacheck.lock were somehow never removed, even after the process completed (which really should not happen), the script would never run again. I don't know; that may be the safer way to go.

I just thought I would post this and let nxds and other users review it and see what they think.

The problem with this strategy (killing the process every 5 minutes, if it's not yet completed) is that if it is taking longer than 5 minutes to rebuild the file, the file will never get rebuilt.

On the other hand, it should never take more than 5 minutes anyway. I'd also use the repquota command rather than running the quota command repeatedly; repquota is designed for looking at multiple user quotas (i.e. repquota /home | grep -v -- --).
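As a sketch of that pipeline: repquota flags each over-quota user with a "+" in its flags column, while every in-quota line (and the dashed separator) contains "--", which is exactly what the grep strips. The sample input below is made up for illustration:

```shell
# drop every line containing "--": the dashed separator row and all
# users whose quota flags are "--" (i.e. within quota); what survives
# is the report header plus the over-quota users
overquota_lines() {
    grep -v -- '--'
}

# made-up repquota-style input; real usage: repquota /home | overquota_lines
printf 'user1 -- 100 400 500\nuser2 +- 450 400 500 6days\n' | overquota_lines
```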

You might want to look into the lockfile command, which I think is standard. I think the strategy should read more like:

Code:

if the lockfile exists and the process still exists:
    if the lockfile is less than 10 minutes old:
        exit this time around
    else (more than 10 minutes old):
        remove it, relock, and re-run
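A minimal sketch of that strategy in shell, assuming the lock file holds the PID of the run that created it. The quota-check body itself is omitted, and the 10-minute age test uses find's -mmin (GNU/BSD, not strict POSIX):

```shell
#!/bin/sh
LOCK=/tmp/quotacheck.lock

if [ -f "$LOCK" ] && kill -0 "$(cat "$LOCK")" 2>/dev/null; then
    # find prints the lock only if it is more than 10 minutes old
    if [ -z "$(find "$LOCK" -mmin +10)" ]; then
        exit 0                            # fresh lock, live process: back off
    fi
    kill "$(cat "$LOCK")" 2>/dev/null     # stale lock: kill the holder
    rm -f "$LOCK"
fi

echo $$ > "$LOCK"
# ... the actual quota check would run here ...
rm -f "$LOCK"
```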

I think an ideal piece of code would use lockf(2) to avoid having to check to see whether the process still exists (lockf is a system call and does that internally).
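On Linux, the shell-level equivalent is flock(1) from util-linux, which wraps that kind of kernel lock: the lock is released automatically when its holder exits, however it dies, so no "is the process still alive" check is needed. The script path below is hypothetical:

```shell
# -n: don't wait; if another run already holds the lock, give up quietly
flock -n /tmp/quotacheck.lock -c '/path/to/quotacheck.sh' || true
```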

If the lock file stuff was done within the script, then the additions to your crontab would not be necessary.


Not quite. The crontab was changed because in the first example the shell truncates the output file before the rest of the script has finished running. If the script took 2 seconds then the file would be empty for 2 seconds. The second version ensured that the file would never be empty.

I take your point about preventing two copies of the script running, but in reality that would never happen. I also like Brian's idea of using repquota because it is much faster. On the server I am testing this on, it reduces the execution time from around 1.5 seconds to 0.25 seconds.

Below is another version of the script that exits if there's another copy running and uses repquota. (It will not kill a long-running/hanging process, but I don't think that's a realistic possibility either.) Note that this lists domains that are at or over quota. If you want only over-quota domains, change $3 >= $4 to $3 > $4 in the awk script. This version also takes an optional output file on the command line, so the cron job would look like this:
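The core of that awk test might look like this (the input is again made up): in repquota's output, field 3 is blocks used and field 4 is the soft limit, and switching `>=` to `>` excludes accounts that are exactly at quota:

```shell
at_or_over() {
    # keep lines whose soft limit (field 4) is a nonzero number and
    # whose usage (field 3) has reached it; print just the name
    awk '$4 ~ /^[0-9]+$/ && $4 > 0 && $3 >= $4 { print $1 }'
}

# usage with real data would be: repquota /home | at_or_over
printf 'user1 -- 100 400 500\nuser2 +- 400 400 500\n' | at_or_over
```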

Another way to approach this would be to write a daemon process, much like antirelayd, that constantly ran in the background and periodically updated the /etc/exim_deny_quotalimit file. This would obviate the need for a cron job, but would add the worry of making sure the process never died unexpectedly. However, I don't really see the need to complicate what is really a fairly straightforward procedure.

Actually, what I'd really like to do is put a fake e-mail into the maildir/new folder for each user in the domain, saying that the account is over quota. You'd need some way to make sure you only put one of these messages into each 'new' folder, but this would be a real support time-saver. At the moment they ring and ask, "why isn't my incoming email working?" or, worse, "why do messages take 4 hours to get to me?" (answer: because it took that long for you to delete enough messages so there was space for the new one). D'oh.

The idea is that the message/file be directly created by root, and then chowned, to get around the problem of exim refusing to deliver to an account with no disk space.

You could easily identify the new message file by giving it a special name. That would give the ability to delete the magic "out of disk space" file if disk space returned. I'm not sure whether a special name would work or not with mail readers/POP but it's easy enough to test.
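A rough sketch of that idea, assuming a standard Maildir layout. The marker file name is invented here, and, as noted above, a non-standard name in new/ may or may not sit well with every mail reader:

```shell
#!/bin/sh
# Drop a single warning message into a user's Maildir. Meant to be run
# as root so it works even when the account has no quota left; the
# file name "overquota.warning" is an invented, easy-to-find marker.
drop_quota_warning() {
    maildir=$1; user=$2
    msg="$maildir/new/overquota.warning"
    [ -f "$msg" ] && return 0             # at most one warning per mailbox
    {
        printf 'From: root@localhost\n'
        printf 'Subject: Your mailbox is over quota\n\n'
        printf 'Please delete some mail; new messages are being delayed.\n'
    } > "$msg"
    chown "$user" "$msg"                  # hand the file over to the user
}

# cleanup when space frees up: rm -f "$maildir/new/overquota.warning"
```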