Sorry, guys, I have spent hours trying to work this out by searching the forums and TRYING to understand the documentation about creating regex expressions, but it remains complete hieroglyphics to me. So forgive me for blatantly (and humbly) requesting your help on this.

I am trying to create a rule using the REGEX test that simply says:

IF field ([contains WORDA] or [contains WORDB] or [contains WORDC]) AND ([contains STRING1] or [contains STRING2]) then...

eg
if subject (contains 'itune' or 'apple' or 'paypal') AND (contains 'suspend' or 'restrict')

For example, this subject should not match: "itunes will terminate your account" ('itunes' exists but not 'suspend' or 'restrict').

I'm sure it's easy-peasy... if you can understand the foreign language that is REGEX. (Something to do with searches within 'group1' and then also a search within 'group2'. I have no idea, and have tried many permutations and supposed online regex generators.)

Make life easy and use two conditions, relying on the rule's AND setting, so you get the following.

When creating the rule, select "use AND" (which is the default).

First condition regex:
(?i:^.*(itune|apple|paypal).*$)

Second condition regex (which is a logical AND because you selected it above):
(?i:^.*(suspend|restrict).*$)

Note that the above is case-insensitive. It isn't fully tested, but it appears to work in the hMailServer rules setup. It may or may not work in JavaScript (probably not) or VBScript (possibly not, but you can force lowercase before doing the regex).
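For anyone who wants to sanity-check the two-condition logic outside hMailServer, here is a minimal Python sketch. Python's `re` module is not the engine hMailServer uses, and the function name `matches_both` is made up for this example, so treat it as a check of the logic only, not of hMailServer's behaviour.

```python
import re

# The two conditions, copied verbatim from the rule above.
BRAND = re.compile(r"(?i:^.*(itune|apple|paypal).*$)")
ACTION = re.compile(r"(?i:^.*(suspend|restrict).*$)")


def matches_both(subject: str) -> bool:
    """Return True only if both conditions match, like a rule set to 'use AND'."""
    return bool(BRAND.search(subject)) and bool(ACTION.search(subject))


print(matches_both("Your iTunes account has been suspended"))  # True
print(matches_both("itunes will terminate your account"))      # False
```

The second subject fails because only the first condition is satisfied, which is the behaviour asked for in the original question.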

Much simpler than you thought (famous last words).

The problem with regex is that you have to know it all, and just when you get to the point where you think you do, you find out you don't and it's back to square one.

Yes, I will have to do that for now. But it's not ideal, as I want to run the same test against both the SUBJECT and the BODY. If I knew the proper regex way to achieve the 'both' test in one command, then my rule could be
where BODY = [regex] OR where SUBJECT = [regex]. As the test now has to be where FIELD = REGEX1 AND REGEX2, I can't do the OR for the separate fields (therefore needing two rules instead of one).

It won't come to light unless you do it yourself. There is no AND operator in regex, and you are trying to use it like a programming language. It isn't; it's a pattern-matching tool, and making it do what you want would make it very long-winded to get all the logic into it.

That should do it. That way it's testing for either order of the words, and in either of the fields.

In case anyone is interested in my formula (to take on this endless PayPal/Apple ID/iTunes phishing spam malarkey):

(?i:^.*(itune|apple|paypal|account).*(verif|update|udapte|froze|confirm|rectif|expir|informations|suspend|restrict|limit).*$)

and the reverse:

(?i:^.*(verif|update|udapte|froze|confirm|rectif|expir|informations|suspend|restrict|limit).*(itune|apple|paypal|account).*$)
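A minimal Python sketch of the same either-order idea, for testing the word lists before pasting them into a rule. The names `BRANDS`, `ACTIONS` and `looks_like_phish` are invented for this example, and Python's regex engine is an assumption standing in for hMailServer's. Building both patterns from one pair of lists means a new term only has to be added in one place.

```python
import re

# Word lists taken from the formula above (including the deliberate "udapte").
BRANDS = "itune|apple|paypal|account"
ACTIONS = ("verif|update|udapte|froze|confirm|rectif|expir|"
           "informations|suspend|restrict|limit")

# Brand word first, then action word; and the reverse order.
FORWARD = re.compile(rf"(?i:^.*({BRANDS}).*({ACTIONS}).*$)")
REVERSE = re.compile(rf"(?i:^.*({ACTIONS}).*({BRANDS}).*$)")


def looks_like_phish(text: str) -> bool:
    """True if a brand word and an action word both appear, in either order."""
    return bool(FORWARD.search(text) or REVERSE.search(text))


print(looks_like_phish("Your PayPal account is limited"))  # True (forward order)
print(looks_like_phish("Please confirm your Apple ID"))    # True (reverse order)
print(looks_like_phish("Lunch on Friday?"))                # False
```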

Obviously this will be subject to modification of the terms as new spam terminologies get sent out, and it may not suit everyone's situation (I am confident it suits my account).

That's what I meant by long-winded. It can be done in one regex, but it requires several ORs and swapping stuff round.
There are more commands which look ahead and are effectively like an AND, but it gets completely unreadable if you have a memory like a sieve (me) and rarely use regex.
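For the curious, this is roughly what the look-ahead version looks like, sketched in Python. Whether hMailServer's regex engine accepts lookaheads is not confirmed anywhere in this thread, so that part is an assumption; the name `has_both` is made up for the example.

```python
import re

# Each (?=...) is a lookahead: it checks that the words occur somewhere after
# the start of the text without consuming anything, so two lookaheads in a row
# behave like an AND and the word order no longer matters.
BOTH = re.compile(r"(?i:^(?=.*(itune|apple|paypal))(?=.*(suspend|restrict)))")


def has_both(text: str) -> bool:
    """True if a brand word and an action word both appear, in any order."""
    return BOTH.search(text) is not None


print(has_both("Your iTunes account has been suspended"))  # True
print(has_both("Access restricted on your PayPal"))        # True (reverse order)
print(has_both("itunes will terminate your account"))      # False
```

This avoids writing the pattern twice for the two orders, at the cost of being harder to read six months later.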

I just use spamassassin and auto update my rules once a day. Much easier and I don't have to keep modifying regex patterns. SA do it for you.

I also have SpamAssassin with daily updates, but it doesn't make any impact on these kinds of emails at all. They all seem to slip through as new sources and methods. That is why I am creating this rule. (Very occasionally one might get recognised and scored high enough, but maybe only 1 in 7; the rest slip through.)

I didn't think that was too long-winded really; it was just a case of inserting the new word into the list and then copying the line 3 times.

Well, the first was that initially I couldn't get anything to work as I simply didn't understand how to start the query and secondly I wasn't aware that there was no AND operator. And actually it was your reply that helped me on my way. Now, with your help, I have my answer. Ta.

I was trying to make sense of the regex expression, so I was matching it to some online 'help' docs. I wondered about the starting ^ and the ending $ symbols. If I remove them and do a test match, then the expression still seems to work. So I was wondering if I am misunderstanding something that you know better, and whether I should leave them in as you stated? (I understand the remaining parts of the expression, although I'm not sure of the formatting of the i: or the reliance on ?) Could you explain the ^ and $ please?

In many cases they aren't required, but I always put them in for clarity and to remove uncertainty, especially when you come back and look at your regex 6 months later and have forgotten exactly how regex works or what your intention was at the time you wrote it.
And if you have an OR where you want to go back and look from the start of the line, then ^ will be required, and using $ makes it clear that this section of the regex runs to the end of the line.
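A small Python sketch of what ^ and $ do here. It assumes the matcher searches anywhere in the text, as Python's `re.search` does; the thread does not say whether hMailServer searches or requires a full match, and if it requires a full match the surrounding `^.*` and `.*$` would be needed.

```python
import re

subject = "Your iTunes account has been suspended"

# On a single line, with a search-anywhere matcher, ^.* and .*$ add nothing:
anchored = bool(re.search(r"(?i:^.*(suspend|restrict).*$)", subject))
bare = bool(re.search(r"(?i:(suspend|restrict))", subject))
print(anchored, bare)  # True True

# ^ starts to matter when the word must be the very first thing in the text:
starts_with_brand = bool(re.search(r"(?i:^(itune|apple|paypal))", subject))
print(starts_with_brand)  # False, because "Your" comes first

other_starts_with_brand = bool(re.search(r"(?i:^(itune|apple|paypal))", "PayPal notice"))
print(other_starts_with_brand)  # True
```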

And bear in mind that there are different implementations of regex and how it works. It depends on the regex implementation that was used for compiling hMailServer and hMailAdmin, as opposed to, for example, the implementation in PHP for WebAdmin. And Unix and Perl implementations may be different.
And that means you have to be very careful about where you are reading up on regex usage, because if it's talking about Unix or Perl, what you're reading may not apply to the Windows version of JavaScript, for example, or to what hMailServer uses.

So beware of online JavaScript-based testing tools, especially if you don't know whether the website is running on Windows or Unix and whether the page uses PHP or ASP or some other web scripting language. Always test it on your target platform and software.

OK, well for sure I understand the whole complication of ensuring I am looking at the correct version of regex! I have seen several conflicting opinions/versions of how to do things, and I reckon this definitely added to my overall confusion at the beginning.

Regarding the 2 symbols: ok, well without them it seems to work. But for clarity I will ensure they are included, and then in the future should I ever come to dabble in doing another regex expression and use this as a starting point, they may well be the difference to making my next one work or not.

Here you go, Percepts, one of the phishing emails (that I aim to be capturing with my regex formula) has come in and been caught. I post the headers here and you can see how it completely slips through spamassassin (the last header is a custom header added by my rule so I can see why it ends up in the trash - rule or user)

They have been flagged as spam, but not for the definitive reason of being phishing emails, and only because I set the score of 3 by choice; if I had set it higher, say 4, then the first one wouldn't have been caught. Note that if the 1st email had adopted the BODY image type of the 2nd email (i.e. just an image), then it wouldn't have been scored 1.5 for TVD_PH_BODY_ACCOUNTS_PRE, and the overall score would have been 2.1... and NOT flagged as spam.

The reason I set it to flag as a WARNING spam rather than delete is because we still get GENUINE emails coming in from our suppliers that get scored between 3 and around 5.8, and we can't have the risk of such genuine mails being deleted. I have a score of 8 set as the delete threshold. My regex formula, though, definitely catches those specific emails without leaving anything to chance. If there was a SpamAssassin formula that effectively replicated my regex formula (narrowing down the risk), then it would be more reliable and I could have increased its score on a match.

Of course you might say "well, why don't you write one then, Jim?": the answer is simply this (pick one):
a, if I have to write a formula myself that's local to my system, then isn't that what I have already just done in hMailServer?
b, I have NO IDEA how to write such things and add them to SpamAssassin (and I'm sure you are fed up with me picking your brains already)

Last edited by jimimaseye on 2014-06-26 21:54, edited 1 time in total.

Well, I was going to say that it looks too technical for my lame brain (I'm not good at starting from instructions and better at picking up and learning from others). But then I glanced down the page and saw this as part of the Getting Started section:

"Build a significant sample of both ham and spam. I suggest several thousand of each, placed in SPAM and HAM directories or mailboxes."

And with that, I've got no chance! I don't have thousands, or even hundreds (barely tens!) of spam. Our main spam nowadays is this phishing email that comes in to one specific account and is probably averaging 2 or 3 a day (the problem is the user is gullible enough to click one of the links, which is probably why only he gets these emails anyway). Mostly all other spam is EXTREMELY rare and is rightfully handled by the default SpamAssassin rules (viagra, russian babes etc).

One word of warning: if your other SA scores accumulate to less than zero, it is possible it will still get through, but unlikely. And note that you can add body tests into the above as well. See references: http://wiki.apache.org/spamassassin/WritingRules

Thanks for that percepts. It seems we are both on a learning curve right now then.

For your info, the 'body' type check actually already includes the SUBJECT line (as I found out with my testing and then confirmed by that RULES wiki page). So I only need the one rule (applied to 'body') to catch either positionings. Here is the new rule:

By the way (for readers generally), just to give you the complete picture: all my emails at the moment come in via External Download, so SpamAssassin only scores but doesn't delete or prevent delivery. The automated deletion of high-scoring emails is done with a global rule in hMailServer that looks for the X-Spam-Level header and checks to see if it has at least 7 asterisks (*******). This way it deletes all spam scored 7 or higher (in fact my 'delete' only moves it directly to TRASH, giving the ability for a last-resort double check if needed). Therefore I can set the score of my custom rule to anything I want (I could set it to 100 if I wanted).

I do have the hMailServer spam settings doing proper (unrecoverable) deletions if a HMAILSERVER score achieves 8 or higher (and a marking of SPAM on achieving a score of 5; 5 is also the score given when SpamAssassin marks an email, with 3, 2 and 2 also being awarded for SPF, HELO and DNS-MX record checks respectively). I chose this method rather than simply using the SpamAssassin score because sometimes SpamAssassin scores are not so consistent (the same email checked twice can come back with different scores for some reason).

So, to summarise, my spam check settings in hMailServer are:

Spam Delete Threshold = 8
SPF = 3
HELO = 2
DNS-MX = 2

Use SpamAssassin = Yes
Use SA score = No
Score = 5

and I have SpamAssassin marking mail as spam with a threshold score of 3.

Then a global rule that checks the X-Spam-Level header for ******* (7 asterisks) and moves the message to 'Trash'.

So,

if SA marks as spam, then hmail scores it a 5.
If hmail scores and adds its own checks to the score and it exceeds 8, it gets deleted, otherwise just marked '[SPAM]'.
else
if the hmail global rule sees SA has scored it 7 or higher then it gets Trashed (moved to Trash, as will be the case with the custom rule discussed earlier, currently being scored as 8).
else
the message stays in the Inbox (with a subject appended as [SPAM])
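The flow above can be sketched as a small Python function. This is only my reading of the settings listed in this post, not hMailServer's actual code: the function and constant names are invented, the order of the checks follows the if/else list above, and the star count assumes X-Spam-Level carries one asterisk per whole point of the SpamAssassin score.

```python
SA_SPAM_THRESHOLD = 3.0     # SpamAssassin marks mail as spam at this score
HMAIL_MARK_THRESHOLD = 5    # hMailServer tags the subject with [SPAM]
HMAIL_DELETE_THRESHOLD = 8  # hMailServer deletes outright
TRASH_STARS = 7             # global rule: 7 asterisks in X-Spam-Level


def hmail_score(sa_score, spf_fail=False, helo_fail=False, mx_fail=False):
    """Add up hMailServer's own score from the settings listed above."""
    score = 5 if sa_score >= SA_SPAM_THRESHOLD else 0  # "Use SA score = No", fixed 5
    score += 3 if spf_fail else 0
    score += 2 if helo_fail else 0
    score += 2 if mx_fail else 0
    return score


def spam_level_stars(sa_score):
    """Assumed: one asterisk per whole point of SpamAssassin score."""
    return max(0, int(sa_score))


def classify(sa_score, spf_fail=False, helo_fail=False, mx_fail=False):
    total = hmail_score(sa_score, spf_fail, helo_fail, mx_fail)
    if total >= HMAIL_DELETE_THRESHOLD:
        return "deleted"
    if spam_level_stars(sa_score) >= TRASH_STARS:
        return "trash"
    if total >= HMAIL_MARK_THRESHOLD:
        return "inbox, tagged [SPAM]"
    return "inbox"


print(classify(5.8))                 # inbox, tagged [SPAM]
print(classify(7.0))                 # trash
print(classify(3.5, spf_fail=True))  # deleted
print(classify(0.5))                 # inbox
```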

- I also have SpamAssassin tagging the subject with the SpamAssassin score. So any emails marked as spam but not deleted will look like:
"[SPAM] [3.5] Here is your invoice/original subject..."

By doing this it safeguards against rogue SpamAssassin rule scores that might cause a direct delete (if I had used "Use SA score = Yes") without my being able to double check. But of course, if SA has scored it AND hMailServer scores it with its own checks as well (reaching 8 or more), then there is less doubt and it is therefore safer to wipe out without fear.

It may sound complicated, but it's not really. And it works really quite well. (Out of 2350 emails, I have only 35 retained as genuine mail falsely marked as [SPAM] (above a SpamAssassin score of 3), of which 19 are from 1 supplier who always scores 5.8 for some reason, with the other 16 scoring 3.x something.)

Indeed, but it's not really. All I have done above is document the settings I have, as offered by hMailServer and SpamAssassin. I guess the explanation of WHY I have these settings may lead to confusion or seem complicated, but I have simply set the 6 settings accordingly (finely balanced), plus a rule, to capture all 3 eventualities (not spam, potentially spam, or 'defo don't want to see this muck' spam).

OK, it's been some weeks now and I have been tweaking and tailoring the regex expression to maximise effect and minimise false positives.

For readers that may be interested, the following is the current (and so far proven optimum) version of my custom SpamAssassin rule (entered in my 'local.cf' file) that catches all those annoying PayPal/Apple/iTunes "account has been frozen" etc. phishing emails.

(Note: my rule scores it as 7.0, as this satisfies my personal spam-catching setup, which places everything scoring 7 or above directly into the TRASH bin of the account, but users may wish to change the score to match their current scoring system.) Oh, and of course, you can change the 'description' to read whatever you want (I left it verbatim for manual checking of emails). Also, with this regex setup I have only had to whitelist 1 genuine address/sender out of all our senders to ensure it doesn't get accidentally trapped (a hotel booking site, whose booking confirmations quote the words 'account' and 'confirm').

For the record, 90% of the credit for this goes to you for your help - I merely tweaked the phrases being searched and chose method of checking the emails.

I still stand bemused by the seemingly random 'flavours' of regex out there, and by trying to find the correct one that SpamAssassin uses, never mind actually understanding the common 'language' of it. I still have a couple of pages open to refer to, and often find my experience not matching what the help pages suggest. Sometimes I have to refer to existing rules in SpamAssassin to see if I can make any sense of them and then copy the style.

And all this, over and over again for the last 3 weeks.....just for the quoted rule. THAT is how expert I am!!

I have listed a cut-down version of the HTML-based email with the hooky word in it, so you can save it and launch it to see the code and the evaluated spam headers yourself (all personally identifiable info changed). Just copy and paste the code to Notepad and save as .EML.

You could also run a test on it yourself if you feel that way inclined. ;)

Wow! Someone remind me never to come to this forum to ask for help. This guy seems to have a serious sense of humour failure!

I always use forums for finding help from others who have knowledge and are willing to help others. This percepts guy sounds like he shouldn't be here. If jimimaseye had the answer or wanted to pay someone local, he wouldn't be asking here.

I didn't sign up to be rude. Maybe my comment should have been seen as a heads-up on how the last comment didn't fit what I thought forums were about. But that's OK, attack me, the newbie, as being rude for pointing out someone else's rudeness.

I just read the thread from the beginning and it seems very clear that jimimaseye's question was valid and reasonable. It seems he didn't know the answer, as that issue hadn't been discussed.

It has obviously escaped you that this is an hMailServer support forum, where we help, for the most part, people to get their hMailServer installation up and running when they are experiencing problems with it, or with resolving security loopholes etc. Users are expected to have a good deal of technical savvy if they are taking on being a mail server administrator.

What it is NOT is a programming tutorial site. It is not a SpamAssassin tutorial site. That is someone else's software. We help a little with getting SA installed, but once the person wants constant help with tailoring someone else's software, I draw the line. Go to the support forum for that software and not here. And this is not a regex/Perl tutorial or support forum either.

So why are you here? No don't bother to answer that. People here are busy and don't have time to waste on people who make posts which aren't hmail support questions.

Well, I am shocked. I have to agree that Percepts has totally taken this somewhere I didn't intend it to go. The winking eye at the end should have given a clue that I was being very tongue-in-cheek about 'running a test yourself'. But that said, it's not unusual for people to do that themselves anyway without anyone asking them. The reason I gave the code was so that ANYONE, not only Percepts, can see the code I was dealing with and see why the suggestion was failing. And as Turtleneck says, of course I wouldn't have been asking if I was able to do it. And if all forum responses were "test it yourself or go pay someone", then there would be no point in having this forum, would there?

Turtleneck: Percepts is/has been in the past very helpful. I do see why you said what you said, but to be sure and avoid doubt, he can be quite helpful. I would recommend using this forum anyway for your questions and queries, as I have had a lot of help in the past, and there are many other users (including Matt the moderator) who normally offer help one way or another (and don't ask to be paid for it).

So, my query still stands: does anyone know how to compile a regex query that is able to catch that sort of character set/encoding as detailed in my report above? (I don't even know what the terminology is for such a thing. Is it some encoding, or an HTML character set? I don't know.) So I would be grateful for anyone's help.

p.s I would p1ss myself now if Turtleneck provides the answer.

EDIT: Just read your post, Percepts. You didn't say that when the initial thread started, entitled "Another Regular Expression Question", did you! Plus, here is a 'new trick' for you to learn: find more polite ways of telling people you are no longer willing to help. Please.

1, it's a RAWBODY check (as opposed to a BODY check) that is needed.
2, the opening ^.* and closing .*$ that I was originally using (from Percepts' original suggestion early on in the thread) was causing the problem. Only when it was removed did this then get picked up.

Note: of course, don't forget the \ for escaping the hash (otherwise it is classed as the start of a comment).
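To illustrate the two points in general regex terms, here is a hedged Python sketch. It does not reproduce how SpamAssassin feeds text to body or rawbody rules (I am not certain of those details), and the sample HTML is invented. It only shows that a pattern wrapped in `^.*` and `.*$` can miss a word on a later line when `.` does not cross newlines, and that the `&#932;` entity is only visible before the HTML is decoded.

```python
import html
import re

# Invented sample of undecoded HTML; &#932; is the entity for Greek capital tau.
raw = ("<p>Dear customer,</p>\n"
       "<p>Your i&#932;unes account is frozen.</p>\n"
       "<p>Regards</p>")

# \# mirrors the escaping needed in local.cf; Python accepts it but does not need it.
PLAIN = re.compile(r"i(&\#932;unes|tunes)", re.IGNORECASE)
WRAPPED = re.compile(r"^.*i(&\#932;unes|tunes).*$", re.IGNORECASE)

print(bool(PLAIN.search(raw)))    # True
print(bool(WRAPPED.search(raw)))  # False: ^ is pinned to line 1, and . stops at the newline

# Decoding the entity first removes what the rule is looking for:
decoded = html.unescape(raw)
print("i\u03a4unes" in decoded)       # True: the tau is now a real character
print(bool(PLAIN.search(decoded)))    # False
```

In Python, adding the `re.MULTILINE` flag would let the wrapped pattern match again, which is one reason the same expression can behave differently from one engine or tool to the next.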

Thanks to all involved (when they did).

I told you in the thread earlier that .* wasn't "required", so don't try and cover your ignorance by blaming me. Furthermore, you have broken forum rules by adding a new question onto the end of the last. And lastly, I made precisely zero reference to .* for this last question, but again in your ignorance you have tried to apply something from an earlier question and got it wrong. If you were so smart you wouldn't be here asking questions. .* is NOT the reason for whatever you used being the fault. It is not possible for it to be wrong because of what its presence means. It is your own inability to get it right which was the problem.

Wow. Are you in a bad mood or something, or is this normal? You really don't want to let this go?? Despite DooM's suggestion?

No one was "trying to blame" you, and I don't think anyone would have read it as that. Your name was mentioned as a reference to earlier postings in the conversation with me. But you seem hell-bent on responding with a non-response and making ill feeling about this instead of taking it for what it is. And if you want to attack me for what I write, well OK, let's do it your way...

Regarding the $ and ^ :

percepts wrote: In many cases they aren't required but I always put them in for clarity and to remove uncertainty,...

Yep. That's well and truly saying they aren't required. You definitely insisted not to bother with them there... apart from the few cases remaining when 'many' (leaving 'some', as it isn't ALL) didn't apply... and, err... "always". Oh look, you just ALL.

percepts wrote: And lastly I made precisely zero reference to .* for this last question

I never said .* was the problem either; I said "the opening ^.* and closing .*$". Don't quote me on something that was never there to be quoted (especially given you're doing such a bad job of reading your own quotes above).

percepts wrote: .....in your ignorance you have tried to apply something from an earlier question and got it wrong. If you were so smart you wouldn't be here asking questions. .* is NOT the reason for whatever you used being the fault..... It is your own inability to get it right which was the problem.

Well, yeah. It's my inability to get it right which leads me TO ASK FOR CORRECTION! So shoot me for using a forum. (How dare I?!)

percepts wrote: to expand it a little you look for

i(&#932;unes|tunes)

following my initial statement that "At the moment I have (for the standard word): ^.*itunes.*$/i".
Again, that's you NOT saying "remove .*", and instead simply referring to the inner content around the itunes word, just as I had asked for. And THAT implies the rest of the quoted formula doesn't need correction (ergo is correct). You didn't say anything about changing or ensuring 'rawbody' as part of the expression either. And that is why I mentioned it in my summary.

percepts wrote: in your ignorance you have tried to apply something from an earlier question and got it wrong. If you were so smart you wouldn't be here asking questions.

You're absolutely right on this one. (Yay!!!) I have "applied something from an earlier question", about REGEX formulas, on a thread entitled "Another Regular Expression Question", ...answered by you, ...in my "ignorance" because I don't know the answer, ...and yes, if I WERE so smart, knowing the answers, I wouldn't be here asking the questions. Would I?!! So it seems your logic is:
"someone comes on asking a question about something they have already written and already know. How dare they think they know anything?! And that seems just bang out of order, so I'm not going to correct them and will just chastise them for thinking they know everything... (despite them coming on the forum to ask for correction to the logic they have quoted)"

Oh, and whilst we are pointing out your involvement, which you were so very keen to raise and deny: at the end I also said "Thanks to all involved (when they did)", which included YOU as a contributor to the early solutions on which my latest question was based. For me, I had no problems with you (other than your obvious lack of sense of humour and misreading of a situation on when to reply and when not to reply, like now), and was/AM appreciative of your contributions to the solutions to my problems. But somehow you seem keen on making me change my opinion and on continuing YOUR bad-tempered aggression. I don't know why. Is it because you feel an authority due to your contribution history, and that makes you exempt from politeness? It's a shame. Your answers and help are sufficient to get respect. Your terse remarks and confrontations serve to negate it.

I'm done. I will post here no more. I hate this. And if you wanted the 'get one over him' feeling, then purely for the fact that I have felt I needed to respond to you in this manner means that you've got it. Well done, old dog.

jimimaseye wrote: Turtleneck: Percepts is/has been in the past very helpful.

but then...

percepts wrote: It is your own inability to get it right which was the problem.

Yes, that's the kind of help I need. Not.

Don't worry, jimimaseye, I haven't been put off. I have been looking and picking up tips around the forum and may still post questions if needed. It seems a helpful place for knowledge generally. I hope that whoever answers me will be more helpful.

And sorry, I didn't know the answer to your question. Shame though. (I do actually have SpamAssassin rule questions myself, but daren't ask them here now.) Good luck