Over the years I have slowly developed a regular expression that validates MOST email addresses correctly, assuming they don't use an IP address as the server part.

I use it in several PHP programs, and it works most of the time. However, from time to time I get contacted by someone who is having trouble with a site that uses it, and I end up having to make some adjustment (most recently I realized that I wasn't allowing 4-character TLDs).

What is the best regular expression you have or have seen for validating emails?

I've seen several solutions that use functions that use several shorter expressions, but I'd rather have one long complex expression in a simple function instead of several short expressions in a more complex function.

I don't want to create a separate answer for that, but I would say that the only reasonable way to validate an email address in practice is to check whether it has the '@' in it. There's simply no reason to go further than that. The address might be valid but non-existent, and for that no regex can check; a non-existent address is no better than an invalid address.
– bazzilic, Aug 21 '15 at 10:52

The regex that can validate that an IDNA is correctly formatted does not fit in stackexchange. (The rules on canonicalisation are really tortuous and particularly ill-suited to regex processing.)
– Jasen, Aug 29 '17 at 23:51

70 Answers

The fully RFC 822 compliant regex is inefficient and obscure because of its length. Fortunately, RFC 822 was superseded twice and the current specification for email addresses is RFC 5322. RFC 5322 leads to a regex that can be understood if studied for a few minutes and is efficient enough for actual use.

One RFC 5322 compliant regex can be found at the top of the page at http://emailregex.com/ but it uses the IP address pattern that is floating around the internet with a bug that allows 00 for any of the unsigned byte decimal values in a dot-delimited address, which is illegal. The rest of it appears to be consistent with the RFC 5322 grammar and passes several tests using grep -Po, including cases with domain names, IP addresses, bad ones, and account names with and without quotes.

Correcting the 00 bug in the IP pattern, we obtain a working and fairly fast regex. (Scrape the rendered version, not the markdown, for actual code.)
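
To make the 00 bug concrete, here is a minimal Python sketch. The "loose" octet pattern is an assumption about what the widely copied pattern looks like (the exact pattern from emailregex.com is not reproduced here), and the names LOOSE_OCTET, STRICT_OCTET and is_ipv4 are made up for the example.

```python
import re

# Widely copied octet pattern (assumed for illustration): "[01]?[0-9][0-9]?"
# also matches "00" and "007".
LOOSE_OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Stricter octet: 0-9, 10-99, 100-199, 200-249, 250-255, no leading zeros.
STRICT_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"


def is_ipv4(text, octet=STRICT_OCTET):
    """Return True if text is four dot-separated octets matching the given pattern."""
    pattern = r"{0}(?:\.{0}){{3}}".format(octet)
    return re.fullmatch(pattern, text) is not None
```

With the strict octet, is_ipv4("192.168.00.1") is rejected, while passing LOOSE_OCTET as the second argument lets it through.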

The more sophisticated patterns in Perl and PCRE (regex library used e.g. in PHP) can correctly parse RFC 5322 without a hitch. Python and C# can do that too, but they use a different syntax from those first two. However, if you are forced to use one of the many less powerful pattern-matching languages, then it’s best to use a real parser.

It's also important to understand that validating it per the RFC tells you absolutely nothing about whether that address actually exists at the supplied domain, or whether the person entering the address is its true owner. People sign others up to mailing lists this way all the time. Fixing that requires a fancier kind of validation that involves sending that address a message that includes a confirmation token meant to be entered on the same web page as was the address.

Confirmation tokens are the only way to know you got the address of the person entering it. This is why most mailing lists now use that mechanism to confirm sign-ups. After all, anybody can put down president@whitehouse.gov, and that will even parse as legal, but it isn't likely to be the person at the other end.
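
As a rough sketch of the confirmation-token idea (not any particular mailing list's implementation), the token can be derived from the address with a server-side secret, so that it only verifies for the address it was mailed to. SECRET_KEY, make_token and confirm are hypothetical names, and a real system would also want an expiry time and actual mail delivery, which are left out here.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in practice it would come from configuration.
SECRET_KEY = secrets.token_bytes(32)


def make_token(address):
    """Derive a confirmation token tied to one address."""
    return hmac.new(SECRET_KEY, address.encode("utf-8"), hashlib.sha256).hexdigest()


def confirm(address, token):
    """Check, in constant time, that the token was issued for this address."""
    return hmac.compare_digest(make_token(address), token)
```

The token goes into the message sent to the address; the sign-up only counts once confirm() succeeds for the value the user sends back.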

There is some danger that common usage and widespread sloppy coding will establish a de facto standard for e-mail addresses that is more restrictive than the recorded formal standard.

That is no better than all the other non-RFC patterns. It isn’t even smart enough to handle even RFC 822, let alone RFC 5322. This one, however, is.

If you want to get fancy and pedantic, implement a complete state engine. A regular expression can only act as a rudimentary filter. The problem with regular expressions is that telling someone that their perfectly valid e-mail address is invalid (a false rejection) because your regular expression can't handle it is just rude from the user's perspective. A state engine for the purpose can both validate and even correct e-mail addresses that would otherwise be considered invalid as it disassembles the e-mail address according to each RFC. This allows for a potentially more pleasing experience, like

The specified e-mail address 'myemail@address,com' is invalid. Did you mean 'myemail@address.com'?
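
The sketch below is far smaller than a real state engine; it only illustrates the "did you mean" idea for the one typo shown above (a comma typed instead of a period in the domain). suggest_correction is a made-up name.

```python
def suggest_correction(address):
    """Return a suggested address if a comma appears in the domain, else None."""
    local, at, domain = address.rpartition("@")
    if not at or "," not in domain:
        return None
    return local + "@" + domain.replace(",", ".")
```

A form handler could show the suggestion next to the field and let the user accept or ignore it.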

You said "There is no good regular expression." Is this general or specific to e-mail address validation?
– Tomalak, Oct 14 '08 at 14:33

@Tomalak: only for email addresses. As bortzmeyer said, the RFC is extremely complicated
– Luk, Oct 14 '08 at 16:23

The Linux Journal article you mention is factually wrong in several respects. In particular Lovell clearly hasn't read the errata to RFC 3696 and repeats some of the errors in the published version of the RFC. More here: dominicsayers.com/isemail
– Dominic Sayers, Apr 8 '09 at 15:56

SLaks: I'm really sorry. It seems that what looked like it was going to be a very useful entry was ruined. I'm not sure what happened, but I suspect you might have walked away, and your cat slept on your keyboard. You might want to edit it when you get a chance. Don't worry, it happens to the best of us.
– Matt Simmons, Dec 15 '09 at 3:33

You'll find that the MailAddress class in .NET 4.0 is far better at validating email addresses than in previous versions. I made some significant improvements to it.
– Jeff Tucker, Dec 15 '09 at 9:56

This question is asked a lot, but I think you should step back and ask yourself why you want to validate email addresses syntactically. What is the benefit, really?

It will not catch common typos.

It does not prevent people from entering invalid or made-up email addresses, or entering someone else's address.

If you want to validate that an email is correct, you have no choice but to send a confirmation email and have the user reply to that. In many cases you will have to send a confirmation mail anyway for security reasons or for ethical reasons (so you cannot e.g. sign someone up to a service against their will).

It might be worth checking that they entered something@something into the field in a client side validation just to catch simple mistakes - but in general you are right.
– Martin Beckett, Aug 25 '09 at 16:25

Martin, I gave you a +1, only to later read that foobar@dk is a valid email. It wouldn't be pretty, but if you want to be both RFC compliant AND use common sense, you should detect cases such as this and ask the user to confirm that it is correct.
– philfreo, Dec 16 '09 at 0:31

@olavk: if someone enters a typo (eg: me@hotmail), they're obviously not going to get your confirmation email, and then where are they? They're not on your site any more and they're wondering why they couldn't sign up. Actually no they're not - they've completely forgotten about you. However, if you could just do a basic sanity check with a regex while they're still with you, then they can catch that error straight away and you've got a happy user.
– nickf, Jun 2 '10 at 13:53

@JacquesB: You make an excellent point. Just because it passes muster per the RFC doesn’t mean it is really that user’s address. Otherwise all those president@whitehouse.gov addresses indicate a very netbusy commander-in-chief. :)
– tchrist, Nov 7 '10 at 20:09

It doesn't have to be black or white. If the e-mail looks wrong, let the user know that. If the user still wants to proceed, let him. Don't force the user to conform to your regex, rather, use regex as a tool to help the user know that there might be a mistake.
– ninjaneer, Feb 18 '14 at 2:56

It doesn't match all addresses, some must be transformed first. From the link: "This regular expression will only validate addresses that have had any comments stripped and replaced with whitespace (this is done by the module)."
– Chas. Owens, Apr 6 '09 at 0:18

Can you give me an example of some email address that wrongly passes through the second one, but is caught by the longer regex?
– Lazer, May 15 '10 at 18:32

Much though I did once love it, that’s an RFC 822 validator, not an RFC 5322 one.
– tchrist, Nov 7 '10 at 20:17

@Lazer in..valid@example.com would be a simple example. You aren't allowed to have two consecutive unquoted dots in the local-part.
– Randal Schwartz, Dec 6 '11 at 18:04

@Mikhail perl but you shouldn't actually use it.
– Good Person, Jan 8 '13 at 18:48

It all depends on how accurate you want to be. For my purposes, where I'm just trying to keep out things like bob @ aol.com (spaces in emails) or steve (no domain at all) or mary@aolcom (no period before .com), I use

/^\S+@\S+\.\S+$/

Sure, it will match things that aren't valid email addresses, but it's a matter of playing the 90/10 rule.
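
For anyone who wants to try the pattern outside Perl or PHP, here is the same check as a small Python sketch, run against the three examples above. LOOSE_EMAIL and looks_like_email are made-up names; fullmatch() takes the place of the ^ and $ anchors.

```python
import re

# Python spelling of /^\S+@\S+\.\S+$/; fullmatch() supplies the anchors.
LOOSE_EMAIL = re.compile(r"\S+@\S+\.\S+")


def looks_like_email(text):
    """Loose sanity check: no whitespace, an @, and a dot somewhere after it."""
    return LOOSE_EMAIL.fullmatch(text) is not None
```

This rejects "bob @ aol.com", "steve" and "mary@aolcom", and accepts "mary@aol.com".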

JJJ: Yes, it will match a lot of crap. It will match &$*#$(@$0(%))$#.)&*)(*$, too. For me, I'm more concerned with catching the odd fumble-finger typo like mary@aolcom than I am complete garbage. YMMV.
– Andy Lester, Oct 16 '12 at 16:03

[UPDATED] I've collated everything I know about email address validation here: http://isemail.info, which now not only validates but also diagnoses problems with email addresses. I agree with many of the comments here that validation is only part of the answer; see my essay at http://isemail.info/about.

is_email() remains, as far as I know, the only validator that will tell you definitively whether a given string is a valid email address or not. I've uploaded a new version at http://isemail.info/

I collated test cases from Cal Henderson, Dave Child, Phil Haack, Doug Lovell, RFC 5322 and RFC 3696: 275 test addresses in all. I ran all these tests against all the free validators I could find.

I'll try to keep this page up-to-date as people enhance their validators. Thanks to Cal, Michael, Dave, Paul and Phil for their help and co-operation in compiling these tests and constructive criticism of my own validator.

People should be aware of the errata against RFC 3696 in particular. Three of the canonical examples are in fact invalid addresses. And the maximum length of an address is 254 or 256 characters, not 320.

Hi @Josef. You should try to validate name@xn--4ca9at.at since this code is about validation, not interpretation. If you'd like to add a punycode translator then I'm happy to accept a pull request at github.com/dominicsayers/isemail
– Dominic Sayers, Apr 27 '15 at 18:19

A valid e-mail address is a string that matches the ABNF production […].

Note: This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the “@” character), too vague (after the “@” character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.

This is interesting. It's a violation of the RFC, but a willful one, and it makes sense. Real-world example: Gmail ignores dots in the part before @, so if your email is test@gmail.com you can send emails to test.@gmail.com or test....@gmail.com. Both of those addresses are invalid according to the RFC, but valid in the real world.
– valentinas, Jan 16 '13 at 5:04

I think last part should be '+' instead of '*': ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+$
– mmmmmm, Jan 21 '13 at 12:12

@mmmmmm john.doe@localhost is valid. For sure, in a real-world application (i.e. a community), I'd like your suggestion to replace * with +.
– rabudde, Feb 1 '13 at 10:03
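
To see what the * versus + choice changes in practice, here is a small Python sketch. PLUS is the pattern from mmmmmm's comment; STAR is the same pattern with the final quantifier switched back to *, which is an assumption about the original being discussed. The names LOCAL, LABEL, STAR and PLUS are made up for the example.

```python
import re

LOCAL = r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
LABEL = r"[a-zA-Z0-9-]+"

# "*" variant: the dotted suffix is optional, so single-label hosts pass.
STAR = re.compile(LOCAL + "@" + LABEL + r"(?:\." + LABEL + ")*")
# "+" variant from the comment above: at least one dot is required.
PLUS = re.compile(LOCAL + "@" + LABEL + r"(?:\." + LABEL + ")+")
```

STAR.fullmatch("john.doe@localhost") succeeds while PLUS.fullmatch("john.doe@localhost") returns None; both accept john.doe@example.com.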

@valentinas Actually, the RFC does not preclude these local parts, but they have to be quoted. "test...."@gmail.com is perfectly valid according to the RFC and semantically equivalent to test....@gmail.com.
– Rinke, Nov 17 '14 at 9:01

I think that only a subset of the addr-spec part is really relevant to the question. Accepting more than that and forwarding it through some other part of the system that is not ready to accept full RFC5822 addresses is like shooting yourself in the foot.
– dolmen, Dec 17 '11 at 13:53

Great (+1) but technically it's not a regex of course... (which would be impossible since the grammar is not regular).
– Rinke, Jan 3 '13 at 21:41

regexes stopped being regular some time ago. It is a valid Perl 'regex' though!
– rjh, Mar 10 '14 at 15:00

I set up a test for this regex on IDEone: ideone.com/2XFecH However, it doesn't fare "perfectly." Would anyone care to chime in? Am I missing something?
– Mike, Jul 30 '14 at 17:56

According to this page, data.iana.org/TLD/tlds-alpha-by-domain.txt, there are no domains with just a single character at the top level, e.g. "something.c" or "something.a". Here is a version that requires at least 2 characters, as in "something.pl" or "something.us": ^\\w+([-+.']\\w+)*@\\w+([-.]\\w+)*\\.\\w{2,}([-.]\\w+)*$
– Tomasz Szulc, Nov 19 '15 at 12:53

@Wayne Whitty. You have hit upon the primary issue of whether to cater for the vast majority of addresses, or ALL, including ones that nobody would use, except to test email validation.
– Patanjali, Nov 28 '15 at 3:13

@TomaszSzulc extra back slash in your answer is confusing, I just corrected it and 2 chars domains names support is working, ^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w{2,}([-.]\w+)*$
– Aqib Mumtaz, Nov 30 '15 at 11:16

The email addresses I want to validate are going to be used by an ASP.NET web application using the System.Net.Mail namespace to send emails to a list of people. So, rather than using some very complex regular expression, I just try to create a MailAddress instance from the address. The MailAddress constructor will throw an exception if the address is not formed properly. This way, I know I can at least get the email out of the door. Of course this is server-side validation, but at a minimum you need that anyway.

A good point. Even if this server validation rejects some valid address then it is not a problem since you will not be able to send to this address using this particular server technology anyway. Or you can try doing the same things using any third party emailing library you use instead of the default tools.
– User, Jun 16 '09 at 10:59

I really like how this leverages .Net framework code - no sense in reinventing the wheel. This is excellent. Simple, clean, and assures you can actually send the email. Great work.
– Cory House, Aug 15 '10 at 19:43

... yes, and for those interested in how it validates, have a look at the code in Reflector. There's quite a bit of it, and it ain't a regular expression!
– Tom Carter, Sep 17 '10 at 8:07

Just a note: the MailAddress class doesn't match RFC5322, if you just want to use it for validation (and not sending as well, in which case it's a moot point as mentioned above). See: stackoverflow.com/questions/6023589/…
– porges, May 31 '11 at 5:06

have a local part (i.e. the part before the @-sign) that is strictly compliant with RFC 5321/5322,

have a domain part (i.e. the part after the @-sign) that is a host name with at least two labels, each of which is at most 63 characters long.

The second constraint is a restriction on RFC 5321/5322.

Elaborate answer

Using a regular expression that recognizes email addresses could be useful in various situations: for example to scan for email addresses in a document, to validate user input, or as an integrity constraint on a data repository.

It should however be noted that if you want to find out if the address actually refers to an existing mailbox, there's no substitute for sending a message to the address. If you only want to check if an address is grammatically correct then you could use a regular expression, but note that ""@[] is a grammatically correct email address that certainly doesn't refer to an existing mailbox.

The syntax of email addresses has been defined in various RFCs, most notably RFC 822 and RFC 5322. RFC 822 should be seen as the "original" standard and RFC 5322 as the latest standard. The syntax defined in RFC 822 is the most lenient and subsequent standards have restricted the syntax further and further, where newer systems or services should recognize obsolete syntax, but never produce it.

In this answer I’ll take “email address” to mean addr-spec as defined in the RFCs (i.e. jdoe@example.org, but not "John Doe"<jdoe@example.org>, nor some-group:jdoe@example.org,mrx@exampel.org;).

There's one problem with translating the RFC syntaxes into regexes: the syntaxes are not regular! This is because they allow for optional comments in email addresses that can be infinitely nested, while infinite nesting can't be described by a regular expression. To scan for or validate addresses containing comments you need a parser or more powerful expressions. (Note that languages like Perl have constructs to describe context free grammars in a regex-like way.) In this answer I'll disregard comments and only consider proper regular expressions.

The RFCs define syntaxes for email messages, not for email addresses as such. Addresses may appear in various header fields and this is where they are primarily defined. When they appear in header fields addresses may contain (between lexical tokens) whitespace, comments and even linebreaks. Semantically this has no significance however. By removing this whitespace, etc. from an address you get a semantically equivalent canonical representation. Thus, the canonical representation of first. last (comment) @ [3.5.7.9] is first.last@[3.5.7.9].
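
As a sketch of what reducing an address to its canonical representation could look like, the Python function below drops comments and whitespace that sit outside quoted strings. A depth counter handles nested comments, which is exactly the part a proper regular expression cannot do. It does not validate anything, and canonicalize is a made-up name.

```python
def canonicalize(address):
    """Drop comments and whitespace that sit outside quoted strings."""
    out = []
    depth = 0          # current nesting level of "(" comments
    in_quotes = False  # inside a "quoted string"?
    escaped = False    # previous character was a backslash
    for ch in address:
        if escaped:
            # The character after a backslash is taken literally.
            if depth == 0:
                out.append(ch)
            escaped = False
        elif ch == "\\":
            if depth == 0:
                out.append(ch)
            escaped = True
        elif in_quotes:
            out.append(ch)
            if ch == '"':
                in_quotes = False
        elif depth > 0:
            # Inside a comment: only track nesting, keep nothing.
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
        elif ch == "(":
            depth = 1
        elif ch == '"':
            in_quotes = True
            out.append(ch)
        elif not ch.isspace():
            out.append(ch)
    return "".join(out)
```

Applied to the example above, canonicalize("first. last (comment) @ [3.5.7.9]") gives first.last@[3.5.7.9]; whitespace inside a quoted local part is left alone.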

Different syntaxes should be used for different purposes. If you want to scan for email addresses in a (possibly very old) document it may be a good idea to use the syntax as defined in RFC 822. On the other hand, if you want to validate user input you may want to use the syntax as defined in RFC 5322, probably only accepting canonical representations. You should decide which syntax applies to your specific case.

I use POSIX "extended" regular expressions in this answer, assuming an ASCII compatible character set.

RFC 822

I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I'll try to fix the expression as soon as possible.

I believe it's fully compliant with RFC 822 including the errata. It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. Where an erratum has been published I give a separate expression for the corrected grammar rule (marked "erratum") and use the updated version as a subexpression in subsequent regular expressions.

As stated in paragraph 3.1.4. of RFC 822 optional linear white space may be inserted between lexical tokens. Where applicable I've expanded the expressions to accommodate this rule and marked the result with "opt-lwsp".

RFC 5322

I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I'll try to fix the expression as soon as possible.

I believe it's fully compliant with RFC 5322 including the errata. It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. For rules that include semantically irrelevant (folding) whitespace, I give a separate regex marked "(normalized)" that doesn't accept this whitespace.

I ignored all the "obs-" rules from the RFC. This means that the regexes only match email addresses that are strictly RFC 5322 compliant. If you have to match "old" addresses (as the looser grammar including the "obs-" rules does), you can use one of the RFC 822 regexes from the previous paragraph.

Note that some sources (notably w3c) claim that RFC 5322 is too strict on the local part (i.e. the part before the @-sign). This is because "..", "a..b" and "a." are not valid dot-atoms, while they may be used as mailbox names. The RFC, however, does allow for local parts like these, except that they have to be quoted. So instead of a..b@example.net you should write "a..b"@example.net, which is semantically equivalent.
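
The quoting rule can be sketched in Python as follows: if a mailbox name is not a valid dot-atom, wrap it in double quotes. The atext character set below follows RFC 5322; the function name is made up, and the sketch assumes the input is a raw, not yet quoted mailbox name without control characters.

```python
import re

# atext as defined in RFC 5322: letters, digits and these printable specials.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM = re.compile(r"{0}+(?:\.{0}+)*".format(ATEXT))


def quote_local_part_if_needed(local):
    """Return local unchanged if it is a dot-atom, else as a quoted string."""
    if DOT_ATOM.fullmatch(local):
        return local
    # Inside a quoted string, backslash and double quote must be escaped.
    escaped = local.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'
```

So a.b stays as it is, while a..b, a. and .. come back wrapped in double quotes.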

Further restrictions

SMTP (as defined in RFC 5321) further restricts the set of valid email addresses (or actually: mailbox names). It seems reasonable to impose this stricter grammar, so that the matched email address can actually be used to send an email.

RFC 5321 basically leaves alone the "local" part (i.e. the part before the @-sign), but is stricter on the domain part (i.e. the part after the @-sign). It allows only host names in place of dot-atoms and address literals in place of domain literals.

The grammar presented in RFC 5321 is too lenient when it comes to both host names and IP addresses. I took the liberty of "correcting" the rules in question, using this draft and RFC 1034 as guidelines. Here's the resulting regex.

Note that depending on the use case you may not want to allow for a "General-address-literal" in your regex. Also note that I used a negative lookahead (?!IPv6:) in the final regex to prevent the "General-address-literal" part from matching malformed IPv6 addresses. Some regex processors don't support negative lookahead. Remove the substring |(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+ from the regex if you want to take the whole "General-address-literal" part out.

User input validation

A common use case is user input validation, for example on an html form. In that case it's usually reasonable to preclude address-literals and to require at least two labels in the hostname. Taking the improved RFC 5321 regex from the previous section as a basis, the resulting expression would be:

I do not recommend restricting the local part further, e.g. by precluding quoted strings, since we don't know what kind of mailbox names some hosts allow (like "a..b"@example.net or even "a b"@example.net).

I also do not recommend explicitly validating against a list of literal top-level domains or even imposing length-constraints (remember how ".museum" invalidated [a-z]{2,4}), but if you must:

Make sure to keep your regex up-to-date if you decide to go down the path of explicit top-level domain validation.

Further considerations

When only accepting host names in the domain part (after the @-sign), the regexes above accept only labels with at most 63 characters, as they should. However, they don't enforce the fact that the entire host name must be at most 253 characters long (including the dots). Although this constraint is strictly speaking still regular, it's not feasible to make a regex that incorporates this rule.

Another consideration, especially when using the regexes for input validation, is feedback to the user. If a user enters an incorrect address, it would be nice to give a little more feedback than a simple "syntactically incorrect address". With "vanilla" regexes this is not possible.

These two considerations could be addressed by parsing the address. The extra length constraint on host names could in some cases also be addressed by using an extra regex that checks it, and matching the address against both expressions.
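
As one possible way to handle the length constraint outside the main regex, here is a Python sketch that checks the total host name length separately and then validates each label. The label pattern is the same shape as the one used for host names in this thread; is_valid_hostname and min_labels are made-up names.

```python
import re

# One label: letters, digits and inner hyphens, 1 to 63 characters.
LABEL = re.compile(r"[0-9A-Za-z](?:[0-9A-Za-z-]{0,61}[0-9A-Za-z])?")


def is_valid_hostname(host, min_labels=2):
    """Check total length (at most 253), label count and each label's syntax."""
    if len(host) > 253:
        return False
    labels = host.split(".")
    if len(labels) < min_labels:
        return False
    return all(LABEL.fullmatch(label) for label in labels)
```

Splitting the check like this also makes it easier to tell the user which rule failed, which a single regex cannot do.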

None of the regexes in this answer are optimized for performance. If performance is an issue, you should see if (and how) the regex of your choice can be optimized.

According to Wikipedia, it seems that the local part, when dotted, has a limit of 64 characters per part, and RFC 5322 also refers to the dotted local part being interpreted with the restrictions of domains. For example arbitrary-long-email-address-should-be-invalid-arbitrary-long-email-address-should-be-invalid.and-the-second-group-also-should-not-be-so-long-and-the-second-group-also-should-not-be-so-long@example.com should not validate. I suggest changing the "+" signs in the first group (name before the optional dot) and in the second group (name after the following dots) to {1,64}.
– Xavi Montero, May 22 '17 at 0:35

As the comments are limited in size, here is the resulting regex I plan to use, which is the one at the beginning of this answer, plus limiting the size of the local part, plus adding a backslash before the "/" symbol as required by PHP and also by regex101.com. In PHP I use: $emailRegex = '/^([-!#-\'*+\/-9=?A-Z^-~]{1,64}(\.[-!#-\'*+\/-9=?A-Z^-~]{1,64})*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+$/';
– Xavi Montero, May 22 '17 at 0:39

CAUTION: For some reason, StackOverflow adds hidden characters when copying from the rendered markdown. Copy it into the regex101.com and you'll see black dots there. You have to remove them and correct the string... Maybe if integrated in the answer, there they are correctly copiable. Sorry for the inconvenience. I don't want to add a new answer as this one is the proper one. Also I don't want to directly edit unless the community thinks this should be integrated into it.
– Xavi Montero, May 22 '17 at 0:48

@XaviMontero Thanks for contributing, Xavi! Do you have a reference to the RFC stating the 64 character limit on local part labels? If so, I would gladly adjust the answer.
– Rinke, May 22 '17 at 11:21

There are plenty of examples of this out on the net (and I think even one that fully validates the RFC, but it's tens/hundreds of lines long if memory serves). People tend to get carried away validating this sort of thing. Why not just check that it has an @ and at least one . and meets some simple minimum length? It's trivial to enter a fake email and still match any valid regex anyway. I would guess that false positives are better than false negatives.
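
A minimal Python sketch of that kind of check might look like this. The minimum length of 6 (as in a@b.cd) is an arbitrary assumption, and passes_basic_check is a made-up name.

```python
def passes_basic_check(address, min_length=6):
    """Very loose filter: long enough, something before an "@", a "." after it."""
    if len(address) < min_length:
        return False
    local, at, domain = address.partition("@")
    return bool(local) and bool(at) and "." in domain
```

It lets plenty of invalid addresses through, which is the trade-off being argued for here.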

While deciding which characters are allowed, please remember your apostrophed and hyphenated friends. I have no control over the fact that my company generates my email address using my name from the HR system. That includes the apostrophe in my last name. I can't tell you how many times I have been blocked from interacting with a website by the fact that my email address is "invalid".

This is a super common problem in programs that make unwarranted assumptions about what is and is not allowed in a person’s name. One should make no such assumptions, just accept any character that the relevant RFC(s) say one must.
– tchrist, Nov 7 '10 at 20:22

gets a vote up, exactly what I was going to say. Doesn't handle IDNs, but converting to Punycode beforehand solves this. PHP >= 5.3 has idn_to_ascii() for this. One of the best and easiest ways of validating an email.
– Taylor, Jan 25 '12 at 23:00
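
The same "convert to Punycode first" step can be sketched in Python, used here instead of PHP's idn_to_ascii(). Note that Python's built-in idna codec implements the older IDNA 2003 rules, so treat this as an illustration rather than a complete IDN solution. to_ascii_address is a made-up name.

```python
def to_ascii_address(address):
    """Convert the domain part to its ASCII (Punycode) form; keep the local part."""
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("no @ in address")
    # The built-in "idna" codec converts each label of the domain.
    return local + "@" + domain.encode("idna").decode("ascii")


ascii_form = to_ascii_address("user@b\u00fccher.example")
```

The result can then be handed to whatever ASCII-only validator is in use.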

In short, don't expect a single, usable regex to do a proper job. And the best regex will validate the syntax, not the validity of an e-mail (jhohn@example.com is correct but it will probably bounce...).

Correct me if I’m wrong, but I believe that PHP uses PCRE patterns. If so, you should be able to craft something similar to Abigail’s RFC 5322 pattern.
– tchrist, Nov 7 '10 at 20:24

@tchrist: not sure if PCRE has caught up to this syntax (which I discover). If so, not sure if PHP's PCRE has caught up to this version of PCRE... Well, if I understand correctly this syntax, you can as well use a PEG parser, much clearer and complete than a regex anyway.
– PhiLho, Nov 10 '10 at 14:51

PCRE has caught up to it, but perhaps PHP has not caught up with PCRE. ☹
– tchrist, Nov 10 '10 at 15:09

One simple regular expression which would at least not reject any valid email address would be checking for something, followed by an @ sign and then something followed by a period and at least 2 somethings. It won't reject anything, but after reviewing the spec I can't find any email that would be valid and rejected.

This is what I was looking for. Not very restrictive, but makes sure there is only 1 @ (as we're parsing a list and want to make sure there are no missing commas). FYI, you can have an @ on the left if it's in quotes: Valid_email_addresses, but it's pretty fringe.
– Josh, Nov 11 '11 at 6:16

After using it, realized it didn't work exactly. /^[^@]+@[^@]+\.[^@]{2}[^@]*$/ actually checks for 1 @ sign. Your regex will let multiple through because of the .* at the end.
– Josh, Nov 11 '11 at 6:31

Right. I'm not trying to reject all invalid, just keep from rejecting a valid email address.
– spig, Nov 14 '11 at 17:48

It would be far better to use this: /^[^@]+@[^@]+\.[^@]{2,4}$/ making sure that it ends with 2 to 4 non-@ characters. As @Josh pointed out, it now allows an extra @ in the end. But you can also change that to: /^[^@]+@[^@]+\.[a-zA-Z]{2,4}$/ since all top-level domains consist of a-Z characters. You can replace the 4 with 5 or more, allowing top-level domain names to be longer in the future as well.
– FLY, Jan 14 '13 at 10:51

spoon16: That link isn’t really correct. Its statement that there can be no perfect pattern for validating email addresses is patently false. You can, but you have to make sure that you follow the RFC right down to the letter. And you have to pick the right RFC, too.
– tchrist, Nov 7 '10 at 20:27

The "best" right now does not work with Java regex, even after properly escaping and converting the string.
– Eric Chen, Apr 17 '12 at 20:57

Not to mention that non-Latin (Chinese, Arabic, Greek, Hebrew, Cyrillic and so on) domain names are to be allowed in the near future. Everyone has to change the email regex used, because those characters are surely not to be covered by [a-z]/i nor \w. They will all fail.

After all, the best way to validate the email address is still to actually send an email to the address in question. If the email address is part of user authentication (register/login/etc), then you can perfectly combine it with the user activation system. I.e. send an email with a link containing a unique activation key to the specified email address and only allow login when the user has activated the newly created account using the link in the email.

If the purpose of the regex is just to quickly inform the user in the UI that the specified email address doesn't look like it is in the right format, the best approach is still to check whether it basically matches the following regex:

^([^.@]+)(\.[^.@]+)*@([^.@]+\.)+([^.@]+)$

Simple as that. Why on earth would you care about the characters used in the name and domain? It's the client's responsibility to enter a valid email address, not the server's. Even when the client enters a syntactically valid email address like aa@bb.cc, this does not guarantee that it's a legit email address. No one regex can cover that.
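
For reference, here is the same structural check as a small Python sketch, with a few inputs showing what it does and does not catch. STRUCTURE is a made-up name; fullmatch() takes the place of the ^ and $ anchors.

```python
import re

# Dot-separated parts before and after a single "@"; no empty parts allowed.
# Any character other than "." and "@" is accepted, including spaces.
STRUCTURE = re.compile(r"([^.@]+)(\.[^.@]+)*@([^.@]+\.)+([^.@]+)")


def has_email_structure(text):
    return STRUCTURE.fullmatch(text) is not None
```

aa@bb.cc and first.last@mail.example.org pass; me@hotmail and a..b@example.com do not.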

I agree that sending an authentication message is usually the best way for this kind of stuff; syntactically correct and valid are not the same. I get frustrated when I get made to type my email address twice for "Confirmation", as if I can't look at what I typed. I only copy the first one to the second anyway. It seems to be becoming used more and more.
– PeteT, Feb 2 '10 at 15:05

Agree! But I don't think this regex is valid, because it allows spaces after the @. E.g. test@test.ca com net is considered a valid email by the above regex, whereas it should be reported as invalid.
– CB4, Nov 8 '17 at 17:54

Note: This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the @ character), too vague (after the @ character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

This doesn't add much over stackoverflow.com/a/8829363 and would IMHO be better as an edit of or comment on that.
– user743382, Apr 29 '18 at 21:50

example@localhost is valid, but for a real world application you may want to enforce a domain extension, all you need to do is change the final * to a + to achieve this (changing that part of the pattern from 0+ to 1+)
– Mitch Satchwell, May 16 '18 at 9:05

Here's the PHP I use. I've chosen this solution in the spirit of "false positives are better than false negatives" as declared by another commenter here AND with regards to keeping your response time up and server load down ... there's really no need to waste server resources with a regular expression when this will weed out most simple user error. You can always follow this up by sending a test email if you want.

a) The "waste of server resources" is infinitesimal, but if you are so inclined, you could do it client side with JS. b) What if you need to send a registration mail and the user enters me@forgotthedotcom? Your "solution" fails and you lose a user.
– johnjohn, Apr 3 '12 at 9:40

a) Relying on a JS validation that would fail when JavaScript is disabled doesn't sound like the best idea either (just btw)
– auco, Dec 6 '13 at 15:39

What the devil language is that in?? I see a /D flag, and you’ve quoted it with single quotes yet also used slashes to delimit the pattern? It’s not Perl, and it can’t be PCRE. Is it therefore PHP? I believe those are the only three that allow recursion like (?1).
– tchrist, Nov 7 '10 at 20:32

It's in PHP, which uses PCRE. The slashes are used only to delimit special characters like parentheses, square brackets, and of course slashes and single quotes. The /D flag, if you didn't know, is to prevent a newline being added to the end of the string, which would be allowed otherwise.
– MichaelRushton, Feb 19 '11 at 18:24

It is strange that you "cannot" allow 4-character TLDs. You are banning people with .info and .name addresses, and the length limitation also stops .travel and .museum, although yes, they are less common than 2-character and 3-character TLDs.

You should allow uppercase letters too. Email systems will normalize the local part and domain part.

In your regex for the domain part, note that a domain name cannot start or end with '-'. A dash can only appear in between.

If you use the PEAR library, check out its mail function (I forget the exact name/library). You can validate an email address by calling one function, and it validates the address according to the definition in RFC 822.
