Seems like you left out the option of +1 instead of just 1 although the answer covers this case.
– Eric, Jul 21 '09 at 11:56

80

Don't all US phone numbers require 555 in them apart from 911?
– Andrew Grimm, Nov 25 '10 at 11:54

17

On the contrary, the 555 prefix is reserved for fake-out phone numbers. Those numbers are guaranteed not to connect to an actual phone number so they're often used in television and movies to ensure that a viewer doesn't try to call the number and end up harassing some poor innocent.
– rushinge, Dec 8 '10 at 19:59

24

@rushinge While that was true decades ago, it is no longer true. The 555 prefix is still special, but only a small range of numbers is guaranteed to terminate without connection: 555-0100 to 555-0199. But I'm pretty sure Andrew was joking anyway.
– Adam Davis, Oct 21 '11 at 18:46

32 Answers

Better option... just strip all non-digit characters on input (except 'x' and leading '+' signs), taking care because of the British tendency to write numbers in the non-standard form +44 (0) ... when asked to use the international prefix (in that specific case, you should discard the (0) entirely).
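A minimal Python sketch of that normalization. It assumes the extension marker is the letter 'x' and that "(0)" should be dropped only when the input starts with '+'; the name normalize_phone is hypothetical, not something from this thread.

```python
import re


def normalize_phone(raw: str) -> str:
    """Reduce user input to digits, an optional leading '+', and 'x'."""
    s = raw.strip()
    international = s.startswith("+")
    if international:
        # In "+44 (0) 1234 ..." the (0) is a domestic trunk prefix, so drop it.
        s = re.sub(r"\(\s*0\s*\)", "", s)
    # Keep only digits and the extension marker 'x'.
    body = re.sub(r"[^0-9x]", "", s.lower())
    return ("+" if international else "") + body


print(normalize_phone("+44 (0) 1234 567890"))  # +441234567890
print(normalize_phone("(212) 555-0142 x77"))   # 2125550142x77
```

This only cleans the input; it says nothing about whether the result is a real number.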

@scunliffe There are two problems here: creating output that looks like a phone number from what a user typed in, which this kind of solves, and determining whether what a user typed in is in fact a valid phone number, which this does not solve at all. If you want validation, i.e. determining if a string is a valid phone number, this doesn't do that. The question asks about validation, not formatting. Of course the author of the question is the final arbiter, but the question as asked is not answered by this answer.
– PlexQ, Mar 31 '12 at 11:23

It turns out that there's something of a spec for this, at least for North America, called the NANP.

You need to specify exactly what you want. What are legal delimiters? Spaces, dashes, and periods? No delimiter allowed? Can one mix delimiters (e.g., +0.111-222.3333)? How are extensions (e.g., 111-222-3333 x 44444) going to be handled? What about special numbers, like 911? Is the area code going to be optional or required?

Here's a regex for a 7- or 10-digit number, with extensions allowed; delimiters are spaces, dashes, or periods:

Here it is without the extension section (I make my users enter ext in a separate field): ^(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})$
– DJTripleThreat, May 4 '10 at 4:37
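For anyone who wants to try the extension-free pattern from the comment above, here is a rough Python sketch. It assumes the parentheses around the area code are meant as escaped literals, \( and \), and the names NANP_RE and is_nanp are made up for illustration.

```python
import re

# The pattern from the comment, split across lines for readability.
NANP_RE = re.compile(
    r"^(?:(?:\+?1\s*(?:[.-]\s*)?)?"                                # optional +1 or 1
    r"(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)"  # (area code)
    r"|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))"             # bare area code
    r"\s*(?:[.-]\s*)?)?"                                           # optional delimiter
    r"([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})"                  # exchange
    r"\s*(?:[.-]\s*)?"                                             # optional delimiter
    r"([0-9]{4})$"                                                 # line number
)


def is_nanp(number: str) -> bool:
    """True if the string fits the 7- or 10-digit pattern above."""
    return NANP_RE.match(number) is not None


for sample in ["(212) 555-0142", "1-212-555-0142", "555-0142", "123-456-7890"]:
    print(sample, is_nanp(sample))
```

The last sample is rejected because neither an area code nor an exchange may start with 0 or 1 in this pattern.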

@StevenSoroka I have had Jeffrey Friedl's book beside me on my desk for the past two years, as regular expressions are a major part of my work. It takes a good while to really understand regular expressions. Sometimes, the readers of this site are simply looking for an existing solution, rather than writing their own, especially in domains with lots of corner cases, such as phone number representations.
– Justin R., Mar 28 '13 at 21:06

3

@fatcat1111 I understand that, but the majority of the responses here are "me too" one-off regular expressions that likely don't fit any of your corner cases. These then end up on all the websites I'm trying to use, and I can't enter my zip code or phone number or email address because someone used a half-baked regular expression (e.g., + is a valid character in email addresses). The best responses on this page point users to libraries, not to napkin-scrawled regexes.
– Steven Soroka, Apr 5 '13 at 21:49

If the user wants to give you his phone number, then trust him to get it right. If he does not want to give it to you then forcing him to enter a valid number will either send him to a competitor's site or make him enter a random string that fits your regex. I might even be tempted to look up the number of a premium rate sex line and enter that instead.

I would also consider any of the following as valid entries on a web site:

I agree with the sentiment here, but sometimes it's nice to perform validation when the phone number is actually going to be used for something important in the interest of the user. Best example here is credit card authorization for a purchase. If the phone number is wrong, the auth might fail.
– Pointy, Nov 10 '10 at 18:21

17

If the user doesn't want to enter his phone number you can just allow the field to be optional, but is it too much to ask the user to enter a valid phone number if they are going to enter one?
– jcmcbeth, Dec 6 '10 at 19:41

4

Also a role of validation is simply to remind people to add area codes etc that they might not otherwise remember to add, but which cannot possibly be guessed after the fact.
– Ben McIntyre, Feb 23 '11 at 0:09

14

@Pointy But regex validation won't help you. The one and the only way to actually validate if the phone number is correct is to actually send a message to it (in case of mobile) AND make sure the user confirms using some kind of verification code. This is what you do when the number correctness is important. Everything else is just for user's convenience to protect against some (but not all) typos and does not validate anything.
– Alex B, Nov 16 '12 at 6:40

12

Heh... I've used 867-5309 on a lot of forms
– cwallenpoole, Apr 16 '13 at 16:31

Although the answer to strip all whitespace is neat, it doesn't really solve the problem that's posed, which is to find a regex. Take, for instance, my test script that downloads a web page and extracts all phone numbers using the regex. Since you'd need a regex anyway, you might as well have the regex do all the work. I came up with this:

1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})(\se?x?t?(\d*))?

Here's a Perl script to test it. When you match, $1 contains the area code, $2 and $3 contain the phone number, and $5 contains the extension. My test script downloads a file from the internet and prints all the phone numbers in it.

You can change \W* to \s*\W?\s* in the regex to tighten it up a bit. I wasn't thinking of the regex in terms of, say, validating user input on a form when I wrote it, but this change makes it possible to use the regex for that purpose.
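As a rough Python illustration of both variants (this is not the Perl test script mentioned above, and the names LOOSE, TIGHT and extract_numbers are made up), the loose pattern can pull numbers out of free text, while the tightened one allows at most one non-word character between groups:

```python
import re

# The regex from this answer, unchanged.
LOOSE = re.compile(
    r"1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})(\se?x?t?(\d*))?"
)
# The tightened variant: every \W* becomes \s*\W?\s*.
TIGHT = re.compile(LOOSE.pattern.replace(r"\W*", r"\s*\W?\s*"))


def extract_numbers(text: str):
    """Return (area code, exchange, line, extension) tuples found in text."""
    found = []
    for m in LOOSE.finditer(text):
        # Group 5 is the extension; an empty match is reported as None.
        found.append((m.group(1), m.group(2), m.group(3), m.group(5) or None))
    return found


print(extract_numbers("Call 212-555-0142 x77 or 415.555.0199 today."))
# [('212', '555', '0142', '77'), ('415', '555', '0199', None)]

print(bool(LOOSE.search("212---555---0142")))  # True
print(bool(TIGHT.search("212---555---0142")))  # False
```

Note that the extension part only captures digits that directly follow the optional "ext" letters, so "x77" is captured but "ext 77" is not.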

Note that stripping () characters does not work for a common style of writing UK numbers: +44 (0) 1234 567890, which means dial either the international number:
+441234567890
or, within the UK, dial 01234567890

You'll have a hard time dealing with international numbers with a single/simple regex; see this post on the difficulties of international (and even North American) phone numbers.

You'll want to parse the first few digits to determine what the country code is, then act differently based on the country.
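A small sketch of that idea in Python, assuming the input has already been reduced to a '+' followed by digits. The calling-code table is a tiny illustrative subset, and split_country_code and looks_plausible are hypothetical helper names.

```python
# Illustrative subset only; a real table would cover every calling code.
CALLING_CODES = {
    "1": "NANP (US, Canada, and others)",
    "33": "France",
    "44": "United Kingdom",
    "353": "Ireland",
}


def split_country_code(number: str):
    """Split '+<digits>' into (calling code, national number)."""
    digits = number.lstrip("+")
    # Calling codes are one to three digits long; try the longest first.
    for length in (3, 2, 1):
        code = digits[:length]
        if code in CALLING_CODES:
            return code, digits[length:]
    return None, digits


def looks_plausible(number: str) -> bool:
    """Apply a per-country rule once the calling code is known."""
    code, national = split_country_code(number)
    if code == "1":
        # NANP national numbers are exactly ten digits.
        return len(national) == 10 and national.isdigit()
    # Other countries: only a loose digits-only check in this sketch.
    return code is not None and national.isdigit()


print(split_country_code("+441234567890"))  # ('44', '1234567890')
print(looks_plausible("+12125550142"))      # True
```

Per-country rules beyond the NANP length check are left out on purpose, since they are exactly the part that is hard to keep current.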

Beyond that - the list you gave does not include another common US format - leaving off the initial 1. Most cell phones in the US don't require it, and it'll start to baffle the younger generation unless they've dialed internationally.

If you just want to verify you don't have random garbage in the field (i.e., from form spammers) this regex should do nicely:

^[0-9+\(\)#\.\s\/ext-]+$

Note that it doesn't have any special rules for how many digits, or what numbers are valid in those digits; it just verifies that only digits, parentheses, dashes, plus, whitespace, pound, period, slash, or the letters e, x, t are present. (As written, the character class does not allow asterisks or commas.)
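To see what this character whitelist accepts and rejects, here is a short Python sketch; the names ALLOWED_CHARS and looks_like_phone_input are hypothetical.

```python
import re

# The regex from this answer, unchanged.
ALLOWED_CHARS = re.compile(r"^[0-9+\(\)#\.\s\/ext-]+$")


def looks_like_phone_input(value: str) -> bool:
    """True if the value contains only characters from the whitelist."""
    return ALLOWED_CHARS.match(value) is not None


samples = [
    "+1 (212) 555-0142 ext. 77",  # accepted
    "212/555-0142 #77",           # accepted
    "Buy cheap pills",            # rejected: letters outside e, x, t
    "text",                       # accepted: no structure is enforced
]
for s in samples:
    print(repr(s), looks_like_phone_input(s))
```

The "text" case shows the trade-off: the check filters obvious garbage but does not require any digits at all.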

It should be compatible with international numbers and localization formats. Do you foresee any need to allow square, curly, or angled brackets for some regions? (currently they aren't included).

If you want to maintain per-digit rules (such as the US rule that area codes and prefixes (exchange codes) must fall in the range 200-999), well, good luck to you. Maintaining a complex rule set which could be outdated at any point in the future by any country in the world does not sound fun.

And while stripping all/most non-numeric characters may work well on the server side (especially if you are planning on passing these values to a dialer), you may not want to trash the user's input during validation, particularly if you want them to make corrections in another field.

My gut feeling is reinforced by the number of replies to this topic: there is a virtually infinite number of solutions to this problem, none of which are going to be elegant.

Honestly, I would recommend you don't try to validate phone numbers. Even if you could write a big, hairy validator that would allow all the different legitimate formats, it would end up allowing pretty much anything even remotely resembling a phone number in the first place.

In my opinion, the most elegant solution is to validate a minimum length, nothing more.

If you're talking about form validation, the regexp to validate correct meaning as well as correct data is going to be extremely complex because of varying country and provider standards. It will also be hard to keep up to date.

I interpret the question as looking for a broadly valid pattern, which may not be internally consistent: for example, having a valid set of numbers, but not validating that the trunk line, exchange, etc. conform to the valid pattern for the country code prefix.

North America is straightforward, and for international I prefer to use an 'idiomatic' pattern which covers the ways in which people specify and remember their numbers:

The North American pattern makes sure that if one parenthesis is included, both are. The international pattern accounts for an optional initial '+' and country code. After that, you're in the idiom. Valid matches would be:

(xxx)xxx-xxxx

(xxx)-xxx-xxxx

(xxx)xxx-xxxx x123

12 1234 123 1 x1111

12 12 12 12 12

12 1 1234 123456 x12345

+12 1234 1234

+12 12 12 1234

+12 1234 5678

+12 12345678

This may be biased as my experience is limited to North America, Europe and a small bit of Asia.

I answered this on another SO question before deciding to also post my answer on this thread, because no one was addressing how to require or not require items; they were just handing out regexes:
Regex working wrong, matching unexpected things

From my post there, I've created a quick guide to help anyone build a regex for their own desired phone number format. The caveat (as I noted there) is that if you are too restrictive, you may not get the desired results, and there is no "one size fits all" solution to accepting all possible phone numbers in the world, only what you decide to accept as your format of choice. Use at your own risk.

Quick cheat sheet

Start the expression: /^

If you want to require a space, use: [\s] or \s

If you want to require parentheses, use: [(] and [)] . Using \( and \) is ugly and can make things confusing.

If you want anything to be optional, put a ? after it

If you want a hyphen, just type - or [-] . Inside a character class with other characters, though, you may need to escape it unless it comes first or last: \-

If you want to accept different choices in a slot, put brackets around the options: [-.\s] will require a hyphen, period, or space. A question mark after the last bracket will make all of those optional for that slot.
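Putting the cheat-sheet pieces together, a pattern might be assembled like this in Python. The \d{3} and \d{4} digit groups and the closing $ are my assumptions (the list above does not cover them), and PHONE_RE is just an illustrative name.

```python
import re

parts = [
    r"^",        # start the expression
    r"[(]?",     # optional opening parenthesis
    r"\d{3}",    # three digits (assumed area code)
    r"[)]?",     # optional closing parenthesis
    r"[-.\s]?",  # optional hyphen, period, or space
    r"\d{3}",    # three digits (assumed prefix)
    r"[-.\s]?",  # optional hyphen, period, or space
    r"\d{4}",    # four digits (assumed line number)
    r"$",        # end of the expression
]
PHONE_RE = re.compile("".join(parts))

for sample in ["(212) 555-0142", "212.555.0142", "2125550142", "212_555_0142"]:
    print(sample, bool(PHONE_RE.match(sample)))
```

Because each parenthesis is optional on its own, this sketch also accepts an unbalanced "(212 555-0142"; requiring both or neither needs an alternation rather than two separate ? marks.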

Trying to build a comprehensive regex from scratch is usually a bad idea, unless you have good hard reasons for implementing it. Are you in direct contact with SMSCs, or other telecom-operated hardware? If that's the case, you should be able to get this sort of validation-related information from them.

The question should probably be specified in a bit more detail to explain the purpose of validating the numbers. For instance, 911 is a valid number in the US, but 911x isn't for any value of x. That's so that the phone company can calculate when you are done dialing. There are several variations on this issue. But your regex doesn't check the area code portion, so that doesn't seem to be a concern.

Like validating email addresses, even if you have a valid result you can't know if it's assigned to someone until you try it.

If you are trying to validate user input, why not normalize the result and be done with it? If the user puts in a number you can't recognize as a valid number, either save it as entered or strip out undialable characters. The Number::Phone::Normalize Perl module could be a source of inspiration.

After reading through these answers, it looks like there wasn't a straightforward regular expression that can parse through a bunch of text and pull out phone numbers in any format (including international with and without the plus sign).

Here's what I used for a client project recently, where we had to convert all phone numbers in any format to tel: links.

So far, it's been working with everything they've thrown at it, but if errors come up, I'll update this answer.

I work for a market research company and we have to filter these types of input alllll the time. You're complicating it too much. Just strip the non-alphanumeric chars, and see if there's an extension.

For further analysis you can subscribe to one of many providers that will give you access to a database of valid numbers as well as tell you if they're landlines or mobiles, disconnected, etc. It costs money.

@PlexQ 555-123-1234, 07777777777, 90210, 01/01/1901 - users are inventive in ramming garbage through validation. Better to not tick off the ones who genuinely do have some odd data by using overly restrictive validation and telling them they're wrong.
– ReactiveRaven, Apr 29 '12 at 3:19

My inclination is to agree that stripping non-digits and just accepting what's there is best. Maybe also ensure that at least a couple of digits are present, although that does prohibit something like an alphabetic phone number such as "ASK-JAKE".

A couple of simple Perl expressions might be:

@f = /(\d+)/g;
tr/0-9//dc;

Use the first one to keep the digit groups together, which may give formatting clues. Use the second one to trivially toss all non-digits.
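For readers not using Perl, roughly equivalent Python calls look like this (a sketch; the sample input and variable names are made up):

```python
import re

raw = "(212) 555-0142 x77"

# Like @f = /(\d+)/g; keeps the digit groups together.
groups = re.findall(r"\d+", raw)

# Like tr/0-9//dc; deletes every character that is not 0-9.
digits = re.sub(r"[^0-9]", "", raw)

print(groups)  # ['212', '555', '0142', '77']
print(digits)  # 212555014277
```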

Is it a worry that there may need to be a pause and then more keys entered? Or something like 555-1212 (wait for the beep) 123?

You would probably be better off using a Masked Input for this. That way users can ONLY enter numbers and you can format however you see fit. I'm not sure if this is for a web application, but if it is, there is a very slick jQuery plugin that offers some options for doing this.

I was struggling with the same issue, trying to make my application future proof, but these guys got me going in the right direction. I'm not actually checking the number itself to see if it works or not, I'm just trying to make sure that a series of numbers was entered that may or may not have an extension.

Worst case scenario if the user had to pull an unformatted number from the XML file, they would still just type the numbers into the phone's numberpad 012345678x5, no real reason to keep it pretty. That kind of RegEx would come out something like this for me:

Is it possible to have the form display 4 separate fields (area code, 3-digit prefix, 4-digit part, extension) so that they can input each part of the number separately, and you can verify each piece individually? That way you can not only make verification much easier, you can also store your phone numbers in a more consistent format in the database.
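A minimal Python sketch of the four-field approach, assuming US-style 3-3-4 digit groups and an optional digits-only extension. The class name PhoneFields, the error messages, and the stored "x" format are all illustrative choices, not anything prescribed in this thread.

```python
import re
from dataclasses import dataclass


@dataclass
class PhoneFields:
    area_code: str
    prefix: str
    line: str
    extension: str = ""

    def errors(self):
        """Check each field on its own and collect readable messages."""
        problems = []
        if not re.fullmatch(r"[0-9]{3}", self.area_code):
            problems.append("area code must be 3 digits")
        if not re.fullmatch(r"[0-9]{3}", self.prefix):
            problems.append("prefix must be 3 digits")
        if not re.fullmatch(r"[0-9]{4}", self.line):
            problems.append("line number must be 4 digits")
        if self.extension and not re.fullmatch(r"[0-9]+", self.extension):
            problems.append("extension must be digits only")
        return problems

    def canonical(self) -> str:
        """One consistent string for storage, e.g. 2125550142x77."""
        base = f"{self.area_code}{self.prefix}{self.line}"
        return f"{base}x{self.extension}" if self.extension else base


entry = PhoneFields("212", "555", "0142", "77")
print(entry.errors())     # []
print(entry.canonical())  # 2125550142x77
```

The obvious limitation is that fixed 3-3-4 fields only suit North American numbers; international input would need a different layout.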