Validating Telephone Numbers

December 13, 2011

When I was a kid, telephones had rotary dials, not push buttons, and exchanges had names; my grandmother was in the Underhill 8 exchange. If you were calling someone in the same exchange as you were, you only had to dial the last four digits of the number. Long distance calling generally involved a human operator.

Modern American telephone numbers have ten digits, segmented as a three-digit area code, a three-digit exchange code, and a four-digit number. Within an area code, you need only dial (the verb hasn’t changed, even though telephones no longer have a dial) the seven-digit exchange code and number; otherwise, you must dial the complete ten-digit number, often with a prefix.

Our exercise today asks you to validate a telephone number, as if written on an input form. Telephone numbers can be written as ten digits, or with dashes, spaces, or dots between the three segments, or with the area code parenthesized; both the area code and any white space between segments are optional. Thus, all of the following are valid telephone numbers: 1234567890, 123-456-7890, 123.456.7890, (123)456-7890, (123) 456-7890 (note the white space following the area code), and 456-7890. The following are not valid telephone numbers: 123-45-6789, 123:4567890, and 123/456-7890.

Your task is to write a phone number validator that follows the rules given above; your function should either return a valid telephone number or an indication that the input is invalid. Be sure to write a proper test suite. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
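For readers who want a starting point, here is a minimal Python sketch of one possible validator; it is not the suggested solution. The names PATTERNS and validate are made up for illustration. The rules above leave some cases open, so the sketch assumes that the two separators in a ten-digit number must match, that a seven-digit number needs a separator, and that surrounding white space is ignored.

```python
import re

# One pattern per accepted layout. Which layouts to accept beyond the
# examples in the text is an assumption of this sketch.
PATTERNS = [
    re.compile(r"\d{10}"),                      # 1234567890
    re.compile(r"\d{3}([-. ])\d{3}\1\d{4}"),    # 123-456-7890, 123.456.7890
    re.compile(r"\(\d{3}\) ?\d{3}[-. ]\d{4}"),  # (123)456-7890, (123) 456-7890
    re.compile(r"\d{3}[-. ]\d{4}"),             # 456-7890
]

def validate(text):
    """Return the trimmed number if it matches a known layout, else None."""
    s = text.strip()
    if any(p.fullmatch(s) for p in PATTERNS):
        return s
    return None

# The examples from the text double as a small test suite.
VALID = ["1234567890", "123-456-7890", "123.456.7890",
         "(123)456-7890", "(123) 456-7890", "456-7890"]
INVALID = ["123-45-6789", "123:4567890", "123/456-7890"]

for s in VALID:
    assert validate(s) == s
for s in INVALID:
    assert validate(s) is None
```

Returning None for invalid input is only one way to give "an indication that the input is invalid"; raising an exception would do just as well.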

@geirskjootskift: The task requires you to return the number if it is valid, not just a true or false.

@Ajay: Thank you. It’s Scheme, not Lisp. And because I want to, and because Scheme gives me options not (easily) available in C or Java or Python; Scheme gives me things like garbage collection and big integers for free, and in what other language could I add list comprehensions and pattern matching to the standard library, and write generators and streams and other little goodies?

I would normally use a list of regexes, like some of the previous solutions. This time, however, I used one big regular expression, because having two problems is better than having just one problem (see http://regex.info/blog/2006-09-15/247).

normalize() returns a tuple with the areacode, exchange, and number, or raises a ValueError if the input is not a valid phone number. The areacode is None if one was not provided.
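The code for that comment is not reproduced here, so the following is only a sketch of what a normalize() with this contract might look like in Python. The pattern name PHONE_RE and the group names are assumptions, and the regular expression is more lenient than the exercise strictly requires: it accepts mixed separators and a bare seven-digit number.

```python
import re

# One big regular expression, written in verbose mode so it stays readable.
# White space inside a character class such as [-. ] is still significant.
PHONE_RE = re.compile(r"""
    \s*
    (?:                             # optional area code, in one of two forms
        \( (?P<paren>\d{3}) \) \s*  #   parenthesized, optional white space after
      | (?P<plain>\d{3}) [-. ]?     #   bare digits, optional separator after
    )?
    (?P<exchange>\d{3})
    [-. ]?
    (?P<number>\d{4})
    \s*
""", re.VERBOSE)

def normalize(text):
    """Return (areacode, exchange, number); areacode is None if absent."""
    m = PHONE_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a valid phone number: {text!r}")
    # At most one of the two area-code groups took part in the match.
    areacode = m.group("paren") or m.group("plain")
    return (areacode, m.group("exchange"), m.group("number"))
```

With this sketch, normalize("(123) 456-7890") gives ("123", "456", "7890") and normalize("456-7890") gives (None, "456", "7890"), while the three invalid examples from the exercise raise ValueError.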

My regex-based version above was clearly a tongue-in-cheek solution. Here is a more useful solution.

It uses a regular expression to group the input string into digits and non-digits. The non-digits are normalized to form a punctuation string, which is compared with a set of valid punctuation strings. Other valid punctuation, e.g., 123/456-7890, can easily be added by adding the punctuation string, e.g., “ /-”, to validpunctuation.
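The original code is not shown here, so this is a rough Python sketch of the idea rather than the commenter's implementation. The table VALID_PUNCTUATION stands in for validpunctuation, and the way the punctuation string is encoded is an assumption: each non-digit run is trimmed of white space, with a run of pure white space kept as a single space. The accepted strings are keyed by the lengths of the digit groups.

```python
import re

# Accepted punctuation strings for each tuple of digit-group lengths.
# Both the encoding and the entries are assumptions of this sketch.
VALID_PUNCTUATION = {
    (10,): {""},                                         # 1234567890
    (7,): {""},                                          # 4567890
    (3, 4): {"-", ".", " "},                             # 456-7890
    (3, 3, 4): {"--", "..", "  ", "()-", "().", "() "},  # (123) 456-7890
}

def check_phone(text):
    """Return the digits of a valid number as one string, else raise ValueError."""
    digits, punct = [], []
    # Each match is a pair: (digit run, "") or ("", non-digit run).
    for d, p in re.findall(r"(\d+)|(\D+)", text.strip(), re.ASCII):
        if d:
            digits.append(d)
        else:
            # Trim white space, but keep one space for a white-space-only run.
            punct.append(p.strip() or " ")
    lengths = tuple(len(d) for d in digits)
    if "".join(punct) not in VALID_PUNCTUATION.get(lengths, ()):
        raise ValueError(f"not a valid phone number: {text!r}")
    return "".join(digits)
```

In this sketch, check_phone("(123) 456-7890") returns "1234567890", and 123/456-7890 is rejected until "/-" is added to the (3, 3, 4) entry, which mirrors the extension mechanism described above.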