I was asked today to take a table with a social security column and put all 0s if the column had any letters in it. This is the typical social security column that actually has more rows with ‘XXX-XX-XXXX’ than it does valid SSNs. There are even cases that look like ’1234-AB-56TS’ etc.

One option, if I was working in SQL 2012 (which I am very much looking forward to), is to use the new functions TRY_CAST and TRY_CONVERT like so.

SELECT CASE WHEN TRY_CONVERT(int, REPLACE(SSN_Col,'-','')) IS NULL
THEN '000000000'
ELSE SSN_Col END AS Tested_SSN

However I’m not working in SQL 2012 so my best bet is a pattern match. Going on the basis that there are some people who haven’t worked with pattern matching beyond LIKE ‘%TestString%’ I’m going to explain each pattern as I go along. For those that are more familiar with pattern matching please feel free to skim these bits.

I’ve seen patterns for SSNs like this:

SSN_Col LIKE
'[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'

By putting []s around one or more values you can test a single character for the pattern within the []s. So [a19c] tests to see if a single character is an a, 1, 9 or c. To test against a range you use a dash. So [0-9] tests a single character for a number from 0 to 9.

Now this has some obvious advantages over the TRY_CAST/CONVERT method. It’s a little bit more complicated but you get to put in an exact pattern. 123-45-6789. Numbers where you want them, dashes where you want them. However in my particular case my data isn’t all that good so I’m not sure if I have 1, 2 or no dashes.

So I can’t use as exact a pattern, but I could do this.

REPLACE(SSN_Col,'-','') LIKE
'[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'

Now it doesn’t matter if I have dashes or not, I’ve stripped them out. I’m now testing that I have exactly 9 numbers. Absolutely perfect for testing against an SSN with uncertain delimiters. Unfortunately because of the REPLACE around the column any indexes on the SSN_Col table can’t be used.

So what’s another option? What I ended up with is going the negative approach. I want to find any character that ISN’T in the list. In pattern matching a ^ is used for just that. First we construct a pattern to find a given character. This is fairly basic and I think most DBAs have done something along these lines.

SSN_Col LIKE '%A%'

Now that is done we add the test for the existence of a number.

SSN_Col LIKE '%[0-9]%'

And last but least we test for the existence of any character that is NOT a number.

SSN_Col LIKE '%[^0-9]%'

And if you feel like allowing some symbols, for example the dash from above and maybe an @ sign then do this.

SSN_Col LIKE '%[^0-9-@]%'

In case I just confused anyone, the second dash is fine because it isn’t being used as a range. You can also add in an escape character if you want.

SSN_Col LIKE '%[^0-9\-@]%' ESCAPE '\'

The ESCAPE clause of LIKE lets us set the escape character for this string.

My name is Kenneth Fisher and I am Senior DBA for a large (multi-national) insurance company. I have been working with databases for over 20 years starting with Clarion and Foxpro. I’ve been working with SQL Server for 12 years but have only really started “studying” the subject for the last 3. I don’t have any real "specialities" but I enjoy trouble shooting and teaching. Thus far I’ve earned by MCITP Database Administrator 2008, MCTS Database Administrator 2005, and MCTS Database Developer 2008. I’m currently studying for my MCITP Database Developer 2008 and should start in on the 2012 exams next year. My blog is at www.sqlstudies.com.