Answered by:

Fastest way to replace non numeric characters from string

Question

Hi Guys, I am trying to find out what’s the fastest way to remove all non-numeric characters (incl space) from a varchar without affecting the performance. This will be used to check multiple columns in the table so I am looking for some efficient way to
do this

Answers

I have written a function for you in TSQL doing exactly what you want. It takes 13 secs if you dont use the function for 100,000 records and 8 secs if you are using functions for same records, so it seems pretty fast.

As an alternative solution I offer this one time laborious (or rather, it requires a lot of cut and paste) solution, which is likely to outperform any solution that uses a scalar UDF.

The solution below is made for varchar columns up to 40 characters, but obviously this is trivially easy to change. The test was also run for 40 characters.

/* -- helps to create the query
select '+CASE WHEN SUBSTRING(Data,'+CAST(rn AS varchar(12))+',1) LIKE ''[0-9]'' THEN SUBSTRING(Data,'+CAST(rn AS varchar(12))+',1) ELSE '''' END'
from (select top 40 row_number() over (order by id) as rn from sysobjects) x
*/
set statistics time on
go
SELECT CASE WHEN SUBSTRING(Data,1,1) LIKE '[0-9]' THEN SUBSTRING(Data,1,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,2,1) LIKE '[0-9]' THEN SUBSTRING(Data,2,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,3,1) LIKE '[0-9]' THEN SUBSTRING(Data,3,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,4,1) LIKE '[0-9]' THEN SUBSTRING(Data,4,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,5,1) LIKE '[0-9]' THEN SUBSTRING(Data,5,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,6,1) LIKE '[0-9]' THEN SUBSTRING(Data,6,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,7,1) LIKE '[0-9]' THEN SUBSTRING(Data,7,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,8,1) LIKE '[0-9]' THEN SUBSTRING(Data,8,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,9,1) LIKE '[0-9]' THEN SUBSTRING(Data,9,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,10,1) LIKE '[0-9]' THEN SUBSTRING(Data,10,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,11,1) LIKE '[0-9]' THEN SUBSTRING(Data,11,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,12,1) LIKE '[0-9]' THEN SUBSTRING(Data,12,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,13,1) LIKE '[0-9]' THEN SUBSTRING(Data,13,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,14,1) LIKE '[0-9]' THEN SUBSTRING(Data,14,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,15,1) LIKE '[0-9]' THEN SUBSTRING(Data,15,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,16,1) LIKE '[0-9]' THEN SUBSTRING(Data,16,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,17,1) LIKE '[0-9]' THEN SUBSTRING(Data,17,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,18,1) LIKE '[0-9]' THEN SUBSTRING(Data,18,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,19,1) LIKE '[0-9]' THEN SUBSTRING(Data,19,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,20,1) LIKE '[0-9]' THEN SUBSTRING(Data,20,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,21,1) LIKE '[0-9]' THEN SUBSTRING(Data,21,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,22,1) LIKE '[0-9]' THEN SUBSTRING(Data,22,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,23,1) LIKE '[0-9]' THEN SUBSTRING(Data,23,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,24,1) LIKE '[0-9]' THEN SUBSTRING(Data,24,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,25,1) LIKE '[0-9]' THEN SUBSTRING(Data,25,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,26,1) LIKE '[0-9]' THEN SUBSTRING(Data,26,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,27,1) LIKE '[0-9]' THEN SUBSTRING(Data,27,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,28,1) LIKE '[0-9]' THEN SUBSTRING(Data,28,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,29,1) LIKE '[0-9]' THEN SUBSTRING(Data,29,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,30,1) LIKE '[0-9]' THEN SUBSTRING(Data,30,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,31,1) LIKE '[0-9]' THEN SUBSTRING(Data,31,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,32,1) LIKE '[0-9]' THEN SUBSTRING(Data,32,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,33,1) LIKE '[0-9]' THEN SUBSTRING(Data,33,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,34,1) LIKE '[0-9]' THEN SUBSTRING(Data,34,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,35,1) LIKE '[0-9]' THEN SUBSTRING(Data,35,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,36,1) LIKE '[0-9]' THEN SUBSTRING(Data,36,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,37,1) LIKE '[0-9]' THEN SUBSTRING(Data,37,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,38,1) LIKE '[0-9]' THEN SUBSTRING(Data,38,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,39,1) LIKE '[0-9]' THEN SUBSTRING(Data,39,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,40,1) LIKE '[0-9]' THEN SUBSTRING(Data,40,1) ELSE '' END
FROM TestFunctionPerf
go
set statistics time off

On my machine this query, if I look at the CPU time used, this approach is 5 times faster than Prajesh suggestion.

All replies

I have written a function for you in TSQL doing exactly what you want. It takes 13 secs if you dont use the function for 100,000 records and 8 secs if you are using functions for same records, so it seems pretty fast.

As an alternative solution I offer this one time laborious (or rather, it requires a lot of cut and paste) solution, which is likely to outperform any solution that uses a scalar UDF.

The solution below is made for varchar columns up to 40 characters, but obviously this is trivially easy to change. The test was also run for 40 characters.

/* -- helps to create the query
select '+CASE WHEN SUBSTRING(Data,'+CAST(rn AS varchar(12))+',1) LIKE ''[0-9]'' THEN SUBSTRING(Data,'+CAST(rn AS varchar(12))+',1) ELSE '''' END'
from (select top 40 row_number() over (order by id) as rn from sysobjects) x
*/
set statistics time on
go
SELECT CASE WHEN SUBSTRING(Data,1,1) LIKE '[0-9]' THEN SUBSTRING(Data,1,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,2,1) LIKE '[0-9]' THEN SUBSTRING(Data,2,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,3,1) LIKE '[0-9]' THEN SUBSTRING(Data,3,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,4,1) LIKE '[0-9]' THEN SUBSTRING(Data,4,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,5,1) LIKE '[0-9]' THEN SUBSTRING(Data,5,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,6,1) LIKE '[0-9]' THEN SUBSTRING(Data,6,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,7,1) LIKE '[0-9]' THEN SUBSTRING(Data,7,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,8,1) LIKE '[0-9]' THEN SUBSTRING(Data,8,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,9,1) LIKE '[0-9]' THEN SUBSTRING(Data,9,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,10,1) LIKE '[0-9]' THEN SUBSTRING(Data,10,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,11,1) LIKE '[0-9]' THEN SUBSTRING(Data,11,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,12,1) LIKE '[0-9]' THEN SUBSTRING(Data,12,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,13,1) LIKE '[0-9]' THEN SUBSTRING(Data,13,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,14,1) LIKE '[0-9]' THEN SUBSTRING(Data,14,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,15,1) LIKE '[0-9]' THEN SUBSTRING(Data,15,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,16,1) LIKE '[0-9]' THEN SUBSTRING(Data,16,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,17,1) LIKE '[0-9]' THEN SUBSTRING(Data,17,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,18,1) LIKE '[0-9]' THEN SUBSTRING(Data,18,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,19,1) LIKE '[0-9]' THEN SUBSTRING(Data,19,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,20,1) LIKE '[0-9]' THEN SUBSTRING(Data,20,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,21,1) LIKE '[0-9]' THEN SUBSTRING(Data,21,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,22,1) LIKE '[0-9]' THEN SUBSTRING(Data,22,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,23,1) LIKE '[0-9]' THEN SUBSTRING(Data,23,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,24,1) LIKE '[0-9]' THEN SUBSTRING(Data,24,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,25,1) LIKE '[0-9]' THEN SUBSTRING(Data,25,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,26,1) LIKE '[0-9]' THEN SUBSTRING(Data,26,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,27,1) LIKE '[0-9]' THEN SUBSTRING(Data,27,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,28,1) LIKE '[0-9]' THEN SUBSTRING(Data,28,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,29,1) LIKE '[0-9]' THEN SUBSTRING(Data,29,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,30,1) LIKE '[0-9]' THEN SUBSTRING(Data,30,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,31,1) LIKE '[0-9]' THEN SUBSTRING(Data,31,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,32,1) LIKE '[0-9]' THEN SUBSTRING(Data,32,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,33,1) LIKE '[0-9]' THEN SUBSTRING(Data,33,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,34,1) LIKE '[0-9]' THEN SUBSTRING(Data,34,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,35,1) LIKE '[0-9]' THEN SUBSTRING(Data,35,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,36,1) LIKE '[0-9]' THEN SUBSTRING(Data,36,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,37,1) LIKE '[0-9]' THEN SUBSTRING(Data,37,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,38,1) LIKE '[0-9]' THEN SUBSTRING(Data,38,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,39,1) LIKE '[0-9]' THEN SUBSTRING(Data,39,1) ELSE '' END
+CASE WHEN SUBSTRING(Data,40,1) LIKE '[0-9]' THEN SUBSTRING(Data,40,1) ELSE '' END
FROM TestFunctionPerf
go
set statistics time off

On my machine this query, if I look at the CPU time used, this approach is 5 times faster than Prajesh suggestion.