Fast Text Processing in SQL Server : Page 4

Processing text or long strings usually reduces SQL to a prosaic procedural language. Learn a few techniques for facilitating speedy text processing in SQL.

by Alex Kozak

Apr 26, 2006

Page 4 of 4

SQL Server 2000 and SQL Server 2005 Techniques

The recursive queries using CTE are available only in SS2005, so to produce the same results in SS2000 as the techniques in the previous section did, you need to apply dynamic SQL. The following script (see Listing 3) demonstrates that technique:

Listing 3. How to Convert by Words and Load Text into the Table, Using Dynamic SQL

SET QUOTED_IDENTIFIER OFF
GO
SET NOCOUNT ON
DECLARE @str varchar(8000)
SELECT @str = "James Joyce (1882-1941) was an Irish writer and poet, and is
widely considered one of the most significant writers of the 20th century."

The logic in this example is very simple. You need to add the header "INSERT INTO #dynWords(word) SELECT A=" to the original text and then replace the spaces between the words with the phrase "UNION ALL SELECT". You will get the following string:

INSERT INTO #dynWords(word) SELECT A="James"UNION ALL SELECT"Joyce"… UNION ALL SELECT"century."

When you execute this string dynamically, the result will be loaded into the #dynWords table.

Queries 1 and 2 are self-explanatory. Query 3 is much more interesting, though it uses a pretty well known technique with self-joins. The aggregate function SUM(LEN(t1.word)+1) accumulates the number of letters (including the spaces) in the words preceding the current one. Using COUNT() instead of SUM(), you could get the number of words preceding the current one--in other words, the order number of each word in the text, which already is represented by the identity column wordID.

-- 1) Find words "and" and words, starting with a letter "w".
SELECT wordID, word
FROM dynWords
WHERE word LIKE 'w%'OR word LIKE 'AND';
Result:
wordID word
----------- --------------
4 was
7 writer
8 and
10 and
12 widely
19 writers
-- 2) The position of each phrase, starting with a letter "w".
SET QUOTED_IDENTIFIER OFF
GO
SET NOCOUNT ON
DECLARE @str varchar(8000)
SELECT @str = "James Joyce (1882-1941) was an Irish writer and poet, and is
widely considered one of the most significant writers of the 20th century."
SELECT @str = REPLACE(@str, CHAR(13),' ')
SELECT @str = REPLACE(@str, CHAR(10), '')
IF EXISTS(SELECT name FROM sysobjects WHERE name = 'dynWords' AND type = 'U')
DROP TABLE dynWords
CREATE TABLE dynWords(wordID int identity(1,1) not null,
word varchar(100) not null)
SELECT @str = 'INSERT INTO dynWords(word) SELECT A="' +
REPLACE(@str, ' w', '"UNION ALL SELECT"w') + '"';
EXECUTE(@str)
PRINT 'The position of each phrase starting with "w":'
SELECT tbl2.*, tbl1.c1 as startPos FROM
(SELECT (SUM(LEN(t1.word)+1)+1) as c1, c2 = t2.wordID
FROM dynWords AS t1 INNER JOIN dynWords AS t2
ON t1.wordID < t2.wordID
GROUP BY t2.wordID) tbl1
INNER JOIN (SELECT * FROM dynWords) tbl2
ON tbl1.c2 = tbl2.wordID
Result:
The position of each phrase starting with "w":
wordID word startPos
----------- ---------------------------------------------- ---------
2 was an Irish 24
3 writer and poet, and is 37
4 widely considered one of the most significant 61
5 writers of the 20th century. 107

SQL Server Speed Gap: 2005 vs. 2000

Text conversion and processing techniques are very powerful and flexible; you can use them in many projects. The techniques with dynamic SQL are more restrictive (a 8,000-character limit for the varchar data type or 4,000 for nvarchar) than the recursive techniques, which are faster, but you can use them only in SS2005.