Keyword Searching in SQL Server

Introduction

As the concept of search continues to grow in scope, I tend to find myself eschewing multiple levels of categorization in favor of larger groups. I'll give you two examples of what I mean.

Think of your inbox. I used to have folders for different systems, folders for different clients, folders for different colleagues, and so on. Now I tend to leave everything in my inbox and use the various search features of my mail client or other search applications to retrieve messages as I need them. I'm not sure if this is due to the fact that the searching capabilities of these applications are better than they used to be or if I'm just more prone to searching for things since that's what all the hip kids are doing these days.

Think of a data driven web site. A retail site possibly. As you drill down into the categories of goods, you'll usually have a breadcrumb trail somewhere on the page. Do we really drill that far down nowadays or do we immediately go to the search box and type in what we're looking for? If it's the latter case, what's the point of having a bunch of piddly little categories like Electronics > Digital Cameras > Memory Cards > 512MB? Why not just dump everything in one big category and let the search functionality take care of the process of finding something?

I like to be able to find something on the fly when necessary and I don't want to have to remember how I originally categorized it. What's the login info for that FTP site? Categorization is tedious at best, and often times the same person will choose different categorizations for a given item at different times. The following is a simple way to implement basic keyword search functionality using SQL Server and nothing but.

Here are our objects we'll end up with:

LogEntries (Table)

Keywords (Table)

LogEntry_Keyword (Table)

sp_logentry (Sproc)

sp_search (Sproc)

Setting up our tables

I'm no cognitive scientist, so I can't tell you exactly how our brains keep track of bits of information. But the things that we need to remember are often very fragmented and defy categorization. I'd venture a guess that our brains are like logs and we make entries into that log. So let's have a table called LogEntries. The most basic column is the actually entry itself - LogEntry varchar(8000):

"Remember that you increased the budget for the Razzmatazz account by $10K."

And when we make that mental note, it would be nice to remember when we made it - DateEntered datetime:

'YYYY-MM-DD hh:mm:ss'

But when we remember it, do we actually think of the exact phrase we originally thought of? Probably not. We probably remember something more like "razzmatazz account budget increased". So now we have some keywords (or tags) associated with our entry. Tags varchar(200):

Now for the trigger

In most cases, we'd probably prefer to process the tags of our entries by using another application. The String.Split function comes to mind. Then we'd hash the keywords where the hash value is the number of occurrences and we'd later do our insert into the LogEntry_Keyword table.
But we're keeping it simple and want to make everything self contained. Here's how the trigger does that.

CREATE TRIGGER trgInsertLogEntry
ON LogEntries
FOR INSERT
AS
------------------
-- Declarations --
------------------
DECLARE @tags AS varchar(200) -- This will hold the tags string (e.g. 'increase budget such 'n' such account')
DECLARE @keyword AS varchar(50) -- An individual keyword from the tags string (e.g. 'increase')
DECLARE @keywordId AS int -- The keyword id from the keywords table
DECLARE @found AS int -- Whether or not the keyword wAS already in the keywords table
DECLARE @entryId AS int -- The entry id of the logentry being inserted (@@identity)
-------------------------------------------------------------------------
-- Temp table for current keyword data for the newly inserted LogEntry --
-------------------------------------------------------------------------
CREATE TABLE #kw
(
kwid int PRIMARY KEY, -- The id from the Keywords table
hits int -- The number of occurrences of this keyword
)
-------------------------------------------
-- Data from the newly inserted LogEntry --
-------------------------------------------
SET @entryId = @@identity -- Get the newly inserted logentry id
SET @tags = (SELECT tags FROM INSERTED) -- Get the newly inserted tag
SET @tags = LTRIM(RTRIM(@tags)) + ' xyz' -- Add a fake keyword to the end that won't get inserted
-------------------------------------------------------------------------
-- While there are still keywords in the newly inserted LogEntry's tag --
-------------------------------------------------------------------------
WHILE (CHARINDEX(' ', @tags) > 0)
BEGIN
-- Get the next keyword from the tags string
SET @keyword = SUBSTRING(@tags, 0, CHARINDEX(' ', @tags))
-- Get the KeywordId from the Keywords table
SELECT @keywordid = (SELECT KeywordId FROM Keywords WHERE Keyword = @keyword)
-- Insert the keyword if necessary
IF (@keywordId IS NULL)
BEGIN
INSERT INTO Keywords VALUES (@keyword)
SET @keywordId = @@identity
END
-- See if the keyword id is in the temp table yet
SELECT @found = (SELECT COUNT(*) FROM #kw WHERE kwid = @keywordId)
-- If not found insert it
IF (@found = 0)
INSERT INTO #kw VALUES (@keywordId, 1)
-- If found update the hit count
IF (@found != 0)
UPDATE #kw SET hits = hits + 1 WHERE kwid = @keywordId
-- Update the tags by lopping off the keyword just processed
SET @tags = substring(@tags, charindex(' ', @tags) + 1, len(@tags) - charindex(' ', @tags))
END
---------------
-- End while --
---------------
-------------------------------------------------------------------------------
-- Insert the keywords and their occurrences into the LogEntry_Keyword table --
-------------------------------------------------------------------------------
INSERT INTO logentry_keyword
SELECT @entryid, kwid, hits
FROM #kw

So that takes care of everything we need to implement our project. Now let's add a few things to make inserting entries easier and finding things easier.

Getting the data in there

The whole purpose behind this exercise is to be able to associate a piece of data with string identifiers. When creating the data we want to be able to say (to ourselves):

Okay, I need to remember that at today's meeting we discussed ways of improving communication within the company. The first way was to blah blah blah...

Conclusions

Yes, this is probably more of an exercise than something you'll put into practice. But the basic idea is straightforward and useful when expounded upon. If you decide to do more with it here are some ideas as well as some things to look out for.

Hook it up to a basic UI for a popup notepad.

Take care to validate your tags string to make sure it's a space delimited string.

I might also suggest adding a having count(le.EntryId) >= (select count(*) from #kw) clause into the search sproc in the event you want to only include entries featuring all of the searched keywords.

With a little more validation, you can get rid of the Tags column and process the LogEntry column by itself.

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands
of articles and SQL scripts, a library of free eBooks, a weekly database news roundup,
a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals
that makes it such a success.