Structs As Query Indexes, Speed, And Rick Osborne

For those of you who follow my blog, you will know that Rick Osborne is the guy who comes in after I explain things and makes killer suggestions about how they can be done better. I recently gave a case study of how ColdFusion code can be optimized. It involved using the IndexOf() method of the ColdFusion query column object. Rick came in and suggested that using a Struct to create your own query-index would perform faster. As I am a man who likes to learn by doing, I thought I would put this to the test.

To test this, I query from a web statistics program. I am getting information from two tables: web_stats_hit and web_stats_session. Each "hit" in the hits table will have a session id in the session table. The session table also have a UUID column "session_id".

As I am trying to test the merging of two data sources, I am going to hit each table individually and then try to update one with matching values from the other. First I am grabbing the two different data sets:

<!--- Query for web hits. --->

<cfquery name="qHit" datasource="...">

SELECT

h.id,

h.date_created,

h.web_stats_session_id,

(

''

) AS session_id

FROM

web_stats_hit h

</cfquery>

<!--- Query for web sessions. --->

<cfquery name="qSession" datasource="...">

SELECT

s.id,

s.session_id

FROM

web_stats_session s

</cfquery>

As I tried to explain earlier, for every qHit.web_stats_session_id, there is a matching session, such that for some combo, qHit.web_stats_session_id == qSession.id. And just to get an idea of the amount of data we are talking about:

qHit: 52,290 records

qSession: 34,753 records

That's a LOT of data to go through. Let's get our "test" on:

<cftimer label="IndexOf() Methodology" type="outline">

<!--- Loop over the hit query. --->

<cfloop query="qHit">

<!---

We want to find a matching session_id based

on the session. Get index of matching row.

--->

<cfset intIndex = qSession[ "id" ].IndexOf(

JavaCast( "int", qHit.web_stats_session_id )

) />

<!--- Add one to index (to be ColdFusion friendly). --->

<cfset intIndex = (intIndex + 1) />

<!--- Check to see if we have an index. --->

<cfif intIndex>

<!--- We found the match, update the row. --->

<cfset qHit[ "session_id" ][ qHit.CurrentRow ] =

qSession[ "session_id" ][ intIndex ]

/>

</cfif>

</cfloop>

</cftimer>

<cftimer label="Struct Index Methodology" type="outline">

<!--- Create a session look up table. --->

<cfset objSessionLookUp = StructNew() />

<!---

Loop over session and set index rows. We will be using

the id column of the session as the key and the

session_id as the value. This creates our very own,

in-memory index of the qSession query based on ID.

--->

<cfloop query="qSession">

<!--- Index this value. --->

<cfset objSessionLookUp[ qSession.id ] = qSession.CurrentRow />

</cfloop>

<!--- Loop over the hit query. --->

<cfloop query="qHit">

<!---

Check to see if the session key exists. If it

does, then we found a match.

--->

<cfif StructKeyExists(

objSessionLookUp,

qHit.web_stats_session_id

)>

<!--- Update the session based on the struct-index. --->

<cfset qHit[ "session_id" ][ qHit.CurrentRow ] =

objSessionLookUp[ qHit.web_stats_session_id ]

/>

</cfif>

</cfloop>

</cftimer>

It turns out Rick was absolute correct. The struct index performs MUCH faster. Here are the stats:

IndexOf() Methodology : 402,037 ms

Struct Index Methodology : 33,403 ms

Some quick math will show you that the struct index method performs in 8% of the time that the IndexOf() methodology does. EIGHT PERCENT! Nuts. I guess the only downside is that you can potentially create a HUGE in-memory structure; but it's only temporary.