Ask Ben: Finding XML Nodes That Have Children With The Given Case-Insensitive Phrase

Okay, so how about this one. (BTW I love that I found this site and can ask all my stupid questions). I am bringing in an RSS feed in XML and parsing it. Now I want to pull only the articles that pertain to some keyword. Like oh say COLDFUSION. .... What I want to do is pull only the articles with the search term in the title. I can do this of course by looping over the xml but is it possible with XPATH? I'm betting it is but I have just started into XPATH, XSLT, XSQL for Oracle. Does "IN" or "CONTAINS" work?

Yes, XPath does have a contains() function, and it is, in fact, how we are going to find your RSS feed items (at least initially). First, though, let's build a test XML feed structure:

<!--- Define the XML feed. --->
<cfxml variable="xmlFeed">
    <items>
        <item>
            <title>I Love ColdFusion</title>
            <description>ColdFusion is amazing!</description>
            <link>http://www.bennadel.com</link>
        </item>
        <item>
            <title>I Want To Swim In A Pudding Bath</title>
            <description>Author talks about why it would be awesome to swim around in a bathtub full of pudding.</description>
            <link>http://www.bennadel.com</link>
        </item>
        <item>
            <title>I Think ColdFusion Knocked Up My Daughter</title>
            <description>Author describes a conspiracy theory in which he thinks his ColdFusion application server impregnated his daughter in an attempt to spawn a race of super humans with amazing back-end processing!</description>
            <link>http://www.bennadel.com</link>
        </item>
        <item>
            <title>Christina Cox Is A Hottie</title>
            <description>Author talks about actress Christina Cox and what makes her such a hottie.</description>
            <link>http://www.bennadel.com</link>
        </item>
        <item>
            <title>COLDFusion Is So Hot!</title>
            <description>Author describes what makes ColdFusion such a hot technology.</description>
            <link>http://www.bennadel.com</link>
        </item>
    </items>
</cfxml>

As you can see here, some of the Title tags contain "ColdFusion", some of them do not. Now, we don't want to find the Title tag, right? What we want to do is find the Item node that has the child node, Title, whose text value contains the phrase ColdFusion. To do this, we can leverage the power of XPath predicates (statements that must evaluate to true for a node to be returned in an XmlSearch() result set):

//item[ contains( title/text() , 'ColdFusion' ) ]

Here, the "//item" is telling us to get all the item nodes anywhere within the document. Then our conditional search predicate:

[ contains( title/text() , 'ColdFusion' ) ]

... requires that the node being examined (item) have a title child tag whose text() value contains the phrase "ColdFusion". Fairly straightforward, right? Let's put this into action:

<!---
    Get all ITEM nodes that have a Title child whose text
    value (text()) contains the text "ColdFusion".
--->
<cfset arrItemNodes = XmlSearch(
    xmlFeed,
    "//item[ contains( title/text() , 'ColdFusion' ) ]"
    ) />

<!--- Output the node titles. --->
<cfloop
    index="xmlItemNode"
    array="#arrItemNodes#">

    #xmlItemNode.Title.XmlText#<br />

</cfloop>

When we run this code, we get the following output:

I Love ColdFusion
I Think ColdFusion Knocked Up My Daughter

It sort of worked - it did find two correct items, but it missed this one:

COLDFusion Is So Hot!

The problem here is that XML and XPath, unlike ColdFusion itself, are very much case-sensitive. Whereas in ColdFusion, "ColdFusion" is equal to "COLDFusion", XPath and XmlSearch() see these as two distinct values.

So, what can we do about this? Well, if you look at the library of XPath functions, you will see that it does have functions (added in XPath 2.0) for converting values to upper or lower case:

lower-case()

upper-case()

This would be great, but the problem you will quickly find if you try to use them is that these methods have not been implemented as of ColdFusion 8's XPath / XmlSearch() engine. So, what can we do if we want to start performing case-insensitive searches? I don't think there's any one correct answer for this, so I'll just share the first thing that popped into my mind.

What we can do is create a lowercase version of the title text and store it back into the XML document in a way that 1) doesn't ruin the content for further use and 2) can be searched on using XPath and XmlSearch(). To do this, what I'm going to do is loop over the title tags and store the lowercase title as an attribute back into the title tag itself. Then, once that is done, I am going to perform the XPath search again using the title tag's "lcase" attribute rather than the XML Text value:

<!--- Gather all of the title nodes. --->
<cfset arrTitleNodes = XmlSearch(
    xmlFeed,
    "//item/title"
    ) />

<!---
    Loop over each title and store a lowercase attribute of
    its value that can be searched on in a case-insensitive
    manner.
--->
<cfloop
    index="xmlTitleNode"
    array="#arrTitleNodes#">

    <!--- Store lowercase text into attribute. --->
    <cfset xmlTitleNode.XmlAttributes[ "lcase" ] = LCase(
        XmlFormat( xmlTitleNode.XmlText )
        ) />

</cfloop>

<!---
    Get all ITEM nodes that have a Title child whose LCASE
    attribute contains the lowercase "coldfusion" value.
--->
<cfset arrItemNodes = XmlSearch(
    xmlFeed,
    "//item[ contains( title/@lcase, 'coldfusion' ) ]"
    ) />

<!--- Output the node titles. --->
<cfloop
    index="xmlItemNode"
    array="#arrItemNodes#">

    #xmlItemNode.Title.XmlText#<br />

</cfloop>

Notice that this time, we are searching for "coldfusion," not "ColdFusion." There's a little bit more overhead here, but now, when we run this code, we get the following output:

I Love ColdFusion
I Think ColdFusion Knocked Up My Daughter
COLDFusion Is So Hot!

With the aid of this lowercase attribute, we successfully find every case variation of "ColdFusion".

Of course, if we are going to loop over the Title tags, we might as well just perform the text search using ColdFusion and grab the appropriate nodes in the first pass. In the following code, as we loop over the Title tags, we are going to perform a case-insensitive ColdFusion text search. If the title has the right text, we are going to grab its parent node, the target Item node, and add it to our array of matching nodes:

<!--- Gather all of the title nodes. --->
<cfset arrTitleNodes = XmlSearch(
    xmlFeed,
    "//item/title"
    ) />

<!--- Create an array of item nodes. --->
<cfset arrItemNodes = [] />

<!---
    Loop over each title and check to see if the text contains
    the phrase ColdFusion - since we are checking in ColdFusion,
    we don't have to worry about case.
--->
<cfloop
    index="xmlTitleNode"
    array="#arrTitleNodes#">

    <!--- Check for phrase. --->
    <cfif FindNoCase( "ColdFusion", xmlTitleNode.XmlText )>

        <!--- Add parent node (Item) to array. --->
        <cfset ArrayAppend(
            arrItemNodes,
            xmlTitleNode.XmlParent
            ) />

    </cfif>

</cfloop>

<!--- Output the node titles. --->
<cfloop
    index="xmlItemNode"
    array="#arrItemNodes#">

    #xmlItemNode.Title.XmlText#<br />

</cfloop>

When we run the code this time, we get the following output:

I Love ColdFusion
I Think ColdFusion Knocked Up My Daughter
COLDFusion Is So Hot!

Again, we gather all of the appropriate matches for "ColdFusion" without having to do any additional XPath / XmlSearch() calls.

This would all be made so much easier if ColdFusion would simply support case-conversion methods in XPath, but for now, I hope that something here may have helped.

Another note here is that the nodes in the resultant array are not references to the original nodes; they're references to a separate XML doc which is created by the lcase(xmlFeed) operation. So one cannot update the nodes in the array and expect to see the updates in the original doc (like one usually would). So this one comes with some caveats, but if those are not a concern, it's an adequate approach.

Very nice tip on translate(). I have never used that before. Yes, tedious, but it works. As far as the LCase() of the entire XML document, I actually considered going down that path. But, then, my concern was getting back to the original reference in the first document.
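For anyone who wants to see the translate() tip in code, here is a minimal sketch. It assumes the xmlFeed document from the post and relies only on the XPath 1.0 translate() function, which maps each uppercase letter to its lowercase counterpart before contains() runs. The two alphabet strings only cover A-Z, so accented characters would have to be added by hand.

```cfml
<!---
    Case-insensitive search using XPath 1.0 translate(). The title's
    string value is lowercased character-by-character, then compared
    against the lowercase search phrase.
--->
<cfset arrItemNodes = XmlSearch(
    xmlFeed,
    "//item[ contains( translate( title, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz' ), 'coldfusion' ) ]"
    ) />
```

Because this searches the original document directly, the returned nodes should be references into xmlFeed itself rather than into a lowercased copy.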

Great post. I've also found that if the xml has a schema listed but is not valid the xml search fails even if the elements exist. If I deleted the schema ref (in the string xml prior to xmlParse) the search worked fine. Sure, you would think I should be using valid xml (against the schema) but the thing is I did not control the xml being returned from this web service and it wasn't. I did not see why xmlSearch should care. If the search works then return data dang you.

Well, Elliott, it would be about them implementing Saxon instead of Xalan, wouldn't it? So it's everything about them implementing something, isn't it?

>People seem to think that Macrodobe actually implement this stuff.

Yes. They seem to think Adobe implements third-party libraries to get the work done. They also seem to think that perhaps other capabilities might present themselves if CF's chosen XML solution was a different one, possibly one in keeping with the times.

I don't want to start attacking ColdFusion or Adobe here. When I say stuff about wishing they would implement it, I'm just generically saying, "That would be a cool feature to have." I don't mean much more than that.

Hi Ben, I don't think there's any way anything you said could've been construed as an attack against anything or anyone. Everything you said is spot on, valid, and I'm sure is something Adobe are giving at least some consideration to.

"How I Became An XSLT Junkie" :) I'm finding XSLT/XPATH etc etc so much easier to use than parsing and looping and handling errors in the xml than straight ColdFusion. I told my dba to have Oracle return XML results to me. But now we are looking at XSQL. Meanwhile the die hard Java, C#, VB programmers are going nutso wacko. (Or were they always that way?)

Seriously, I have scrapped my RSS integrator for websites and replaced it with a much simpler but more powerful XSLT version.

You have been an amazing resource for me as I grow my skills and this specific article is pretty close to what I'm looking for, but my question is what if you need a case insensitive search of a node?

Specifically, you are expecting people to send xml to you a certain way but you can't trust they won't do contactINFO or contactinfo instead of contactInfo. The attribute trick you showed here won't work in this case because it's the NODE itself that we can't find properly.
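One possible way to approach the node-name case described above, offered as a sketch rather than a tested answer: XPath 1.0 exposes an element's name through local-name(), so the same translate() lowercasing can be applied to the name instead of the text. The variable xmlData is hypothetical and stands in for whatever parsed document is being searched.

```cfml
<!---
    Match any element whose name is "contactinfo" regardless of case
    (contactInfo, contactINFO, ContactInfo, ...). xmlData is a
    hypothetical, already-parsed XML document.
--->
<cfset arrContactNodes = XmlSearch(
    xmlData,
    "//*[ translate( local-name(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz' ) = 'contactinfo' ]"
    ) />
```

The //* step visits every element in the document, so on large documents it may be worth anchoring the expression to a known parent path.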

Sorry, I had to muck with your comment a bit. For some reason, my editor was totally not able to parse whatever you wrote (I was trying to fix the bolding). As such, I think the bold tag got stripped out.

I hope that future versions of ColdFusion can update the xmlSearch() functionality a bit; I've run into unsupported XPath errors a good number of times. When that happens, the only approach that I have found is what you did - a combination of XPath and good old-fashioned ColdFusion looping.

Actually, for the benefit of anyone reading this who might want to make sense of the question post, the first <td> had a bold tag surrounding the numeral 6. So, the problem was that the xmlSearch wouldn't return that <td>, but would return the second one because the second one only had plain text, no embedded tags.
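The behavior described here follows from how text() works: it selects only the text nodes that are direct children of the element, so text wrapped in a nested tag is not seen. A small sketch of the difference, using a made-up table fragment (xmlTable and its contents are assumptions, not the original poster's data):

```cfml
<!--- Hypothetical fragment: the first cell wraps its number in a bold tag. --->
<cfxml variable="xmlTable">
    <tr>
        <td><b>6</b> apples</td>
        <td>6 oranges</td>
    </tr>
</cfxml>

<!--- text() only sees the cell's own text nodes, so the bolded 6 is missed. --->
<cfset arrTextOnly = XmlSearch( xmlTable, "//td[ contains( text(), '6' ) ]" ) />

<!--- The string value (.) also includes text inside descendant tags. --->
<cfset arrStringValue = XmlSearch( xmlTable, "//td[ contains( ., '6' ) ]" ) />
```

Searching on the string value (.) instead of text() is usually the simpler choice when the cell content may contain inline markup.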

Thanks Ben for the above code. I do have a question. I am very new to ColdFusion and xml. I'm working on an employee phone search. When the input is the employee's entire last name it works great, but if someone inputs an "a", the search brings back every name that has an "a" in it. How can I narrow down the results to only bring back names starting with "a"? Any suggestions?
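One option worth trying for the "starts with" case is the XPath 1.0 starts-with() function in place of contains(). The sketch below assumes a hypothetical xmlEmployees document made up of employee nodes that each have a lastname child; the real structure will differ, so the path needs adjusting. It also lowercases the name with translate() so that "Adams" and "adams" both match.

```cfml
<!---
    Find employees whose last name begins with "a", ignoring case.
    xmlEmployees, employee, and lastname are hypothetical names.
--->
<cfset arrEmployeeNodes = XmlSearch(
    xmlEmployees,
    "//employee[ starts-with( translate( lastname, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz' ), 'a' ) ]"
    ) />
```

If the search letter comes from user input, it should be lowercased and stripped of quote characters before being placed into the XPath string.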