A technical blog about web and database development discussing the various issues and problems I have experienced and overcome in my 15+ years of coding.

Monday, 19 January 2009

Cool Javascript regular expressions

Using Lambda Functions for HTML Parsing

One of the cool features of Javascript that made me scratch my head when I first came across it, but that I now love to bits, is the ability to use lambda expressions. A lambda expression is an anonymous function, and because functions are just values in Javascript you can pass one in as the argument to another function. This is best seen with the replace method, where you can use a function as the replacement value and it gets called for each string that matches the pattern.
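As a minimal sketch of the idea (the doubleNumbers function and the sample string are made up purely for illustration), this is roughly what passing a function to replace looks like:

```javascript
// Hypothetical example: double every number found in a string.
// The function passed to replace() is called once per match and
// whatever it returns is used as the replacement text.
function doubleNumbers(text){
    return text.replace(/\d+/g, function(match){
        return String(parseInt(match, 10) * 2);
    });
}

var result = doubleNumbers("3 apples and 12 pears");
// result is "6 apples and 24 pears"
```

The first parameter to the replacement function is always the whole matched string, which is all this simple case needs.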

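In my ParseHTML function the replacement function for tags returns an empty string whenever the tag is not allowed. The replacement function for attributes does a similar job, replacing the attribute with an empty string if it's not allowed or otherwise returning the whole attribute/value pair that was matched. This ends up being a very cool way of parsing HTML content using the power of regular expressions.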

The whole function is below:

    // Set up my regular expressions that will match the HTML tags and attributes that I want to allow
    var reAllowedAttributes = /^(face|size|style|dir|color|id|class|alignment|align|valign|rowspan|colspan|width|height|background|cellspacing|cellpadding|border|href|src|target|alt|title)$/i;
    var reAllowedHTMLTags = /^(h1|h2|a|img|b|em|li|ol|p|pre|strong|ul|font|span|div|u|sub|sup|table|tbody|blockquote|tr|td)$/i;

    function ParseHTML(theHTML){
        // Start off with a test to match all HTML tags and a group for the tag name which we pass in as an extra parameter.
        // The optional \/ lets the same pattern match closing tags as well.
        theHTML = theHTML.replace(/<\/?([^> ]+)[^>]*>/g, function(match, HTMLTag){
            // if the HTML tag does not match our list of allowed tags return empty string which will be used as
            // a replacement for the pattern in our initial test.
            if(!reAllowedHTMLTags.test(HTMLTag)){
                return "";
            }else{
                // The HTML tag is allowed so check attributes with the tag

                // Certain attributes are allowed so we do another replace statement looking for attributes and using another
                // function for the replacement value.
                match = match.replace(/ ([^=]+)="[^"]*"/g, function(match2, attributeName){
                    // If the attribute matches our list of allowed attributes we return the whole match string
                    // so we replace our match with itself basically allowing the attribute.
                    if(reAllowedAttributes.test(attributeName)){
                        return match2;
                    }else{
                        return ""; // not allowed so return blank string to wipe out the attribute value pair
                    }
                });
            }
            return match;
        }); //end of the first replace

        //return our cleaned HTML
        return theHTML;
    }

Another good thing about this feature is that, as well as being able to pass the match string in to the replacement function as a parameter, you can also pass in any number of sub groups as extra parameters. So using my ParseHTML function as an example again, instead of only capturing the attribute name in my check for valid attributes I could also capture the attribute value and then pass that as an extra parameter to my replacement function.
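A rough sketch of how that could look is below. The cleanAttributes name and the shortened attribute list are assumptions for the example and not part of the original function. The only real change is the second pair of brackets in the pattern, which hands the attribute value to the replacement function as a third parameter.

```javascript
// Shortened allow list, just for this example
var reAllowedAttributes = /^(id|class|href|title)$/i;

// Hypothetical helper: takes a single tag and cleans its attributes.
function cleanAttributes(tag){
    // Group 1 captures the attribute name, group 2 captures the value.
    return tag.replace(/ ([^=]+)="([^"]*)"/g, function(match, attributeName, attributeValue){
        if(reAllowedAttributes.test(attributeName)){
            // Allowed, so rebuild the pair from the two groups
            // (lower casing the name while we are at it).
            return " " + attributeName.toLowerCase() + '="' + attributeValue + '"';
        }
        return ""; // not allowed so wipe out the attribute value pair
    });
}

var cleaned = cleanAttributes('<a HREF="page.html" onclick="evil()">');
// cleaned is '<a href="page.html">'
```

The sub groups always arrive in the order they appear in the pattern, straight after the whole match.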

So you could test for the validity of the supplied values if you wanted to. Maybe if you were allowing the class attribute you would want to check to make sure only certain class names were used.
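For instance, a class name check might look something like the sketch below. The checkClassAttributes function and the list of permitted class names are invented for illustration, so swap in whatever rules suit your own content.

```javascript
// Hypothetical list of class names that users are allowed to apply
var reAllowedClasses = /^(note|warning|highlight)$/i;

function checkClassAttributes(tag){
    // The sub group captures everything between the quotes of the class attribute
    return tag.replace(/ class="([^"]*)"/gi, function(match, classValue){
        var names = classValue.split(/\s+/);
        var kept = [];
        for(var i = 0; i < names.length; i++){
            if(reAllowedClasses.test(names[i])){
                kept.push(names[i]);
            }
        }
        // Keep the attribute only if at least one valid class name is left
        return kept.length ? ' class="' + kept.join(" ") + '"' : "";
    });
}

var checked = checkClassAttributes('<p class="note evil">');
// checked is '<p class="note">'
```

A check like this could sit inside the attribute replacement function so that it only runs for attributes that have already passed the allowed list.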

This is brilliant for use in client side widgets and also as server side code for parsing user supplied HTML content. Remember, even if you are using crusty ASP classic and writing your code in VBScript, which has a really poor regular expression engine compared to Javascript, you can still make use of this cool feature, as there is nothing stopping you mixing and matching VBScript and Javascript on the server.

Who is Strictly-Software?

I'm a systems architect with 20+ years of IT experience
I currently work for myself as well as for a number of companies including one of the UK's leading recruitment software houses, having created 3 versions of their leading jobboard software.
I also work on multiple Horse Racing websites and have even developed my own automated betting BOT.
My history includes jobs at OCR specialists, a Management Consultancy and an Ofcom director run Telecoms company.
I'm an experienced developer with skills including .NET, C#, JavaScript, VB, ASP, PHP, HTML5, XHTML, CSS, SQL and MySQL.
I have developed my own JavaScript frameworks, markup languages, Windows and Web services. I have also made many Windows form applications plus I have developed numerous 5 star rated WordPress plugins.
On top of that I have created popular online tools and scripts that have been used by thousands of companies, as well as developing software that runs over 250 currently live websites.
I specialise in automation having created tools that enable feed mashups, automated scraping, auto-blogging and automatic SEO optimisation.
If you want to hire me then please email me using the contact link in the footer.