I need to be able to parse out the attributes in SCRIPT tags reliably for my SPScriptAudit function. jQuery does an excellent job on SCRIPT in the BODY but won't touch anything in the HEAD, so I need to do more heavy lifting to parse SCRIPTs there.

I need to be able to get the values of language, type, src, etc. reliably. I'd appreciate help from any regex gurus out there. I think I've reached the point of diminishing returns trying to solve this.

As you mention, this is borderline SharePoint-related! I was too slow to pick this one up, but please ask somewhere like StackOverflow for this sort of thing next time. Thanks ;-)
– Alex Angas Jan 18 '10 at 1:52

Agreed. Oblique at best, and I'll try to honor the site intent better in the future. The good thing is, however, that I got some great help from Anders below. I think having the right SharePoint perspective probably allowed him to give me a better answer? (OK, still a stretch.)
– Marc D Anderson Jan 18 '10 at 3:46

Hi Marc, I added an example of how to use the pattern in JavaScript.
I simplified it a bit, so if you need to check for double quotes, single quotes, and no surrounding quotes, you need to add that back in. Also removed the check for last

Be aware that this is boilerplate code that just shows you how to get all hits for the attribute in question, with the source to parse as a parameter. I did not add any logic for returning anything (if, for example, you want all (here 3) src attribute values, you could add them to an array and return them).
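As a rough illustration of the "add them to an array and return them" idea, here is a minimal sketch. It is not the pattern from the answer: the function name `getScriptAttributeValues` and both regular expressions are assumptions made for this example. It accepts double-quoted, single-quoted, and unquoted values, and it assumes the attribute name contains no regex metacharacters and that no attribute value contains a `>` character.

```javascript
// Hypothetical helper (not from the original answer): collect every value of
// one attribute from the opening SCRIPT tags found in an HTML string.
function getScriptAttributeValues(html, attributeName) {
  var values = [];
  // Match each opening <script ...> tag and capture its attribute text.
  // Assumption: no attribute value contains a ">" character.
  var tagPattern = /<script\b([^>]*)>/gi;
  // Match name="value", name='value', or name=value (unquoted).
  // Assumption: attributeName contains no regex metacharacters.
  var attrPattern = new RegExp(
    "(?:^|\\s)" + attributeName +
    "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))",
    "i"
  );
  var tagMatch;
  while ((tagMatch = tagPattern.exec(html)) !== null) {
    var attrMatch = attrPattern.exec(tagMatch[1]);
    if (attrMatch) {
      // Exactly one of the three capture groups holds the value.
      values.push(attrMatch[1] || attrMatch[2] || attrMatch[3] || "");
    }
  }
  return values;
}

// Usage with a made-up HEAD fragment containing three src attributes.
var sampleHead =
  '<head>' +
  '<script type="text/javascript" src="/a.js"></script>' +
  "<script language=javascript src='/b.js'></script>" +
  '<script src=/c.js></script>' +
  '<script type="text/javascript">var x = 1;</script>' +
  '</head>';
var srcValues = getScriptAttributeValues(sampleHead, "src");
```

The same call with `"language"` or `"type"` as the second argument returns the values of those attributes instead; SCRIPT tags that lack the attribute are simply skipped.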

[Edit: NB]
Just for the record, I see several other ways to do this that might be better for your purpose (like DHTML/DOM). Regular expressions are quite often seen as the silver bullet of string handling; I don't always agree that they are. Parsing HTML can be very tricky, especially because the rules (standards) are often bent (browsers are forgiving): tags can be nested; attributes may be double-quoted, single-quoted, or unquoted, even within the same element; attribute order is arbitrary; and HTML comments can contain HTML elements. For those reasons, parsing HTML with regular expressions is often a poor choice, especially in JavaScript, which lacks some of the smarter regex functionality, such as recursion, that languages like Perl and .NET have.
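For comparison, the DOM route mentioned above might look roughly like the sketch below. The function name `auditHeadScripts` and the shape of the returned objects are assumptions for illustration; only standard DOM calls (`getElementsByTagName`, `getAttribute`) are used. Whether this covers the cases where jQuery came up short is not something this thread confirms.

```javascript
// Hypothetical sketch: read SCRIPT attributes in the HEAD through the DOM
// instead of a regex. `doc` is expected to be the browser's `document`.
function auditHeadScripts(doc) {
  var results = [];
  var head = doc.getElementsByTagName("head")[0];
  if (!head) {
    return results; // No HEAD element, so nothing to audit.
  }
  var scripts = head.getElementsByTagName("script");
  for (var i = 0; i < scripts.length; i++) {
    // getAttribute returns null when the attribute is absent.
    results.push({
      language: scripts[i].getAttribute("language"),
      type: scripts[i].getAttribute("type"),
      src: scripts[i].getAttribute("src")
    });
  }
  return results;
}
```

In a browser this would be called as `auditHeadScripts(document)`. The browser has already dealt with quoting and attribute order by the time the DOM exists, which is the main attraction over pattern matching.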

Beautiful, Anders! Not only a great answer, but I get a cool new tool to add to my arsenal. Thanks!
– Marc D Anderson Jan 15 '10 at 20:59

Actually, I'm having a little trouble getting this to work in JavaScript/jQuery. As you undoubtedly know, the regex syntax is a little different in JavaScript and Expresso doesn't support it at the moment. I may get it sorted out...
– Marc D Anderson Jan 16 '10 at 4:17

Point taken on using regex here at all. I think, given the narrow scope of what I'm trying to do, that it's a reasonable approach: I'm only looking to parse the SCRIPT tags in the HEAD, no more. Promise I won't make it a habit. ;+)
– Marc D Anderson Jan 18 '10 at 3:44