Regular Expressions

I often need to extract the opening text of a blog post (or similar content) to use as an excerpt. I normally use a function that counts whole words and returns a string containing only the first few of them.
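The word-counting approach can be sketched roughly as follows. This is a minimal illustration in Python, not the original function; the name `excerpt_words`, the default of 30 words and the trailing `...` are all assumptions made for the example.

```python
def excerpt_words(text, max_words=30, suffix="..."):
    """Return the first max_words whole words of text (hypothetical helper)."""
    # split() with no arguments breaks on any run of whitespace,
    # so words are never cut in half.
    words = text.split()
    if len(words) <= max_words:
        # Short enough already: return it with whitespace normalised.
        return " ".join(words)
    # Otherwise keep only the leading words and mark the truncation.
    return " ".join(words[:max_words]) + suffix


print(excerpt_words("The quick brown fox jumps over the lazy dog", max_words=4))
```

Note that this treats the input as plain text; if the post contains HTML, the tags would be counted as words unless they are stripped first.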

A good alternative to this, although only applicable if the original post is in HTML, is to use a regular expression to extract the contents. The following code will take a string and extract just the first paragraph of text.
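As a rough sketch of the idea (in Python, and not necessarily the code the post originally used), a non-greedy pattern can capture whatever sits between the first `<p>` tag and its closing `</p>`. The function name `first_paragraph` is hypothetical, and the pattern assumes paragraphs are not nested and are properly closed.

```python
import re

# <p\b[^>]*>  an opening p tag, with or without attributes (but not <pre>)
# (.*?)       the paragraph contents, as few characters as possible
# </p>        the first closing tag that follows
FIRST_PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)


def first_paragraph(html):
    """Return the inner HTML of the first paragraph, or None if there is none."""
    match = FIRST_PARAGRAPH.search(html)
    return match.group(1).strip() if match else None


post = '<h2>Title</h2>\n<p class="intro">First <b>paragraph</b>.</p>\n<p>Second.</p>'
print(first_paragraph(post))
```

The `re.DOTALL` flag lets the paragraph span several lines. For anything beyond simple, well-formed markup, a real HTML parser is the safer choice.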

Encoding special characters in a block of HTML or other code can be a pain because some of the ampersands already there may be part of an existing encoding. An ampersand might already have been encoded as &amp;, or it might be an ampersand used in the code itself, in an if statement or similar.

Use the following regular expression to find any ampersand that hasn't already been encoded.
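One way to write such an expression, shown here as a Python sketch rather than the post's original pattern, is a negative lookahead: match an ampersand only when it is not followed by something that looks like a named or numeric entity ending in a semicolon. The names `UNENCODED_AMP` and `encode_ampersands` are made up for the example.

```python
import re

# Match "&" unless it starts something shaped like an entity:
#   a name     such as amp;  or nbsp;
#   a decimal  such as #38;
#   a hex code such as #x26;
UNENCODED_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def encode_ampersands(text):
    """Replace only the ampersands that are not already part of an entity."""
    return UNENCODED_AMP.sub("&amp;", text)


print(encode_ampersands("if (a && b) &amp; Tom & Jerry &#38; co"))
```

The pattern only checks the shape of what follows the ampersand, not whether the entity name really exists, so a fragment of code such as `&b;` would be left alone as if it were an entity.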

Writing regular expressions can sometimes be a real pain, especially if you are not used to them. Rather than trying to build every regular expression yourself, you might want to look for ones that other people have already written. Using free third-party regular expressions, instead of reinventing the wheel just to prove you can, can save you a lot of time.

Regular expressions are a very useful tool for any programmer wanting to validate input, format strings, change words, reformat data or even split a string apart into an array. However, writing them can be hard going when you are starting out. They are not very easy to learn, and the only way to really understand them is to practice, practice, practice.
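A few of those uses are easy to try out. The snippet below is a small Python sketch with illustrative patterns only; the function names and the username rule (3 to 16 letters, digits or underscores) are assumptions invented for the example.

```python
import re


def is_valid_username(value):
    """Validate input: the whole string must match the pattern."""
    return re.fullmatch(r"[A-Za-z0-9_]{3,16}", value) is not None


def normalise_spaces(value):
    """Reformat data: collapse any run of whitespace into one space."""
    return re.sub(r"\s+", " ", value).strip()


def split_fields(value):
    """Split a string into a list on commas or semicolons."""
    return re.split(r"\s*[,;]\s*", value.strip())


print(is_valid_username("reg_ex1"))
print(normalise_spaces("  too   many \t spaces "))
print(split_fields("red, green;blue ,yellow"))
```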