If you save an HTML page to a text file, or hold it as a string in memory, any IMG or HREF references in it are likely to still contain relative paths.

Example snippet:

<H1>Hello from Test Page</H1><IMG src="images/TestImage1.jpg">

When you then load your saved HTML page you will not see the image, because the HTML is looking for TestImage1.jpg in an images folder which doesn't exist locally. All that exists is your saved HTML text file. So we need to parse the HTML and prefix the missing server path to the value of the src attribute in the HTML.

The most efficient way to achieve this is to use the power of Regular Expressions, but I'm no expert with RegEx's so after trawling the Internet looking for a suitable RegEx example, rather than read a book ;), I finally found the correct expression at code.nontalk.com.
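To give a feel for the approach, here is a minimal sketch in Python. It is not the expression from that site, just one way the idea could look. The `BASE_URL` value and the `absolutize` function name are made up for illustration, and the pattern only handles quoted src/href values, so treat it as a starting point rather than a complete HTML parser.

```python
import re

# Hypothetical server path; replace with the site the page was saved from.
BASE_URL = "http://www.example.com/"

# Group 1: the attribute name and equals sign (src= or href=).
# Group 2: the opening quote character.
# Group 3: the attribute value, but only when it is relative, i.e. it does
#          not start with a scheme (http:, mailto:, ...), a "/" or a "#".
RELATIVE_ATTR = re.compile(
    r'''(\b(?:src|href)\s*=\s*)(["'])(?![a-z][a-z0-9+.-]*:|/|#)([^"']+)\2''',
    re.IGNORECASE,
)


def absolutize(html, base_url=BASE_URL):
    """Prefix base_url to every relative src/href value found in html."""
    def add_prefix(match):
        attr, quote, path = match.group(1), match.group(2), match.group(3)
        return f"{attr}{quote}{base_url}{path}{quote}"

    return RELATIVE_ATTR.sub(add_prefix, html)


if __name__ == "__main__":
    page = '<H1>Hello from Test Page</H1><IMG src="images/TestImage1.jpg">'
    print(absolutize(page))
    # <H1>Hello from Test Page</H1><IMG src="http://www.example.com/images/TestImage1.jpg">
```

Values that are already absolute (or that start with "/" or "#") are left alone, so running the function twice over the same page should not double up the prefix.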

If you need to match a single word from a set of words you can use regular expression alternation. This syntax uses the pipe (|) to represent the logical OR for matching. Here is a RegEx that matches only when the whole input is exactly 1 of the 4 words (whether upper and lower case are treated the same depends on the case sensitivity option you use):

^(word1|word2|word3|word4)$
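As a quick way to try the expression out, here is a small sketch in Python. The `is_allowed` helper is a name invented for this example, and the `ignore_case` switch is just one way of showing the case sensitivity point.

```python
import re

# The alternation from above: the whole input must be one of the four words.
PATTERN = r"^(word1|word2|word3|word4)$"


def is_allowed(text, ignore_case=False):
    """Return True if text is exactly one of the four words."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.match(PATTERN, text, flags) is not None


if __name__ == "__main__":
    for candidate in ["word1", "word4", "word5", "word1 word2", "WORD1", ""]:
        print(repr(candidate), is_allowed(candidate))
    # With the case-insensitive option switched on, "WORD1" is accepted too.
    print(repr("WORD1"), is_allowed("WORD1", ignore_case=True))
```

The ^ and $ anchors are what stop "word1 word2" from matching; without them the pattern would match any input that merely contains one of the words.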

If you also need to allow a positive match on an empty string you can add another pipe (|) with no word after it, like so: