Creating a Parsing Engine

I've been working with some regex replace functions lately and have noticed a small flaw in how I was using them: the replace handles every occurrence of one keyword before moving on to the next keyword.

For example, let's say there's a keyword "%loopFunction:%" that appears in two different places in a document. In the first place the function should output one thing, but in the second place it should output a totally different thing. The problem is that the replace function fills both places with the same code, namely the first output.
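To make the problem concrete, here is a minimal sketch in PHP. The template string and the two outputs ('A' and 'B') are made up for illustration. It contrasts a flat replace with preg_replace_callback(), which calls its callback once per match in document order, so each occurrence can get its own output. This is one possible approach, not necessarily the best one.

```php
<?php
// Hypothetical template containing the same keyword twice.
$template = "First: %loopFunction:% / Second: %loopFunction:%";

// A plain replace substitutes one fixed string for every occurrence.
$flat = str_replace('%loopFunction:%', 'A', $template);

// preg_replace_callback() invokes the callback once per match, left to right,
// so each occurrence can produce a different result.
$outputs = ['A', 'B']; // assumed per-occurrence results, for illustration only
$i = 0;
$perMatch = preg_replace_callback(
    '/%loopFunction:%/',
    function (array $m) use (&$i, $outputs) {
        // Fall back to the original text if there are more matches than outputs.
        return $outputs[$i++] ?? $m[0];
    },
    $template
);

echo $flat, "\n";     // First: A / Second: A
echo $perMatch, "\n"; // First: A / Second: B
```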

So this got me thinking about a parsing engine that would parse a document from the top down. However, I can't really think of how this could be done with PHP.

The only method I can really think of would be to read in X bytes of a file at a time and parse each chunk. However, this would have two major flaws. The first is that it would be extremely slow and wasteful, and the second is that it would need to make exceptions for properties or loops that might span multiple lines or a large number of bytes.

Does anyone have any experience in doing something like this? Even if you haven't done it in PHP, maybe you could post some concept code that could get the logical juices in the brain flowing. Thanks in advance for any help here.

I had some thoughts about defining a bunch of tokens. You go through the document, maybe character by character. Once you find a token, you check your list of other tokens that may be contained within the current token, if any at all. Continue that process until you reach the end of that token and any nested tokens.

Maybe a primitive system could be made with arrays, strpos, a loop, substr or substr_replace, and strlen.
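A rough sketch of that idea, using only strpos, substr, strlen, arrays and a loop. The function name parseTopDown, the %name% token syntax, and the handler array are assumptions made up for this example. It walks the document once from top to bottom and hands each token its occurrence index, so the same keyword can produce different output each time. It does not handle nested tokens; that would probably need a stack or recursion on top of this.

```php
<?php
/**
 * Scan $source once, top to bottom, replacing %name% tokens in the order
 * they appear. $handlers maps a token name to a callable that receives the
 * zero-based occurrence index of that token. (Hypothetical helper.)
 */
function parseTopDown(string $source, array $handlers): string
{
    $out  = '';
    $pos  = 0;
    $len  = strlen($source);
    $seen = []; // how many times each token has appeared so far

    while ($pos < $len) {
        $start = strpos($source, '%', $pos);
        if ($start === false) {
            break; // no more tokens
        }
        $end = strpos($source, '%', $start + 1);
        if ($end === false) {
            break; // unmatched '%', treat the rest as literal text
        }

        $name = substr($source, $start + 1, $end - $start - 1);
        $out .= substr($source, $pos, $start - $pos); // literal text before the '%'

        if (isset($handlers[$name])) {
            $n = $seen[$name] ?? 0;
            $seen[$name] = $n + 1;
            $out .= $handlers[$name]($n);
            $pos = $end + 1;
        } else {
            // Not a known token: keep this '%' and rescan right after it.
            $out .= '%';
            $pos = $start + 1;
        }
    }

    return $out . substr($source, $pos); // remaining literal text
}

$handlers = [
    'loopFunction:' => function (int $n) {
        return '[loop #' . ($n + 1) . ']';
    },
];

echo parseTopDown('a %loopFunction:% b %loopFunction:% c', $handlers), "\n";
// a [loop #1] b [loop #2] c
```

Because the whole string is scanned in memory with strpos, there is no need to read the file in fixed-size chunks, and a token that spans several lines is found the same way as any other.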