<pre><?php
/**
 * Get leading, trailing, and embedded separator tokens that were 'skipped',
 * if for some ungodly reason you are using PHP to implement a simple parser
 * that needs to detect nested clauses as it builds a parse tree.
 */

The way it works: it runs through the string character by character and, for each character, looks up the action to take based on that character and the current $state. Actions can be one or more of: adding the character/string to the current word, adding the word to the output array, and changing or (re)storing the state. For example, a space will become part of the current 'word' (or 'token') if $state is 'doublequoted', but it will start a new token if $state is 'unquoted'. I was later told it's a "tokeniser using a finite state automaton". Who knew :-)
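To make the description concrete, here is a minimal sketch of that kind of tokeniser. It is not the original poster's code: the function name `fsm_tokenize`, the 'singlequoted' state, and the use of plain if/else branches in place of a lookup table are all assumptions made for illustration.

```php
<?php
/**
 * Split $input into tokens with a small finite state machine.
 * Hypothetical sketch: state names and function name are illustrative only.
 */
function fsm_tokenize(string $input): array
{
    $state  = 'unquoted';
    $word   = '';
    $inWord = false;   // lets an empty quoted token ("") still count as a token
    $tokens = [];

    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($state === 'unquoted') {
            if ($char === '"') {
                $state  = 'doublequoted';
                $inWord = true;
            } elseif ($char === "'") {
                $state  = 'singlequoted';
                $inWord = true;
            } elseif ($char === ' ' || $char === "\t" || $char === "\n") {
                // Whitespace outside quotes ends the current word, if any.
                if ($inWord) {
                    $tokens[] = $word;
                    $word     = '';
                    $inWord   = false;
                }
            } else {
                $word  .= $char;
                $inWord = true;
            }
        } elseif ($state === 'doublequoted') {
            // Inside double quotes everything, including spaces, joins the word.
            if ($char === '"') {
                $state = 'unquoted';
            } else {
                $word .= $char;
            }
        } else { // 'singlequoted'
            if ($char === "'") {
                $state = 'unquoted';
            } else {
                $word .= $char;
            }
        }
    }

    // Flush the last word (also covers an unterminated quote).
    if ($inWord) {
        $tokens[] = $word;
    }
    return $tokens;
}
```

Calling `fsm_tokenize('one "two three" four')` yields three tokens, with the quoted space kept inside the second one. A table-driven version would replace the branches with an array keyed by state and character class.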

This might be pointing out the obvious, but if you'd rather use a for loop than a while loop (to keep the token strings on the same line for readability, for example), it can be done. As an added bonus, it doesn't put a $tok assignment outside the loop itself either. The downside, however, is that you're not able to manually free up the memory used with the technique mentioned by elarlang.
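A sketch of the for-loop form, wrapped in a helper so it can be reused. The function name `split_words` and the default delimiter set are assumptions for the example, not part of the original note.

```php
<?php
/**
 * Collect all tokens from $string using strtok() in a for loop.
 * Hypothetical helper; the name and default delimiters are illustrative.
 */
function split_words(string $string, string $delims = " \n\t"): array
{
    $words = [];

    // First call passes the string; later calls pass only the delimiters.
    for ($tok = strtok($string, $delims); $tok !== false; $tok = strtok($delims)) {
        $words[] = $tok;
    }

    return $words;
}
```

Both strtok() calls sit on the same line, so the delimiter string is easy to keep consistent between them.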

/**
 * Tests if there are more tokens available from this tokenizer's string. It
 * does not move the internal pointer in any way. To move the internal pointer
 * to the next element call nextToken()
 * @return boolean - true if has more tokens, false otherwise
 */
public function hasMoreTokens()
{
    return ($this->token !== false);
}
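The surrounding class is not shown in the note, so the following is only a guess at how such a method could fit together with nextToken(). The class name `SimpleTokenizer`, the constructor, and the pre-fetching of one token are assumptions.

```php
<?php
/**
 * Hypothetical wrapper around strtok(); only hasMoreTokens() and the
 * nextToken() name come from the note, everything else is assumed.
 */
class SimpleTokenizer
{
    /** @var string|false The token that nextToken() will return next. */
    private $token;

    /** @var string Delimiter characters passed to strtok(). */
    private $delims;

    public function __construct(string $string, string $delims = ' ')
    {
        $this->delims = $delims;
        // Pre-fetch the first token so hasMoreTokens() can answer immediately.
        $this->token = strtok($string, $delims);
    }

    public function hasMoreTokens(): bool
    {
        return ($this->token !== false);
    }

    /** @return string|false The current token, or false when exhausted. */
    public function nextToken()
    {
        $current = $this->token;
        if ($current !== false) {
            // Advance the internal pointer by fetching the following token.
            $this->token = strtok($this->delims);
        }
        return $current;
    }
}
```

Because strtok() keeps a single internal string per process, two instances of a class like this cannot be iterated at the same time without interfering with each other.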

As of the change in strtok()'s handling of empty strings, it is now useless for scripts that rely on empty data to function.

Take, for instance, a standard header (with UNIX newlines):

HTTP/1.0 200 OK\nContent-Type: text/html\n\n--HTML BODY HERE---

When parsing this with strtok, one would wait until it found an empty string to signal the end of the header. However, because strtok now skips empty segments, it is impossible to know when the header has ended. This should not be called 'correct' behavior; it certainly is not. It has rendered strtok incapable of (properly) processing a very simple standard.

This new functionality, however, does not affect Windows-style headers: splitting on "\n" leaves a line that contains only "\r", which you can search for. This, however, is not a justification for the change.
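The problem can be reproduced with a short script. The second helper shows one possible workaround using explode(), which keeps empty segments; that workaround is my suggestion rather than something from the note, and both function names are made up for the example.

```php
<?php
/**
 * Split $text into lines with strtok(). Empty lines are silently skipped,
 * so the blank line that ends a header never shows up in the result.
 */
function lines_via_strtok(string $text): array
{
    $lines = [];
    for ($line = strtok($text, "\n"); $line !== false; $line = strtok("\n")) {
        $lines[] = $line;
    }
    return $lines;
}

/**
 * Hypothetical alternative: split on the first blank line with explode(),
 * then split the header part into lines. Assumes UNIX newlines.
 */
function split_header_body(string $text): array
{
    $parts  = explode("\n\n", $text, 2);
    $header = explode("\n", $parts[0]);
    $body   = $parts[1] ?? '';   // empty body if no blank line was found
    return [$header, $body];
}
```

With the example response above, `lines_via_strtok()` returns the two header lines and the body as three consecutive tokens with nothing to mark the boundary, while `split_header_body()` returns the header lines and the body separately.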

Here is a small function I wrote as I needed to extract some named tokens from a string (a la Google). For example, I needed to format a string like "extension:gif size:64M animated:true author:'John Bash'" into