strtok

(PHP 4, PHP 5, PHP 7)

strtok — Tokenize string

Description

string strtok ( string $str , string $token )

string strtok ( string $token )

strtok() splits a string (str)
into smaller strings (tokens), with each token being delimited by any
character from token.
That is, if you have a string like "This is an example string" you
could tokenize this string into its individual words by using the
space character as the token.

Note that only the first call to strtok uses the string argument.
Every subsequent call to strtok only needs the token to use, as
it keeps track of where it is in the current string. To start
over, or to tokenize a new string you simply call strtok with the
string argument again to initialize it. Note that you may put
multiple tokens in the token parameter. The string will be
tokenized when any one of the characters in the argument is
found.
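The calling pattern described above can be sketched as follows. This is a minimal illustration, not part of the original page: tokenize() is a hypothetical helper name, and the sample input and delimiter set are assumptions chosen for the example.

```php
<?php
// tokenize() is a hypothetical helper used only for this sketch.
// It collects every token from $input, splitting on any single
// character found in $delimiters.
function tokenize(string $input, string $delimiters): array
{
    $tokens = [];

    // First call: pass the string, which initializes strtok()'s internal state.
    $tok = strtok($input, $delimiters);

    while ($tok !== false) {
        $tokens[] = $tok;
        // Subsequent calls: pass only the delimiter characters.
        $tok = strtok($delimiters);
    }

    return $tokens;
}

// Space, tab, and newline all act as delimiters here.
$words = tokenize("This is\tan example\nstring", " \t\n");
print_r($words);
```

Calling tokenize() again with a different string simply restarts the process, because the two-argument call to strtok() re-initializes it.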

Notes

Warning

This function may
return Boolean FALSE, but may also return a non-Boolean value which
evaluates to FALSE. Please read the section on Booleans for more
information. Use the ===
operator for testing the return value of this
function.
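To illustrate the warning above, here is a small sketch comparing a loose test with a strict one. The input string is an assumption picked because it contains the token "0", which PHP evaluates to FALSE in a Boolean context.

```php
<?php
$input = "1 0 2";

// Loose test: the loop ends as soon as it meets the token "0",
// because the string "0" evaluates to false.
$loose = [];
$tok = strtok($input, " ");
while ($tok) {
    $loose[] = $tok;
    $tok = strtok(" ");
}

// Strict test: the loop ends only on the Boolean false that strtok()
// returns once no tokens are left.
$strict = [];
$tok = strtok($input, " ");
while ($tok !== false) {
    $strict[] = $tok;
    $tok = strtok(" ");
}

print_r($loose);   // only the first token
print_r($strict);  // all three tokens
```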

<?php
/**
 * Get leading, trailing, and embedded separator tokens that were 'skipped',
 * if for some ungodly reason you are using PHP to implement a simple parser
 * that needs to detect nested clauses as it builds a parse tree.
 */

As of the change in strtok()'s handling of empty strings, it is now useless for scripts that rely on empty data to function.

Take, for instance, a standard HTTP header (with UNIX newlines):

HTTP/1.0 200 OK\nContent-Type: text/html\n\n--HTML BODY HERE---

When parsing this with strtok, one would wait for an empty string to signal the end of the header. However, because strtok now skips empty segments, it is impossible to know when the header has ended. This should not be called 'correct' behavior; it certainly is not. It has rendered strtok incapable of (properly) processing a very simple standard.

This new functionality, however, does not affect Windows-style headers. You would search for a line that contains only "\r". This, however, is not a justification for the change.
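The behavior this note describes can be reproduced with a short sketch. It uses the sample message from the note, and swaps in explode() (a different function, not a strtok() option) to show one possible way of still detecting the blank line; the variable names are assumptions.

```php
<?php
$message = "HTTP/1.0 200 OK\nContent-Type: text/html\n\n--HTML BODY HERE---";

// strtok() skips the empty segment between the two consecutive newlines,
// so the blank line that ends the header never shows up as a token.
$viaStrtok = [];
$line = strtok($message, "\n");
while ($line !== false) {
    $viaStrtok[] = $line;
    $line = strtok("\n");
}

// explode() keeps the empty segment, so the end of the header is
// visible as an empty string in the resulting array.
$viaExplode = explode("\n", $message);

// Strict search for the first empty line; false if there is none.
$headerEnd = array_search("", $viaExplode, true);

print_r($viaStrtok);
print_r($viaExplode);
var_dump($headerEnd);
```

Whether explode() is an acceptable substitute depends on the input size and on whether a single multi-character delimiter is enough, since explode() does not treat its delimiter as a set of characters.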

@maisuma, you inverted the parameters of the explode() and strtok() functions, so your code does not do what you expect. You expect to read the input string token by token, so the equivalent of strtok() is array_filter(explode()), because explode() returns empty strings when several delimiters are contiguous in the input string, for example two spaces between words.

In fact, strtok() is much faster (at least 2x) than array_filter(explode()) if the input string contains several contiguous delimiters; it is slower if the input string contains exactly one delimiter between tokens.
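A minimal sketch of the two approaches side by side, under some assumptions: the input string is invented for the example, and an explicit callback is passed to array_filter() because its default behavior would also drop the token "0". Note the argument order: strtok() takes the string first, explode() takes the delimiter first. No timing is measured here.

```php
<?php
$input = "a  b 0   c"; // runs of several contiguous spaces

// strtok(): string first, delimiter characters second.
$viaStrtok = [];
$tok = strtok($input, " ");
while ($tok !== false) {
    $viaStrtok[] = $tok;
    $tok = strtok(" ");
}

// explode(): delimiter first, string second. It returns an empty string
// for every pair of adjacent delimiters, so those are filtered out.
// array_values() renumbers the keys that array_filter() preserves.
$viaExplode = array_values(array_filter(
    explode(" ", $input),
    function ($part) {
        return $part !== "";
    }
));

print_r($viaStrtok);
print_r($viaExplode);
```

The equivalence only holds for a single-character delimiter: explode() treats a longer delimiter as one unit, while strtok() treats it as a set of characters.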

/**
 * Tests if there are more tokens available from this tokenizer's string. It
 * does not move the internal pointer in any way. To move the internal pointer
 * to the next element call nextToken()
 * @return boolean - true if has more tokens, false otherwise
 */
public function hasMoreTokens() {
    return ($this->token !== false);
}
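The surrounding class is not shown in this note, so the following is only a guess at how such a wrapper might look. Everything except hasMoreTokens() and the nextToken() name is an assumption: the class name StringTokenizer, the constructor, and the one-token lookahead stored in $token are all hypothetical.

```php
<?php
// Hypothetical wrapper around strtok(); not the note author's actual class.
// strtok() keeps a single global state, so two instances of this class
// cannot be used in an interleaved fashion.
class StringTokenizer
{
    /** @var string Delimiter characters passed to strtok(). */
    private $delimiters;

    /** @var string|false Next token to hand out, or false when exhausted. */
    private $token;

    public function __construct(string $input, string $delimiters = " ")
    {
        $this->delimiters = $delimiters;
        // Read one token ahead so hasMoreTokens() needs no strtok() call.
        $this->token = strtok($input, $delimiters);
    }

    /** True if nextToken() would return a token; does not advance. */
    public function hasMoreTokens(): bool
    {
        return $this->token !== false;
    }

    /** Returns the current token and advances, or false when exhausted. */
    public function nextToken()
    {
        $current = $this->token;
        if ($current !== false) {
            $this->token = strtok($this->delimiters);
        }
        return $current;
    }
}

$tokenizer = new StringTokenizer("one two three");
$seen = [];
while ($tokenizer->hasMoreTokens()) {
    $seen[] = $tokenizer->nextToken();
}
print_r($seen);
```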

The way it works: it runs through the string character by character and, for each character, looks up the action to take based on that character and the current $state. Actions can be one or more of: adding the character/string to the current word, adding the word to the output array, and changing or (re)storing the state. For example, a space becomes part of the current 'word' (or 'token') if $state is 'doublequoted', but it starts a new token if $state is 'unquoted'. I was later told it's a "tokeniser using a finite state automaton". Who knew :-)
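The author's code is not included here, so the following is a reduced sketch of the same idea rather than the original implementation. It handles only the two states named in the note ('unquoted' and 'doublequoted') and uses plain conditionals instead of a lookup table; the function name quoteAwareSplit() and the $inWord flag are assumptions.

```php
<?php
// Hypothetical two-state tokenizer: spaces separate tokens unless they
// appear inside double quotes. An unterminated quote simply runs to the
// end of the input.
function quoteAwareSplit(string $input): array
{
    $tokens = [];
    $word = "";
    $inWord = false;       // true once the current token has started
    $state = "unquoted";

    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($state === "doublequoted") {
            if ($char === '"') {
                $state = "unquoted";   // closing quote: leave quoted state
            } else {
                $word .= $char;        // spaces are kept inside quotes
            }
        } elseif ($char === '"') {
            $state = "doublequoted";   // opening quote: token starts here
            $inWord = true;
        } elseif ($char === " ") {
            if ($inWord) {             // a space ends the current token
                $tokens[] = $word;
                $word = "";
                $inWord = false;
            }
        } else {
            $word .= $char;            // ordinary character
            $inWord = true;
        }
    }

    if ($inWord) {
        $tokens[] = $word;             // flush the last token
    }

    return $tokens;
}

$result = quoteAwareSplit('say "hello world" twice');
print_r($result);
```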

This may be pointing out the obvious, but if you would rather use a for loop than a while loop (for example, to keep the token strings on the same line for readability), it can be done. As an added bonus, the initial $tok assignment moves into the loop header instead of sitting on a separate line before the loop. The downside, however, is that you cannot manually free the memory used with the technique mentioned by elarlang.
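The code for this note is not reproduced here; a for loop along the lines described might look like this sketch, where the input string and the delimiter set are assumptions.

```php
<?php
$input = "alpha,beta;gamma";
$tokens = [];

// Initialization, end test, and advance all sit in the loop header,
// so both delimiter strings appear on the same line.
for ($tok = strtok($input, ",;"); $tok !== false; $tok = strtok(",;")) {
    $tokens[] = $tok;
}

print_r($tokens);
```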

Here is a small function I wrote as I needed to extract some named tokens from a string (a la Google). For example, I needed to format a string like "extension:gif size:64M animated:true author:'John Bash'" into