Parameters

source

The PHP source to parse.

Return Values

An array of token identifiers. Each individual token identifier is either
a single character (e.g. ;, ., >, !),
or a three-element array containing the token index in element 0, the string
content of the original token in element 1, and the line number in element 2.
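
Because the returned array mixes plain strings and three-element arrays, code that walks it usually has to branch on is_array(). The following is a minimal sketch of that pattern; describe_tokens() and the 'CHAR' label are hypothetical names chosen for illustration, not part of the tokenizer extension.

```php
<?php
// Hypothetical helper: normalise every token to [name, text, line].
function describe_tokens(string $source): array
{
    $out = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Three-element form: token index, original text, line number.
            list($id, $text, $line) = $token;
            $out[] = [token_name($id), $text, $line];
        } else {
            // Single-character form such as ';' or '='. No line number is
            // supplied for these, so null is stored instead.
            $out[] = ['CHAR', $token, null];
        }
    }
    return $out;
}

$rows = describe_tokens('<?php $a = 1;');
foreach ($rows as $row) {
    printf("%-14s %s\n", $row[0], var_export($row[1], true));
}
```

token_name() turns the numeric token index into its symbolic name, which is normally easier to read while debugging than the raw integer.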

Examples

<?php
/* Note in the following example that the string is parsed as T_INLINE_HTML
   rather than the otherwise expected T_COMMENT (T_ML_COMMENT in PHP <5).
   This is because no open/close tags were used in the "code" provided.
   This would be equivalent to putting a comment outside of <?php ?> tags
   in a normal file. */
$tokens = token_get_all('/* comment */'); // => array(array(T_INLINE_HTML, '/* comment */'));
?>
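
For contrast, a short sketch of the same string tokenized with and without an opening tag. The variable names $bare and $tagged are only illustrative.

```php
<?php
// Without an opening tag the text is treated as inline HTML.
$bare = token_get_all('/* comment */');

// With an opening tag the same text is recognised as a comment.
// Element 0 is the T_OPEN_TAG itself, element 1 is the comment.
$tagged = token_get_all('<?php /* comment */');

echo token_name($bare[0][0]), "\n";   // T_INLINE_HTML
echo token_name($tagged[1][0]), "\n"; // T_COMMENT
```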

User Contributed Notes 5 notes

1: Line number bug. Since PHP 5.2.2, token_get_all() should return line numbers in element 2. In my tests (PHP 5.3.0 on WAMP) this works perfectly only with pure PHP code (no HTML mixed in). If token_get_all() detects some T_INLINE_HTML, you sometimes get wrong line numbers (the next line is returned). :(

2: Warning messages can affect loops. With incomplete PHP code (e.g. PHP code fed in line by line), for example when a comment is not closed, token_get_all() can block the loop with this warning: Warning: Unterminated comment starting line

This problem does not seem to occur in CLI mode (the php command line), only in web mode.

Until this is more stable, use token_get_all() only on pure PHP code (no HTML mixed in): first, extract the PHP code in full (including the opening and closing PHP tags); second, run token_get_all() on that pure PHP code.

3: Why is there no function to extract PHP code? (To extract HTML, we have Tidy.)

I wanted to use the tokenizer functions to count source lines of code, including counting comments. Attempting to do this with regular expressions does not work well because of situations where /* appears in a string, among others. The token_get_all() function makes this task easy by detecting all the comments properly. However, it does not return newline characters as tokens of their own. I wrote the set of functions below to also tokenize newline characters as T_NEW_LINE.

I'm sure you can figure out how to count the lines of code, and lines of comments with these functions. This was a huge improvement on my previous attempt at counting lines of code with regular expressions. I hope this helps someone, as many of the user contributed examples on this website have helped me in the past.
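
A minimal sketch of this idea, written for illustration and not the note author's code. It assumes T_NEW_LINE can be defined as -1 (it is not a real tokenizer constant), and the names token_get_all_nl() and count_lines() are hypothetical. Line numbers from element 2 are dropped to keep the sketch short.

```php
<?php
// Assumption: T_NEW_LINE is not a real tokenizer constant, so it is
// defined here with a value that is not used as a token index.
if (!defined('T_NEW_LINE')) {
    define('T_NEW_LINE', -1);
}

// Hypothetical helper: like token_get_all(), but every "\n" inside a
// token becomes its own [T_NEW_LINE, "\n"] entry.
function token_get_all_nl(string $source): array
{
    $out = [];
    foreach (token_get_all($source) as $token) {
        $id   = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;
        // Split on "\n" and keep each newline as a separate piece.
        $parts = preg_split('/(\n)/', $text, -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        foreach ($parts as $part) {
            if ($part === "\n") {
                $out[] = [T_NEW_LINE, "\n"];
            } elseif ($id === null) {
                $out[] = $part;        // single-character token, unchanged
            } else {
                $out[] = [$id, $part]; // same token index, one line of text
            }
        }
    }
    return $out;
}

// Hypothetical counter: a line is a comment line if it holds comment text,
// and a code line if it holds anything other than comments, whitespace,
// inline HTML or the open/close tags. A line can be both.
function count_lines(string $source): array
{
    $code = $comments = 0;
    $hasCode = $hasComment = false;
    $tokens = token_get_all_nl($source);
    $tokens[] = [T_NEW_LINE, "\n"]; // sentinel so the last line is flushed
    $ignored = [T_WHITESPACE, T_INLINE_HTML, T_OPEN_TAG, T_CLOSE_TAG];
    foreach ($tokens as $token) {
        $id   = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;
        if ($id === T_NEW_LINE) {
            $code     += $hasCode ? 1 : 0;
            $comments += $hasComment ? 1 : 0;
            $hasCode = $hasComment = false;
        } elseif ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            // Ignore pieces of a comment that are only indentation.
            $hasComment = $hasComment || trim($text) !== '';
        } elseif (!in_array($id, $ignored, true)) {
            $hasCode = true;
        }
    }
    return ['code' => $code, 'comments' => $comments];
}

$src = "<?php\n/* a\n   b */\n\$x = 1; // set x\n";
print_r(count_lines($src));
```

Splitting the text of every token, not only T_WHITESPACE, matters because multi-line comments and the opening tag can contain newlines too.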

Complementary note to the code below: note that only the FIRST 2 (or 3, if needed) array elements will be updated.

Since I only encountered incorrect results on the FIRST occurrence of T_OPEN_TAG, I wrote this quick fix. Any following T_OPEN_TAG tokens are, on my testing system (Apache 2.0.52, PHP 5.0.3), parsed correctly.

So this function assumes that only the first T_OPEN_TAG may be incorrect. It also assumes that the very first element (and ONLY the first element) of the token array is the possibly incorrect token. In effect, this means the first character of the tokenized source must be the '<' that starts a PHP script opening tag, followed by either 'php' OR '%' (ASP style).