Here-docs are incredibly handy when writing Perl, but incredibly tricky when parsing it, primarily because they don't follow the general flow of input.

They jump ahead and nab lines directly off the input buffer. Whitespace and newlines may not matter in most Perl code, but they matter in here-docs.
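A small plain-Perl illustration of that behaviour may help; it does not use PPI at all, and the variable name is only for demonstration. The statement carries on along the declaration line, while the body is taken from the lines that follow, leading whitespace included.

```perl
use strict;
use warnings;

# The expression continues on the same line after the <<EOT declaration.
# The here-doc body is lifted from the following lines, up to the EOT line.
my $text = lc(<<EOT) . "tail\n";
  Indented LINE
EOT

# The two leading spaces of the body line are preserved.
print $text;
```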

They are also tricky to store as an object. They look sort of like an operator and a string, but they don't act like it. And they have a second section that should be something like a separate token, but isn't, because a string can span from above the here-doc content to below it.

So when parsing, this is what we do.

Firstly, the PPI::Token::HereDoc object does not represent the << operator, or the "END_FLAG", or the content, or even the terminator.

It represents all of them at once.

The token itself has only the declaration part as its "content".

  # This is what the content of a HereDoc token is
  <<FOO
  # Or this
  <<"FOO"
  # Or even this
  << 'FOO'

That is, the "operator", any whitespace separator, and the quoted or bare terminator. So when you call the content method on a HereDoc token, you get something like '<< "FOO"', exactly as it was written in the source.

As for the content and the terminator, when treated purely in "content" terms they do not exist.

The content is made available with the heredoc method, and the name of the terminator with the terminator method.
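As a minimal sketch of how these three methods fit together, assuming PPI is installed and that the sample source below (made up for illustration) parses cleanly:

```perl
use strict;
use warnings;
use PPI;

# Sample Perl source containing one here-doc (illustrative only)
my $source = <<'END_PERL';
print <<"FOO";
Hello
World
FOO
END_PERL

my $document = PPI::Document->new(\$source)
    or die "Failed to parse the sample source";

# find returns an array reference, or a false value when nothing matches
my $heredocs = $document->find('PPI::Token::HereDoc') || [];

for my $token (@$heredocs) {
    my $declaration = $token->content;     # only the declaration part
    my @lines       = $token->heredoc;     # the lines of the here-doc body
    my $terminator  = $token->terminator;  # the name of the terminator

    print "Declaration: $declaration\n";
    print "Terminator:  $terminator\n";
    print "Body line:   $_" for @lines;
}
```

Note that the body lines keep their trailing newlines, so they are printed as-is.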

To make things work in the way you expect, PPI has to play some games when doing line/column location calculation for tokens, and also during the content parsing and generation processes.

Documents cannot simply be recreated by stitching together the token contents. Rebuilding them involves a somewhat more expensive procedure, but the extra expense should be negligible unless you are doing huge quantities of them.
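The difference can be seen in a short sketch. It assumes PPI is installed and that the serialize method of PPI::Document is the procedure used to rebuild the full source; the sample source and variable names are made up for illustration.

```perl
use strict;
use warnings;
use PPI;

# Sample Perl source containing one here-doc (illustrative only)
my $source = <<'END_PERL';
print <<"FOO";
Hello
FOO
END_PERL

my $document = PPI::Document->new(\$source)
    or die "Failed to parse the sample source";

# Naive approach: stitch the token contents together.
# The here-doc body and terminator line are not part of any token content.
my $tokens = $document->find('PPI::Token') || [];
my $naive  = join '', map { $_->content } @$tokens;

# Serializing the document puts the here-doc body and terminator back.
my $full = $document->serialize;

print "Naive join drops the here-doc body\n" if index($naive, 'Hello') < 0;
print "Serialized output matches the source\n" if $full eq $source;
```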

Please note that due to the immature nature of PPI in general, we expect HereDocs to be a rich (bad) source of corner-case bugs for quite a while, but for the most part they should more or less DWYM.