Suppose you have a set of HTML documents generated by populating the same template with the data from some kind of database.
HTML::Untemplate is a set of command-line tools ("xpathify",
"untemplate") and modules (HTML::Linear and its dependencies) that help recover the original data.

They all point to the same node; however, they vary in verbosity and readability. Strict mode specifies tag names and positions only. Disabling strict mode brings in additional data from CSS selectors. Shrink mode attempts to find the shortest XPath that is unique for every node (/html/body is shared among almost all nodes and is therefore likely to be irrelevant).
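The idea behind shrink mode can be sketched in a few lines of Python. This is only an illustration of "shortest unique path", not the algorithm HTML::Linear actually uses: the function name shrink and the choice of a "//" prefix for shortened paths are assumptions made for this example, and the input is assumed to list every node of a single document.

```python
def shrink(paths):
    """Map each absolute path to its shortest suffix that no other path shares.

    Hypothetical helper: a suffix such as p[2] is written as //p[2],
    which is unique as long as no other listed path ends the same way.
    """
    split = [p.strip("/").split("/") for p in paths]
    result = {}
    for path, steps in zip(paths, split):
        for n in range(1, len(steps) + 1):
            suffix = steps[-n:]
            # Count how many listed paths end with exactly these steps.
            matches = sum(1 for other in split if other[-n:] == suffix)
            if matches == 1:
                result[path] = "//" + "/".join(suffix)
                break
        else:
            # No suffix is unique; fall back to the full absolute path.
            result[path] = path
    return result


if __name__ == "__main__":
    for full, short in shrink([
        "/html/head/title",
        "/html/body/div[1]/p",
        "/html/body/div[2]/p",
    ]).items():
        print(full, "->", short)
```

In this sketch the /html/body prefix disappears from every shortened path, because it never helps to tell two nodes apart.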

The keys are in XPath format, while the values are the corresponding content from the HTML tree. In theory, it should be possible to reassemble the HTML tree from the flat key/value list this tool generates.
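To make the flat key/value idea concrete, here is a minimal Python sketch built on the standard library's html.parser. It is not how xpathify is implemented, and the key format shown (every step carrying a position, text nodes ending in /text()) is an assumption for this example rather than the tool's real output. The Flattener class, the flatten function and the VOID_TAGS set are hypothetical names; attributes and comments are ignored.

```python
from html.parser import HTMLParser

# Assumption: a small, incomplete set of tags that never have a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Flattener(HTMLParser):
    """Collect (xpath, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []     # XPath steps of the currently open elements
        self.counts = [{}]  # per open element: how often each child tag was seen
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        seen = self.counts[-1]
        seen[tag] = seen.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return  # void elements cannot contain text, so never open them
        self.stack.append(f"{tag}[{seen[tag]}]")
        self.counts.append({})

    def handle_endtag(self, tag):
        # Only close the element if it is the innermost open one.
        if self.stack and self.stack[-1].startswith(tag + "["):
            self.stack.pop()
            self.counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            key = "/" + "/".join(self.stack) + "/text()"
            self.pairs.append((key, text))


def flatten(html):
    parser = Flattener()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    doc = "<html><body><h1>Title</h1><p>one</p><p>two</p></body></html>"
    for key, value in flatten(doc):
        print(key, "=", value)
```

Because each key records the position of every element along the way, the list keeps enough information to rebuild the nesting of the text nodes, which is what makes the reassembly mentioned above plausible.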

The untemplate tool flattens a set of HTML documents using the algorithm from xpathify. Then, it strips the key/value pairs shared by the documents. The "rest" consists of the original values that were fed into the template engine.
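The stripping step can be sketched as a set intersection. The Python below is an illustration under stated assumptions, not the tool's actual code: each document is assumed to be already flattened into a dict of XPath keys and text values, and the function name untemplate is reused here only for readability.

```python
def untemplate(documents):
    """Drop every key/value pair that appears identically in all documents.

    `documents` is a list of dicts mapping XPath keys to text values
    (an assumed input shape for this sketch).
    """
    if not documents:
        return []
    # Pairs present in every document are taken to come from the template.
    shared = set(documents[0].items())
    for doc in documents[1:]:
        shared &= set(doc.items())
    # Whatever is left is specific to each document.
    return [
        {key: value for key, value in doc.items() if (key, value) not in shared}
        for doc in documents
    ]


if __name__ == "__main__":
    pages = [
        {"/html/head/title/text()": "Quotes", "/html/body/p/text()": "first"},
        {"/html/head/title/text()": "Quotes", "/html/body/p/text()": "second"},
    ]
    for rest in untemplate(pages):
        print(rest)
```

Note that a pair is treated as shared only when both the key and the value match in every document, so a field that happens to hold the same value everywhere would be stripped as well.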

And this is what the result actually looks like with some simple real-world examples (quotes 1839 and 2486 from bash.org):