I'm trying to create a web tool that can visualize the differences between two XML files. difflib was working pretty well for creating HTML with the differences, but then some Unicode text showed up in the XML and the resulting HTML now contains HTML-encoded letters.

1 Answer

I assume what bothers you are the named HTML character entities (such as &eacute;), not their numeric counterparts (such as &#233;). You can re-map the named ones to numeric ones, e.g. with your favorite CLI tool that supports regexes (e.g. sed) and the entity tables from unicode e-workers or the reference. The numeric entity encoding can be used in HTML and XML files alike.
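Since the question already uses Python's difflib, the same re-mapping can be sketched in Python instead of sed. This is only a rough illustration, assuming the difflib output is available as a string: the function name `named_to_numeric` and the `XML_PREDEFINED` set are made up for this example, and the lookup table is the standard library's `html.entities.name2codepoint` rather than the tables linked above.

```python
import re
from html.entities import name2codepoint

# The five entities that XML itself predefines; these are left untouched
# so that escaped markup such as &lt;tag&gt; stays escaped.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}

# Matches named entities like &eacute; but not numeric ones like &#233;
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace named HTML entities with decimal numeric character references."""
    def replace(match):
        name = match.group(1)
        # Keep XML's own entities and anything the table does not know.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#{};".format(name2codepoint[name])

    return _NAMED_ENTITY.sub(replace, text)


print(named_to_numeric("caf&eacute; &lt;b&gt;"))
```

If you would rather have the actual characters than numeric references, `html.unescape` from the standard library decodes both forms, but it also turns &lt; and &amp; back into raw markup characters, so it is not safe to apply to a whole HTML or XML file.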