Hi, I'm using DiffMerge to compare the JSON files we use to update the text content on our web portal, and I'm having problems with UTF-8 and special characters.

For example, in the last JSON file I made on Windows, Notepad++ (Windows), TextEdit (Mac) and DiffMerge all display the text "Curaçao".

But when I copy the contents of this JSON file into a new text document on the Mac using TextEdit, DiffMerge shows that same text as "Cura√ßao". As a result it flags a lot of discrepancies between different versions of the JSON document, which makes updating our website difficult!
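For what it's worth, "Cura√ßao" is exactly what you get when the UTF-8 bytes of "Curaçao" are decoded with the Mac OS Roman character set instead of UTF-8. The short Python sketch below reproduces that; it's an assumption that this is what happens inside DiffMerge, but it shows that the file content itself is not necessarily damaged, only read with the wrong encoding.

```python
# Reproduce the "Curaçao" -> "Cura√ßao" symptom: the UTF-8 bytes of the
# word are decoded with Mac OS Roman instead of UTF-8.
original = "Curaçao"

utf8_bytes = original.encode("utf-8")      # 'ç' becomes the two bytes C3 A7
print(utf8_bytes)                          # b'Cura\xc3\xa7ao'

misread = utf8_bytes.decode("mac_roman")   # in Mac OS Roman, C3 is '√' and A7 is 'ß'
print(misread)                             # Cura√ßao
```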

jclausius wrote: All of this will depend on the Byte Order Mark (BOM), if any, stored at the beginning of your file. Do you know if one is present?

Next, take a look at the individual Rulesets for the extensions of your configuration files. Is the BOM option checked in the Ruleset? Perhaps you need to define, or try, a specific encoding?
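If you're not sure whether a BOM is present, a minimal Python sketch like the one below reads the first few bytes of each file and reports which BOM, if any, it starts with. The script and function names are just for illustration; it only uses the BOM constants from Python's standard `codecs` module.

```python
import codecs

# Known byte order marks. The UTF-32 LE mark (FF FE 00 00) begins with the
# UTF-16 LE mark (FF FE), so it has to be checked first.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

def file_bom(path: str):
    """Read only the first four bytes of a file and report its BOM, if any."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))

if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        print(path, file_bom(path) or "no BOM")
```

Saved as, say, `check_bom.py`, you'd run `python3 check_bom.py somefile.json` in Terminal.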

Thanks for your reply. As far as I can tell, the JSON file doesn't contain a BOM at the beginning of the text; I'll look into the BOM option in the Ruleset. If my colleague compares the two JSON files on Windows in DiffDog, they are identical, but for me on the Mac in DiffMerge, those special-character discrepancies show up.

a) Make a copy of both files you are working with, but rename the copies so their extensions end with '.utf8'. For example, 'somefile.json' would become 'somefile.utf8'.

b) Compare these two *.utf8 files in DiffMerge. What does the diff look like?
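If you'd rather script step (a), here is a minimal Python sketch (the function name is hypothetical). It makes an exact byte-for-byte copy, so the encoding and any BOM are left untouched and only the extension changes.

```python
import shutil
from pathlib import Path

def make_utf8_copy(path: str) -> Path:
    """Copy e.g. 'somefile.json' to 'somefile.utf8' in the same folder."""
    src = Path(path)
    dst = src.with_suffix(".utf8")   # replaces the last extension only
    shutil.copyfile(src, dst)        # exact byte copy; encoding and BOM unchanged
    return dst
```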

1. I downloaded our current JSON file, opened it in TextEdit and copied the entire contents to a new document saved as UTF-8. When I compare the two in DiffMerge, the copy still displays "Cura√ßao" while the original displays "Curaçao".

2. I made a duplicate of both and changed the file extensions to ".utf8". Comparing them in DiffMerge still shows differences in how the special characters are displayed.
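One way to tell whether such differences are real content changes or just a decoding issue is to read both files as UTF-8 outside the diff tool and compare the resulting text. The Python sketch below (function names hypothetical) does that; note that it assumes both files really are UTF-8 and will raise `UnicodeDecodeError` if one isn't.

```python
from pathlib import Path

def read_utf8(path: str) -> str:
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if there is one, so a
    # file with a BOM and one without can still compare as equal. read_text
    # also turns Windows "\r\n" line endings into "\n".
    return Path(path).read_text(encoding="utf-8-sig")

def same_text(path_a: str, path_b: str) -> bool:
    """True if both files hold the same characters when read as UTF-8."""
    return read_utf8(path_a) == read_utf8(path_b)

def same_bytes(path_a: str, path_b: str) -> bool:
    """True only if the files are byte-for-byte identical (BOM included)."""
    return Path(path_a).read_bytes() == Path(path_b).read_bytes()
```

If `same_text` is true but DiffMerge still shows differences, the tool is most likely decoding the two files with different character sets.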

If you take the 1st set of files (the files encoded with byte order marks), they are decoded with the correct encoding on Windows, Mac, and Linux. However, if a file is missing the BOM, the result depends on which character set ends up being used to display the text. The default character set on the Mac apparently doesn't handle the 'ç'.

Try opening the four files on the Mac. QuizKid1.utf8 and QuizKid2.utf8 do not open correctly because they do *not* contain any byte order marks. QuizKid3.utf8 and QuizKid4.utf8 do open correctly.

Now try this. In the Ruleset that applies to JSON, change the character-encoding option to 'Ask for Each File in Each Window'. Also, rename the files so they all end in .json (i.e. QuizKid?.json). Try again in DiffMerge, and when asked, choose 'Western European'. Now try all 4 files. Do they display correctly? You should notice that QuizKid3.json/QuizKid4.json do not prompt for an encoding because they have identifying BOM marks.
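To see why the encoding you pick when prompted matters, the Python sketch below decodes the same bytes with several candidate encodings. It assumes "Western European" corresponds to Windows-1252 (`cp1252`); which code page DiffMerge actually uses for that option, and which encoding the QuizKid files were saved in, aren't stated here.

```python
CANDIDATES = ("utf-8", "cp1252", "mac_roman")

def decode_all(data: bytes, encodings=CANDIDATES) -> dict:
    """Decode the same bytes with each encoding; undecodable bytes become U+FFFD."""
    return {enc: data.decode(enc, errors="replace") for enc in encodings}

samples = {
    "saved as UTF-8": "Curaçao".encode("utf-8"),               # ç is C3 A7
    "saved as Western European": "Curaçao".encode("cp1252"),   # ç is E7
}

for label, data in samples.items():
    print(label)
    for enc, text in decode_all(data).items():
        print(f"  read as {enc:<9} -> {text}")
```

A file saved in UTF-8 only reads correctly as UTF-8, and a Western European file only reads correctly as Western European; every other pairing produces garbled characters like the ones in the diff.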

A couple of suggestions:

- Switch to an editor that properly inserts a BOM into UTF-8 files.
- Configure a different encoding for the files in your Ruleset.
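As an alternative to switching editors, a small Python sketch (function name hypothetical) can add a UTF-8 BOM to files that lack one. It first checks that the file really is valid UTF-8, so a file in another encoding isn't mislabeled. One caution: RFC 8259 says JSON sent over a network must not start with a BOM, so check that whatever loads these files on the web portal accepts one before adopting this.

```python
import codecs
from pathlib import Path

def add_utf8_bom(path: str) -> bool:
    """Prepend a UTF-8 BOM if missing. Returns True if the file was changed."""
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False                       # already has a BOM; leave it alone
    data.decode("utf-8")                   # raises UnicodeDecodeError if not valid UTF-8
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True
```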

Thanks for all your help. As I didn't have any more time to figure this out, I ended up switching to a different program, which didn't have this issue.