Diacritics

Because metadata spreadsheets change hands and are often assembled from diverse sources, problems arise with diacritics. These characters often do not translate from one encoding to another, which produces damaged text in the resulting MODS metadata files.

Encoding problems appear as black rectangular three-letter "blocks" in Notepad++ or as diamond-shaped question marks in Archivist Utility. There are a few ways to deal with this.
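
The symptoms described above usually mean the file contains bytes that are not valid UTF-8. As a minimal sketch (not part of the documented workflow), the Python function below lists the lines that would show those problem characters. It assumes the file is meant to be UTF-8; the function name `find_encoding_problems` is hypothetical.

```python
from pathlib import Path


def find_encoding_problems(path):
    """Return (line_number, text) pairs for lines containing invalid UTF-8.

    Assumption: the file is supposed to be UTF-8. Bytes that cannot be
    decoded are replaced with U+FFFD, the diamond-shaped question mark.
    """
    raw = Path(path).read_bytes()
    # errors="replace" substitutes U+FFFD for every undecodable byte sequence.
    text = raw.decode("utf-8", errors="replace")
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "\ufffd" in line:
            problems.append((number, line))
    return problems
```

Calling `find_encoding_problems("metadata.txt")` (a hypothetical file name) returns an empty list when every line decodes cleanly.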

Note:

Method one is the easiest but requires two programs (ExcelConverter and Notepad++)

Method two requires OpenOffice Calc, and it's a bit fiddly

Method three works fine and is close to our usual workflow (Excel and Notepad++), but it is way too labor-intensive for anything beyond a stray diacritic or two

Repair Method One: Use ExcelConverter

ExcelConverter is located here: S:\Digital Projects\Administrative\scripts\ExcelConverter
It is also available as File:ExcelConverter.txt; after downloading, change the extension to .pl

Run the ExcelConverter script, choose the input file and export location, and click the Convert File! button. The script defaults to exporting as Unicode.

Once the file has exported, open it in Notepad++

From the Encoding menu, select Encode in UTF-8 without BOM

Save and close

If you don't save the exported Unicode file as UTF-8 without BOM, it will not work in the archive and upload process.
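
For reference, the Notepad++ re-encoding step can be sketched in Python. This is an illustration only, not a replacement for the documented steps. It assumes the "unicode" export is UTF-16 with a byte order mark, which is the usual Windows meaning of the term but is not confirmed for ExcelConverter; the function name `to_utf8_without_bom` is hypothetical.

```python
def to_utf8_without_bom(src_path, dst_path, src_encoding="utf-16"):
    """Re-save a text file as UTF-8 with no byte order mark (BOM).

    Assumption: the source is UTF-16 with a BOM. Pass a different
    src_encoding if the exported file turns out to be something else.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # Drop a stray BOM character if one survived decoding.
    text = text.lstrip("\ufeff")
    # Python's "utf-8" codec never writes a BOM ("utf-8-sig" would).
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Writing to a new destination path rather than overwriting the export keeps the original available if the assumed source encoding is wrong.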

ExcelConverter is also used to export any spreadsheet to a standard tab-delimited file. It removes diacritics and the cell quotation marks that Excel adds when saving string data in a regular cell.

Repair Method Two: Use OpenOffice Calc

Open the file in OpenOffice Calc

From the File menu, select Save As...

In the Save dialog window:

Uncheck Automatic file name extension

Check Edit filter settings

Change Save as type to Text CSV

Manually change the extension in the File name box from .csv to .txt

Click Save

In the Export Text File window:

Change Character set to Unicode (UTF-8)

Leave Field delimiter as {Tab}

Make Text delimiter blank (you'll have to backspace over it manually)

Don't change the check boxes

Click OK
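
The export settings above describe a specific file shape: UTF-8 text, tab as the field delimiter, and no text delimiter (no quote marks around cells). A minimal Python sketch of a writer that produces the same shape follows. The function name `write_tab_delimited` is hypothetical, and the choice of LF line endings is an assumption rather than something OpenOffice guarantees.

```python
def write_tab_delimited(rows, dst_path):
    """Write rows as UTF-8, tab-delimited text with no quoting.

    Assumptions: each row is a sequence of cell values, and lines end
    with LF. Cells containing a tab or line break are rejected, because
    they cannot be represented once the text delimiter is blank.
    """
    with open(dst_path, "w", encoding="utf-8", newline="") as handle:
        for row in rows:
            cells = [str(cell) for cell in row]
            for cell in cells:
                if "\t" in cell or "\n" in cell or "\r" in cell:
                    raise ValueError(f"cell contains a tab or line break: {cell!r}")
            # Quote marks inside a cell are written through unchanged.
            handle.write("\t".join(cells) + "\n")
```

Rejecting cells with embedded tabs or line breaks is a deliberate choice here: with a blank text delimiter there is no way to tell them apart from real field and row boundaries.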

Repair Method Three: Use Excel

Open the file in Excel

Use Excel's built-in character map to replace each problem character found in the Excel file

To access the character map, follow this path: Insert tab - Symbols group - Symbol

Select the correct character to insert in place of the damaged one, and make sure "Unicode Hex" is selected in the dropdown
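
When it is unclear which character to pick, knowing its Unicode hex code helps. The Python sketch below lists the hex code and official Unicode name of every non-ASCII character in a string, so the codes can be compared with what the Symbol dialog shows. The function name `describe_non_ascii` is hypothetical, and treating "non-ASCII" as a stand-in for "diacritic" is a simplifying assumption.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (character, hex code, Unicode name) for each non-ASCII character.

    Each distinct character is reported once, in order of first appearance.
    """
    results = []
    # dict.fromkeys keeps one copy of each character in original order.
    for ch in dict.fromkeys(text):
        if ord(ch) > 127:
            code = f"{ord(ch):04X}"
            results.append((ch, code, unicodedata.name(ch, "UNKNOWN")))
    return results
```

For example, passing in a correctly encoded copy of a creator name from the source record gives the codes to look for in the character map.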