François-Xavier Cat

PowerShell - Remove Diacritics (Accents) from a string

2015/05/21

In the last few days, I have been working on an onboarding automation process that needs to handle both French and English, and one of the steps required removing the accents (also known as diacritics) from some strings passed by the users.

Method #1

The first approach uses String.Normalize to decompose the input string (separating the “base” characters from their combining diacritical marks), then scans the result and keeps only the base characters. It’s a little complicated, but really you’re looking at a complicated problem…
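Here is a minimal sketch of that normalize-and-filter idea. The function name Remove-StringDiacriticSketch is hypothetical, and the choice of normalization form D and of the NonSpacingMark category as the filter are my assumptions about how such a function would typically be written, not necessarily the exact original listing.

```powershell
# Sketch only: the function name, FormD and the NonSpacingMark filter are assumptions
function Remove-StringDiacriticSketch
{
    PARAM (
        [parameter(ValueFromPipeline = $true)]
        [string]$String
    )
    PROCESS
    {
        # Decompose: "é" (U+00E9) becomes "e" followed by U+0301 (combining acute accent)
        $Normalized = $String.Normalize([Text.NormalizationForm]::FormD)
        $Builder = New-Object -TypeName System.Text.StringBuilder

        foreach ($Char in $Normalized.ToCharArray())
        {
            # Keep every character except the combining (non-spacing) marks
            $Category = [Globalization.CharUnicodeInfo]::GetUnicodeCategory($Char)
            if ($Category -ne [Globalization.UnicodeCategory]::NonSpacingMark)
            {
                [void]$Builder.Append($Char)
            }
        }

        $Builder.ToString()
    }
}
```

With this sketch, Remove-StringDiacriticSketch -String "Crème brûlée" returns "Creme brulee". Characters that have no canonical decomposition pass through unchanged, which is why a letter such as the Polish "ł" survives this approach.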

UPDATE: Thanks to Marcin Krzanowicz, who provided another solution; see Method #2 below. His version also works with Polish characters, where Method #1 doesn’t (a letter such as “ł” has no canonical decomposition, so normalization leaves it untouched).

Method #2 (From Marcin Krzanowicz)
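The function below is reconstructed from the pipeline-enabled version used in the Extra section of this post, with the pipeline attributes removed, so treat it as a sketch rather than the exact original listing. It encodes the string with a Cyrillic code page and decodes the resulting bytes as ASCII. As far as I understand it, this works because that single-byte code page contains no accented Latin letters, so the encoder falls back to the closest plain letter. The result may depend on the .NET version and on which encodings are available on the system.

```powershell
# Sketch reconstructed from the pipeline version later in this post
function Remove-StringLatinCharacters
{
    PARAM ([string]$String)

    # Encode to a Cyrillic code page, then read the bytes back as ASCII
    [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding("Cyrillic").GetBytes($String))
}
```

A call looks like Remove-StringLatinCharacters -String "Crème brûlée". Plain ASCII input comes back unchanged.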

Extra: Remove Diacritics from multiple files

If you want to take this to the next level and remove diacritics from multiple files, you could do something like this:

# Modify the function to make it compatible with the pipeline
function Remove-StringLatinCharacters
{
    PARAM (
        [parameter(ValueFromPipeline = $true)]
        [string]$String
    )
    PROCESS
    {
        [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding("Cyrillic").GetBytes($String))
    }
}

# Example with multiple text files located in the directory c:\test\
foreach ($file in (Get-ChildItem c:\test\*.txt))
{
    # Get the content of the current file and remove the diacritics
    # (FullName avoids relying on how the file object converts to a path)
    $NewContent = Get-Content $file.FullName | Remove-StringLatinCharacters

    # Overwrite the current file with the new content
    $NewContent | Set-Content $file.FullName
}