Locale parser with FParsec

Localizing an application consists of extracting user-directed text and managing it outside of the hardcoded strings in your code. This lets you tweak strings without having to recompile and, if done properly, allows you to support multiple languages. Localizing is no easy task: it affects spacing, formatting, names, dates, and other cultural information, but that's a separate issue. The crux of localizing is text.

But who uses just bare text to display things to the user? Usually you want text to be a little dynamic, something like

Hello {user}! Welcome!

Here, user will be some sort of dynamic property. To support this, your locale files need a way to handle arguments.

One way of storing contents in a locale file is like this:

ExampleText = Some Text {argName:argType} other text etc
= This is on a separate newline
UserLoginText = ...

Each entry consists of an identifier, an equals sign, and some text containing arguments of the form {x:y}. To continue onto a new line, start the next line with an equals sign and keep writing your text. When a line starts with an identifier instead, a new locale element begins.

But you can also have comments, like

# this is a comment, ignore me!

And to throw a monkey wrench into the problem, you can also have untyped arguments of the form {argName}.

The end goal is to be able to reference your locale contents in code, something like

Locale.ExampleText ("foo");

To get there, you first need to translate your locale files into something workable, like a syntax tree. Once you have a syntax tree of your locale files, you can generate strongly typed locale code to use in your application.

The data

To parse a locale file of this format I used FParsec. One reason was that it already handles lookahead and backtracking; another is that I wanted to play with it :)

Going with a data-first design, I thought about what I wanted my final output to be and came up with three discriminated unions.
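The text later mentions the union cases NewLine and IgnoreEntry. Building on those, here is a minimal sketch of what three such unions could look like. The remaining names are hypothetical and chosen for illustration; they are not taken from the original code: Arg, TypedArg, UntypedArg, LocaleContents, Argument, Text, LocaleElement and Entry.

```fsharp
// Hypothetical shapes for the parser output: only NewLine and IgnoreEntry
// are names that appear in the text, the rest are illustrative.

/// An argument inside braces: {name:type} or just {name}
type Arg =
    | TypedArg of name : string * argType : string
    | UntypedArg of name : string

/// The pieces that can follow the "=" of an entry
type LocaleContents =
    | Argument of Arg
    | Text of string
    | NewLine

/// One top-level element of a locale file
type LocaleElement =
    | Entry of name : string * contents : LocaleContents list
    | IgnoreEntry of comment : string

// How "UserLoginText = Hello {user}! Welcome!" could be represented
let example =
    Entry("UserLoginText", [ Text "Hello "; Argument(UntypedArg "user"); Text "! Welcome!" ])
```

Keeping NewLine as a case without data lets the tree record where an explicit line break occurred without carrying any text.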

FParsec comes with a lot of great functions and parser combinators for building robust parsers. The idea is to combine small parsers into larger ones. I liked working with it because it felt like writing the grammar directly.
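To make the idea of composing small parsers concrete, here is a minimal sketch of a few building blocks. It assumes the FParsec NuGet package and F# script syntax, and runs with dotnet fsi. The names inlineSpaces, word, phrase and parseOrFail are hypothetical. Two design points matter here:

- A phrase stops at a '{' or a line break.
- inlineSpaces skips spaces and tabs but never newlines. This is what keeps line breaks significant, because FParsec's built-in spaces would also swallow newlines.

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical building blocks (names are illustrative). Annotating the
// user-state type as unit avoids F#'s value restriction on generic parsers.

/// Skip spaces and tabs but never a line break, so newlines stay meaningful.
let inlineSpaces : Parser<unit, unit> =
    skipManySatisfy (fun c -> c = ' ' || c = '\t')

/// A word such as an entry or argument name: letters, digits and '_'.
let word : Parser<string, unit> =
    many1Satisfy (fun c -> System.Char.IsLetterOrDigit c || c = '_')

/// A phrase: any run of text up to a '{' (start of an argument) or a line break.
let phrase : Parser<string, unit> =
    many1Chars (noneOf "{\r\n")

/// Small helper for trying parsers out in a script.
let parseOrFail (p : Parser<'a, unit>) (input : string) : 'a =
    match run p input with
    | Success (result, _, _) -> result
    | Failure (msg, _, _) -> failwith msg

printfn "%A" (parseOrFail phrase "Hey! Whats up? {user}")
```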

Arguments

Now that I could parse words and phrases, and could separate newlines from spaces, let's tackle an argument.

The .>>.? combinator applies both parsers and returns their results as a tuple, but if the second parser fails without consuming input, it backtracks to the state before the first parser ran. Also, the <|> combinator lets you try parsers as alternatives: if the first one fails without consuming input, the second one is applied.
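Here is a minimal sketch of an argument parser, assuming the FParsec NuGet package in an F# script. The names identifier, typedArg, untypedArg and argument are hypothetical, and so is the Arg union. The '{' and the name sit on the left of .>>.?, so when no ':' follows, the whole typed attempt rewinds to before the '{'. That lets <|> hand the same input to the untyped alternative.

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical union and parser names, for illustration only.
type Arg =
    | TypedArg of name : string * argType : string
    | UntypedArg of name : string

let identifier : Parser<string, unit> =
    many1Satisfy (fun c -> System.Char.IsLetterOrDigit c || c = '_')

/// {name:type}. The left side consumes "{name"; if the right side then fails
/// without consuming input (no ':'), .>>.? rewinds to before the '{'.
let typedArg : Parser<Arg, unit> =
    (pchar '{' >>. identifier) .>>.? (pchar ':' >>. identifier .>> pchar '}')
    |>> TypedArg

/// {name} with no type.
let untypedArg : Parser<Arg, unit> =
    between (pchar '{') (pchar '}') identifier |>> UntypedArg

/// <|> only tries untypedArg because typedArg backed out without consuming input.
let argument : Parser<Arg, unit> = typedArg <|> untypedArg
```

A malformed argument such as {name:} still produces an error rather than rewinding. In that case the right side has already consumed the ':', so it did not fail cleanly.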

Text elements

Next up are text elements. These are the contents after the identifier's =, not including arguments. For example, if our locale entry is

UserLogin = Hey! Whats up?
= new lineezzz

We want to match on “Hey! Whats up?”, followed by an explicit newline, followed by “new lineezzz”.

Since a phrase is any text except an opening brace and a newline, we can parse all text up to an argument. A new line is a line break, followed by optional spaces, followed by an equals sign. The newline doesn’t contain any data we care about, so we can ignore the parser's output and return the union case NewLine using the >>% operator.
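Here is a minimal sketch of the text and new-line parsers, assuming the FParsec NuGet package and a trimmed-down, hypothetical union with only Text and NewLine. The attempt wrapper is an assumption of this sketch, not necessarily what the original code does. Without it, a line break followed by the next entry's identifier would fail after consuming the '\n', and many would report an error instead of simply stopping.

```fsharp
#r "nuget: FParsec"
open FParsec

// Trimmed-down, hypothetical union: arguments are left out of this sketch.
type LocaleContents =
    | Text of string
    | NewLine

let inlineSpaces : Parser<unit, unit> =
    skipManySatisfy (fun c -> c = ' ' || c = '\t')

/// Text up to an argument's '{' or the end of the line.
let textElement : Parser<LocaleContents, unit> =
    many1Chars (noneOf "{\r\n") |>> Text

/// A line break, optional spaces, then '='. attempt (an assumption here)
/// rewinds when the next line turns out to be a new entry; >>% throws away
/// the parsed characters and returns NewLine.
let newLine : Parser<LocaleContents, unit> =
    attempt (newline >>. inlineSpaces >>. pchar '=') >>. inlineSpaces >>% NewLine
```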

A line, then, is a sequence of new lines, arguments, and phrases, so we can use FParsec's many combinator with the three alternatives (arguments, text elements, and new lines) to build an actual line.
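Here is a hedged sketch of the line parser, assuming the FParsec NuGet package. The types and helper parsers are the same hypothetical ones as in the earlier sketches, repeated so the script runs on its own. One detail worth knowing is that FParsec's many throws if its parser succeeds without consuming input. Each alternative here always consumes something when it succeeds, so that never happens.

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical types and parsers, repeated so this snippet runs on its own.
type Arg =
    | TypedArg of name : string * argType : string
    | UntypedArg of name : string

type LocaleContents =
    | Argument of Arg
    | Text of string
    | NewLine

let inlineSpaces : Parser<unit, unit> = skipManySatisfy (fun c -> c = ' ' || c = '\t')
let identifier : Parser<string, unit> =
    many1Satisfy (fun c -> System.Char.IsLetterOrDigit c || c = '_')

let argument : Parser<LocaleContents, unit> =
    let typed =
        (pchar '{' >>. identifier) .>>.? (pchar ':' >>. identifier .>> pchar '}') |>> TypedArg
    let untyped = between (pchar '{') (pchar '}') identifier |>> UntypedArg
    (typed <|> untyped) |>> Argument

let textElement : Parser<LocaleContents, unit> = many1Chars (noneOf "{\r\n") |>> Text

let newLine : Parser<LocaleContents, unit> =
    attempt (newline >>. inlineSpaces >>. pchar '=') >>. inlineSpaces >>% NewLine

/// A line: any sequence of arguments, text and continuation lines.
let line : Parser<LocaleContents list, unit> =
    many (argument <|> textElement <|> newLine)
```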

An Entry

With arguments, new lines, and text set up, we can finally put it all together. What we need now is to match an identifier (“UserLogin”), an equals sign, and then a line.
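Here is a minimal sketch of an entry parser, assuming the FParsec NuGet package. To keep the focus on the entry structure, it uses trimmed-down, hypothetical types without arguments or comments. The name entry and its helpers are illustrative.

```fsharp
#r "nuget: FParsec"
open FParsec

// Trimmed-down, hypothetical types: arguments and comments are left out.
type LocaleContents =
    | Text of string
    | NewLine

type LocaleElement =
    | Entry of name : string * contents : LocaleContents list

let inlineSpaces : Parser<unit, unit> = skipManySatisfy (fun c -> c = ' ' || c = '\t')
let identifier : Parser<string, unit> =
    many1Satisfy (fun c -> System.Char.IsLetterOrDigit c || c = '_')

let textElement : Parser<LocaleContents, unit> = many1Chars (noneOf "{\r\n") |>> Text
let newLine : Parser<LocaleContents, unit> =
    attempt (newline >>. inlineSpaces >>. pchar '=') >>. inlineSpaces >>% NewLine
let line : Parser<LocaleContents list, unit> = many (textElement <|> newLine)

/// identifier, optional spaces, '=', optional spaces, then the line;
/// the (name, contents) tuple is piped straight into the Entry case.
let entry : Parser<LocaleElement, unit> =
    (identifier .>> inlineSpaces .>> pchar '=' .>> inlineSpaces) .>>. line |>> Entry
```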

Comments need their own parser. If we match a “#”, we take the rest of the line (but leave the newline, since other parsers will handle it). We might as well keep the comment information, so we pipe that result to the IgnoreEntry union case.
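Here is a minimal sketch of the comment parser, assuming the FParsec NuGet package and a trimmed-down union holding only the IgnoreEntry case. FParsec's restOfLine takes a flag saying whether to skip the line break. Passing false leaves the newline in the input, as described above.

```fsharp
#r "nuget: FParsec"
open FParsec

// Trimmed-down, hypothetical union with only the comment case.
type LocaleElement =
    | IgnoreEntry of comment : string

/// '#' then everything up to, but not including, the line break.
/// restOfLine false leaves the newline for other parsers to handle.
let comment : Parser<LocaleElement, unit> =
    pchar '#' >>. restOfLine false |>> IgnoreEntry
```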

Running the parser

Now we just have to piece together comments and locale elements, and run the parser.
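Putting every piece together gives something like the following sketch. It again assumes the FParsec NuGet package in an F# script. localeFile and every other name are hypothetical, and this is one possible assembly, not necessarily the original one. The parser does three things:

- It skips whitespace, including line breaks, between elements.
- It tries comments before entries.
- It uses eof to make sure the whole file was consumed.

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical names throughout; one way to assemble the pieces.
type Arg =
    | TypedArg of name : string * argType : string
    | UntypedArg of name : string

type LocaleContents =
    | Argument of Arg
    | Text of string
    | NewLine

type LocaleElement =
    | Entry of name : string * contents : LocaleContents list
    | IgnoreEntry of comment : string

let inlineSpaces : Parser<unit, unit> = skipManySatisfy (fun c -> c = ' ' || c = '\t')
let identifier : Parser<string, unit> =
    many1Satisfy (fun c -> System.Char.IsLetterOrDigit c || c = '_')

let argument : Parser<LocaleContents, unit> =
    let typed =
        (pchar '{' >>. identifier) .>>.? (pchar ':' >>. identifier .>> pchar '}') |>> TypedArg
    let untyped = between (pchar '{') (pchar '}') identifier |>> UntypedArg
    (typed <|> untyped) |>> Argument

let textElement : Parser<LocaleContents, unit> = many1Chars (noneOf "{\r\n") |>> Text
let newLine : Parser<LocaleContents, unit> =
    attempt (newline >>. inlineSpaces >>. pchar '=') >>. inlineSpaces >>% NewLine
let line = many (argument <|> textElement <|> newLine)

let entry : Parser<LocaleElement, unit> =
    (identifier .>> inlineSpaces .>> pchar '=' .>> inlineSpaces) .>>. line |>> Entry
let comment : Parser<LocaleElement, unit> =
    pchar '#' >>. restOfLine false |>> IgnoreEntry

/// Skip leading whitespace, then read comments and entries, each followed by
/// any whitespace (including line breaks), until the end of the input.
let localeFile : Parser<LocaleElement list, unit> =
    spaces >>. many ((comment <|> entry) .>> spaces) .>> eof

let sample =
    "# this is a comment, ignore me!\nUserLoginText = Hello {user}! Welcome!\n"

match run localeFile sample with
| Success (elements, _, _) -> printfn "%A" elements
| Failure (msg, _, _) -> printfn "Parse error: %s" msg
```

Whitespace is consumed after each element rather than before it. If spaces came first inside many, trailing newlines at the end of the file would be consumed before the element parser failed, and many would then report an error instead of stopping.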