genstrings2

I am actually on vacation, but I couldn’t help using the breather to work on a little hobby project, which I shall reveal to you today!

If you localize your apps (iOS or Mac, it doesn’t matter) you are probably used to working with genstrings. Probably painfully so, which is why we built the Mac app Linguan, which remote-controls genstrings and merges the results with your previously existing tokens.

But genstrings has several big problems.

The most pressing problem of using genstrings from Linguan is that it is very picky about the syntax of the string macros. If a macro has a parameter that is not a literal @"string", genstrings will crash. Several Linguan users have been stumped by this behavior because it makes it impossible for Linguan to scan their source code.

The second problem is that we don’t have access to the genstrings source code, so we can neither know what this tool really does nor modernize it to make use of modern multi-processor systems. genstrings is single-threaded.

The third problem is that until now the only way we could call genstrings from within Linguan was to open an invisible shell and then read the output files from a hidden temporary folder. You get genstrings with Xcode, but what if you don’t want to install Xcode on the machine you want to use with Linguan? Spawning a process for each source file individually has also proven impractical: scanning just the Linguan source files this way takes 12 seconds instead of 0.3 seconds.

If it is any indication, the genstrings man page has not been updated since May 7, 2007. Any substantial improvements to the tool are probably even older than that; somebody from Apple told me that it goes back as far as the NeXT days.

My goal was to reverse-engineer the functionality of genstrings, put that into a static library and thus be able to bring native high-performance string scanning to Linguan without the need for genstrings to be present any more.

My latest Open Source project DTLocalizableStringScanner (on GitHub) contains two targets: one for a static library which we can use for scanning the source files. The other is a re-implementation of a command line tool which can take the place of genstrings.

genstrings2 is built around NSScanner for the scanning and GCD queues for multi-threading. There might be faster ways to scan for the macros but there is no reason to do that at the expense of the code being easy to read. I’d rather fill up a Grand Central Dispatch queue with one block per source file and thus make full use of multiple CPU cores. It works on as many files in parallel as GCD permits.
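
To illustrate the idea of one scanning task per source file, here is a minimal sketch in Python. It is not the genstrings2 code: a regular expression stands in for NSScanner, a thread pool stands in for the GCD queue, and the names scan_source, scan_file and scan_files are made up for this example. It only looks for NSLocalizedString with a literal key and a literal or nil comment, and it silently skips anything else.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Assumption for this sketch: only NSLocalizedString is recognized, the key
# must be a literal @"..." and the comment is either a literal or nil.
_STRING = r'@"((?:[^"\\]|\\.)*)"'
MACRO = re.compile(
    r'NSLocalizedString\(\s*' + _STRING + r'\s*,\s*(?:' + _STRING + r'|nil)\s*\)'
)

def scan_source(text):
    """Return (key, comment) pairs; macros with non-literal keys are not matched."""
    return [(m.group(1), m.group(2) or "") for m in MACRO.finditer(text)]

def scan_file(path):
    """Read one source file and scan its contents."""
    with open(path, encoding="utf-8") as handle:
        return scan_source(handle.read())

def scan_files(paths):
    """Submit one task per file and merge the results into a key -> comment table."""
    table = {}
    with ThreadPoolExecutor() as pool:
        for pairs in pool.map(scan_file, paths):
            for key, comment in pairs:
                # Keep the first comment seen for a key.
                table.setdefault(key, comment)
    return table
```

Python threads do not give the same multi-core speedup that GCD does for CPU-bound work, so treat this purely as a picture of the structure: each file is an independent unit of work, and the merge step happens after the scanning.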

You can help hone this into a worthy successor to genstrings by testing it on your own source code and telling me about edge cases that it does not handle properly. For one thing, it should ignore invalid macros instead of crashing like the original. That’s one immediate advantage you get from symlinking genstrings2 in place of genstrings.

There is one (undocumented) behavior that is not implemented yet. genstrings apparently expands tokens containing lists in the format [one, two, three] into three lines with [one], [two] and [three]. People use this to localize NSPredicateEditor. But that’s an easy exercise that I will do in the next few days.
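
As a rough picture of what that expansion could look like, here is a small Python sketch. Since the behavior is undocumented, the details are my assumptions: the function name expand_list_token is made up, only the first bracketed comma-separated list in a token is expanded, and any text around the brackets is kept unchanged.

```python
import re

# A bracketed group that contains at least one comma and no nested brackets.
LIST = re.compile(r'\[([^\[\]]*,[^\[\]]*)\]')

def expand_list_token(token):
    """Expand '[one, two, three]' into one token per list item.

    Assumption: only the first list in the token is expanded and the
    surrounding text is copied into every resulting token.
    """
    match = LIST.search(token)
    if match is None:
        return [token]  # nothing to expand
    items = [item.strip() for item in match.group(1).split(",")]
    prefix, suffix = token[:match.start()], token[match.end():]
    return [prefix + "[" + item + "]" + suffix for item in items]
```

A token without a list comes back unchanged as a single-element result, so the function can be applied to every scanned token without special-casing.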

If there are no show stoppers, DTLocalizableStringScanner will replace genstrings as early as Linguan 1.0.3.