I started to translate an IF game into my language, and I’m a beginner at this. My language has some extra characters. As far as I know, newer IF compilers and interpreters have no problem with Unicode characters. But when I try to compile the (partially) translated source, I get the following error message:

The grammar token 'unicode 337' in the sentence 'Understand "kér [something]t [someone]t[unicode 337]l" as querysmalling' looked to me as if it might be a unicode character, but this isn't something allowed in parsing grammar.

The first word also contains a Unicode character, but that one is not a problem. I tried to google it, and it seems the compiler only allows Unicode characters with lower code numbers. Is that true? Can I get around it and use my special characters somehow?

It looks to me as though the issue is this, from §5.10 of Writing with Inform:

The world has a bewildering range of letters, accents, diacritics, markers and signs. Inform tries to support the widest range possible, but the works of IF produced by Inform are programs which then have to be run on a (virtual) computer whose abilities are more constrained: few players will have an Ethiopian font installed, after all. So a degree of caution is called for.

(a) Definitely safe to use. Inform’s highest level of support is for the letters found on a typical English typewriter keyboard, including both the $ and £ signs (but not the Yen or Euro symbols ¥ and €), and in addition the following:

The other Unicode characters can be written inside quoted text but not in the source text outside quotes, which I’m guessing means they can’t be understood either. So é can be understood but unicode 337 can’t.
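To make the code-number point concrete, here is a small Python check (Python is not part of the Inform toolchain; it is only used for illustration). It treats "fits in one byte" as a rough stand-in for "in the range Inform accepts in grammar". That is an assumption: the real list in Writing with Inform is not identical to the first 256 code points.

```python
# Code points of the two accented letters in the failing Understand line.
# Assumption: "fits in one byte" is only a rough proxy for the set of
# characters Inform accepts in parsing grammar.

def fits_in_one_byte(ch: str) -> bool:
    """Return True if the character's code point is below 256."""
    return ord(ch) < 256

# "\u00e9" is é (from "kér"), "\u0151" is ő (the rejected "unicode 337").
for ch in ("\u00e9", "\u0151"):
    print(ch, ord(ch), fits_in_one_byte(ch))
```

The second letter is the one the error message calls "unicode 337".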

Unfortunately I suspect there isn’t a workaround for this: there’s some internal representation in a format that doesn’t include the Unicode characters (the ZSCII format). I’m not super familiar with the inner workings of the virtual machines though.

Indeed, nothing outside that range can be properly handled by the parser. This is a significant problem when trying to write IF in different languages, since the limited range shown above isn’t even enough for the entire European Union. (Greek, for instance, is missing its entire alphabet, while other languages have more subtle problems: Polish needs letters like ż, Romanian ă, Icelandic ð…it looks like you’re specifically missing Hungarian’s ő?)

Zarf has written an extension that updates the parser to support Unicode. But since you can’t use most Unicode characters in object names or Understand lines, you need to use Inform 6 inclusions for all parsing-related code (Understand lines, object names, verb definitions, conversation topics…).

Hopefully an upcoming release of Inform 7 will change this. But for now, it’s not really possible to use it for works in most non-English languages. Sorry about that.

That said, modern systems and interpreters do support Unicode quite well. If you managed to get a Hungarian game past the first stage of compiling, everything else would go off without a hitch, and it would be completely playable. The only problem is the ni compiler itself, which is also the one part that’s not open source (as opposed to the GUI, the template library, the I6 compiler, the blorb tools, the Glulx format, the Quixe interpreter…).

Thanks for all the answers.
Yes, I would like to translate into Hungarian. I know an old Hungarian IF game for the C64 which was rewritten in I6, and it has Unicode characters… I wrote to its author to ask how he did it.
I downloaded an I6 source, wrote some special characters in it, and tried to compile it with the inform compiler, with -v8 flag, but it gave error messages for the special characters… I also tried the -C2 flag, but it didn’t help.

Draconis:

That said, modern systems and interpreters do support Unicode quite well. If you managed to get a Hungarian game past the first stage of compiling, everything else would go off without a hitch, and it would be completely playable. The only problem is the ni compiler itself, which is also the one part that’s not open source (as opposed to the GUI, the template library, the I6 compiler, the blorb tools, the Glulx format, the Quixe interpreter…).

Honestly, if ni could just be hacked to pass non-ZSCII characters through unmolested, then all the necessary transformations could be applied on the I6 side. This might be possible with disassembly, but might not: it depends on the data structures used internally. (Ideally it would just use UTF-8 in byte arrays, and depend on the I6 compiler to handle character sets, but I don’t know if this actually happens.)

I downloaded an I6 source, wrote some special characters in it, and tried to compile it with the inform compiler, with -v8 flag

You need to use the -G flag (for Glulx), and -Cu (to indicate that the I6 source code is in UTF-8).

Then you need additional settings to get the I6 dictionary to be Unicode-compatible. I don’t have a complete example on hand, unfortunately.
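As a rough sketch, the two flags mentioned above could be assembled into a command line like this. The executable name `inform` and the helper `build_inform6_command` are assumptions made for illustration; the compiler binary is named differently on different installations, and the additional dictionary settings are not covered here.

```python
import shlex

def build_inform6_command(source_file: str, compiler: str = "inform") -> list[str]:
    """Assemble an Inform 6 command line using the two flags named above.

    -G  : compile to Glulx
    -Cu : treat the I6 source file as UTF-8
    The executable name is an assumption and varies between installations.
    """
    return [compiler, "-G", "-Cu", source_file]

# Print the command instead of running it, so nothing is executed by accident.
print(shlex.join(build_inform6_command("auto.inf")))
```

To actually run the compiler, the list can be passed to `subprocess.run`.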

It is very adventurous. It turned out that my Inform compiler doesn’t support Unicode, because the version on Linux is 6.31. I downloaded the Inform 6.33 Windows executable, and I run it with Wine.
So, I do the English-to-Hungarian translation in gnome-inform7, save, then quit. I recode őűŐŰ in story.ni to their ugly ISO 8859-1 versions. Then I reopen the project in gnome-inform7, compile it, and quit again. Then I recode auto.inf. It wasn’t straightforward, because auto.inf wasn’t UTF-8… I got an error that recommended using DICT_CHAR_SIZE=4, so I appended

!% $DICT_CHAR_SIZE=4

to the first line of auto.inf.
OK, now it compiles, but when I play, I get errors for every command, even for quit…
Any idea where I can find additional help on how to solve that?

DICT_CHAR_SIZE=4 is the correct flag, but as soon as you use it, you have to replace the I6 parser code. The existing parser assumes that the dictionary and all player input is stored in bytes. You have to replace that with code which uses (32-bit) words.
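To illustrate the byte versus word difference (this is a Python illustration only, not the actual I6 parser code; the function names are made up, and big-endian order is assumed here because Glulx is big-endian):

```python
import struct

def pack_as_bytes(word: str) -> bytes:
    """Store one byte per character; raises ValueError above code point 255."""
    return bytes(ord(ch) for ch in word)

def pack_as_words(word: str) -> bytes:
    """Store one big-endian 32-bit word per character."""
    return struct.pack(f">{len(word)}I", *(ord(ch) for ch in word))

word = "t\u0151l"  # "től", the ending from the failing Understand line

try:
    pack_as_bytes(word)
except ValueError as err:
    print("byte storage fails:", err)

print("word storage:", pack_as_words(word).hex(" ", 4))
```

Code that walks the input one byte at a time would read each 32-bit character as four separate values, which is consistent with every command failing once the dictionary format changes but the parser does not.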

In theory you can just include that extension (Zarf’s Unicode parser) in the Inform 7 project. Then write your game using replacements for all the characters ZSCII doesn’t have: for instance, Hungarian doesn’t use ô or û, so you could write ô û anywhere you need ő ű. Then finally, use sed to replace all ô with ő and û with ű in auto.inf before passing it to the Inform 6 compiler.

In practice I’ve never done this so things might break. But it seems promising! Especially for a language like Hungarian which doesn’t need very many “exotic” characters. I think ő and ű are the only ones not in the original character set.

If you want to be really fancy, and make it easy to type your source on a Hungarian keyboard, you could sed ő ű into ô û before invoking ni, then sed them back afterward to pass to the Inform 6 compiler. Depends how much you want to fiddle with the build process.
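The substitution steps could be sketched in Python instead of sed, roughly like this. It is untested against a real project and rests on several assumptions: the helper names `before_ni` and `after_ni` are made up, the capital pairs Ô/Ő and Û/Ű are my addition, and the characters are assumed to appear literally in the text once the file has been decoded (auto.inf may not be UTF-8, so the decoding step is left out).

```python
# Placeholder letters (in the supported range, unused in Hungarian) mapped
# to the Hungarian letters they stand for. The capital pairs are an assumption.
PLACEHOLDER_TO_REAL = {
    "\u00f4": "\u0151",  # ô -> ő
    "\u00fb": "\u0171",  # û -> ű
    "\u00d4": "\u0150",  # Ô -> Ő
    "\u00db": "\u0170",  # Û -> Ű
}
TO_REAL = str.maketrans(PLACEHOLDER_TO_REAL)
TO_PLACEHOLDER = str.maketrans({v: k for k, v in PLACEHOLDER_TO_REAL.items()})

def before_ni(source: str) -> str:
    """Swap ő ű for placeholders before ni sees the source text."""
    clash = sorted(set(source) & set(PLACEHOLDER_TO_REAL))
    if clash:
        # A genuine ô or û would be turned into ő or ű on the way back.
        raise ValueError(f"source already contains placeholders: {clash}")
    return source.translate(TO_PLACEHOLDER)

def after_ni(generated: str) -> str:
    """Restore ő ű in the generated I6 text before the I6 compiler sees it."""
    return generated.translate(TO_REAL)

line = 'Understand "k\u00e9r [something]t [someone]t\u0151l" as querysmalling.'
assert after_ni(before_ni(line)) == line
```

The clash check is there because the round trip only works if the placeholder letters never occur for real in the source.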