If we ever fix things on the C side of mcedit, or modify a syntax definition file, we'll have a problem: we don't have a collection of sample files in the various syntaxes to test our fixes against, so we (the maintainers) would have to create these sample files ourselves. And they'd have to be good files, ones that exercise every nook and cranny of the syntax definitions.

This is a lot of work, so I suggest we start small: have just one or two sample files for now, close this ticket, and add more sample files as time goes by.

To alleviate this burden we ought to make a rule:

Any new syntax definition must be contributed together with one or more sample files. The people writing the syntax files know their language best, so they're the ones who should provide the samples.


For documentation purposes, we should also collect files that show imperfections/bugs in our syntax definitions (we could embed the string ".fail." in their filenames). This does not imply that we intend to fix these imperfections.
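A minimal sketch of how a future test runner might honour the ".fail." convention, assuming each sample gets compared against some reference rendering of its correct highlighting (the comparison itself is out of scope for this ticket). The helper names, the result categories and the example filenames are all hypothetical; the point is only that known-bad samples get reported as expected failures instead of breaking the run.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: a sample documents a known imperfection when its
 * base name contains ".fail.", e.g. "nested-quotes.fail.sh". */
static bool
sample_is_known_failure (const char *path)
{
    const char *base;

    assert (path != NULL);
    base = strrchr (path, '/');
    base = (base != NULL) ? base + 1 : path;   /* ignore directory names */
    return strstr (base, ".fail.") != NULL;
}

typedef enum
{
    SAMPLE_PASS,    /* ordinary sample, output matches the reference */
    SAMPLE_FAIL,    /* ordinary sample, output differs: a regression */
    SAMPLE_XFAIL,   /* ".fail." sample still shows the documented flaw */
    SAMPLE_XPASS    /* ".fail." sample now matches: the flaw went away */
} sample_result_t;

/* `matches_reference` stands for the outcome of comparing the highlighter's
 * output with a reference rendering (assumed, not specified by the ticket). */
static sample_result_t
classify_sample (const char *path, bool matches_reference)
{
    if (sample_is_known_failure (path))
        return matches_reference ? SAMPLE_XPASS : SAMPLE_XFAIL;
    return matches_reference ? SAMPLE_PASS : SAMPLE_FAIL;
}
```

An XPASS would simply be a hint that the file can lose its ".fail." tag.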

This ticket does not deal with the testing code itself (i.e. the regression test; that can be implemented very easily when/if mc gets scripting support, as mc2 proves).

I have to say that I'm always on the side of more tests, but in this particular case I can't help asking if we have a larger problem.

What I mean by that is that the last time I had a look at the syntax highlighter code, I almost had a heart attack. It's compact and ingenious, and it has actually been working for a very long time, but it's anything but easily understandable, well documented or properly tested. On top of that, it has some inherent deficiencies, like the nested quoting bug. We also ended up with a whole library of highlighting rules which, as you correctly mention, are not tested, but are also not really maintained.

I've been thinking about it for quite a while, and my thoughts are that we aren't the first project to attack this problem, and there are tons of libraries for that purpose. To name only a few that I've personally used in the past:

Of course, there are good arguments against introducing a dependency on a syntax highlighting library, but maybe there is some middle ground, like implementing a minimalist engine and automatically generating syntax files from, e.g., the Pygments collection...

Just thought I'd raise the point before you invest a substantial amount of time in testing the existing highlighter, even though a test corpus would be useful irrespective of whether the highlighter gets replaced or not.

It's not very well maintained (it seems that Igor gave up on it a long time ago), there aren't many syntax definitions available and most of them are out of date, the definition syntax is blood-chilling, and the engine is written in C++ with a dependency on Apache Xerces.

I would really rather look in the direction of GtkSourceView and/or Scintilla.


It being C++ isn't even the worst part of it :-/ For one, I couldn't find any embedding documentation or API for it, and it doesn't seem to support incremental highlighting, etc. It's apparently geared towards whole-file colorization, so I don't think it's suitable for integration with the editor; at best, one could try to use it in the viewer to generate a colorized version using ANSI output...
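To illustrate why incremental highlighting matters for an editor: if the highlighter state at the start of every line is cached, an edit only requires rescanning from the changed line until some line ends in the state that was already cached for the next one. Below is a minimal sketch of that idea, assuming a toy lexer whose only state is "inside a C block comment or not"; all names are hypothetical and this is not how mcedit's engine is actually structured.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lexer state: inside a C-style block comment or not. */
typedef enum { ST_CODE = 0, ST_COMMENT = 1 } hl_state_t;

/* Scan one line starting in state `in`; return the state at its end. */
static hl_state_t
scan_line (hl_state_t in, const char *line)
{
    hl_state_t st = in;

    for (const char *p = line; *p != '\0'; p++)
    {
        if (st == ST_CODE && p[0] == '/' && p[1] == '*')
        {
            st = ST_COMMENT;
            p++;            /* consume the '*' too */
        }
        else if (st == ST_COMMENT && p[0] == '*' && p[1] == '/')
        {
            st = ST_CODE;
            p++;            /* consume the '/' too */
        }
    }
    return st;
}

/* start[i] caches the state at the beginning of line i; the array has
 * n + 1 entries. A full pass fills it once, e.g. when a file is loaded. */
static void
highlight_all (const char *const *lines, size_t n, hl_state_t *start)
{
    start[0] = ST_CODE;
    for (size_t i = 0; i < n; i++)
        start[i + 1] = scan_line (start[i], lines[i]);
}

/* After line `first` changes, rescan forward only until a line ends in the
 * state already cached for the next line: nothing below can change then.
 * Returns the number of lines rescanned. */
static size_t
rehighlight (const char *const *lines, size_t n, hl_state_t *start, size_t first)
{
    size_t scanned = 0;

    assert (first < n);
    for (size_t i = first; i < n; i++)
    {
        hl_state_t end = scan_line (start[i], lines[i]);

        scanned++;
        if (end == start[i + 1])
            break;          /* converged with the cached state */
        start[i + 1] = end;
    }
    return scanned;
}
```

With the four lines `int a; /* start`, `still comment`, `end */ int b;`, `int c;`, editing the last line rescans just that one line, while removing the comment opener from the first line rescans three lines before the states converge again. A whole-file colorizer has no such cache and has to start over from the top.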

so it seems that re-implementing the kate highlighting engine (now KSyntaxHighlighting) is popular: apart from the haskell-based skylighting mentioned above (and its predecessor highlighting-kate), qt creator also did it (in c++ again), as did Syntax::Highlight::Engine::Kate in perl. with so much code around to rip off (ahem, be inspired by), it would be a shame not to re-implement it once more, this time in plain c. :D
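For a rough idea of what the core of such a re-implementation in plain C might look like, here is a heavily simplified sketch of the Kate-style model: each context holds an ordered list of rules, and a matching rule colours the matched text and may push another context or pop back to the previous one (Kate writes the latter as `#pop`). The types, names and the tiny shell-like definition are made up for illustration; real definitions are XML and also have regexes, keyword lists, `lineEndContext` and more, none of which is modelled here. The context stack that survives from one line to the next is what handles nesting and multi-line constructs, and it is also the per-line state an editor would cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CTX_STAY = -2, CTX_POP = -1 };   /* special values for rule_t.next */

typedef struct
{
    const char *match;  /* non-empty literal, a stand-in for DetectChar/StringDetect */
    char attr;          /* attribute given to the matched text */
    int next;           /* CTX_STAY, CTX_POP, or index of a context to push */
} rule_t;

typedef struct
{
    char attr;          /* attribute for text no rule matches */
    const rule_t *rules;
    size_t nrules;
} context_t;

#define MAX_DEPTH 16

typedef struct
{
    int stack[MAX_DEPTH];
    int depth;          /* stack[depth - 1] is the current context */
} ctx_stack_t;

/* Highlight one line, writing one attribute character per input character
 * into `out` (strlen (line) + 1 bytes). The stack survives between calls,
 * so constructs that span lines continue on the next one. */
static void
highlight_line (const context_t *ctxs, ctx_stack_t *st, const char *line, char *out)
{
    size_t i = 0, len = strlen (line);

    while (i < len)
    {
        const context_t *c;
        const rule_t *rule = NULL;
        size_t mlen = 0;

        assert (st->depth > 0);
        c = &ctxs[st->stack[st->depth - 1]];
        for (size_t r = 0; r < c->nrules && rule == NULL; r++)
        {
            mlen = strlen (c->rules[r].match);
            if (strncmp (line + i, c->rules[r].match, mlen) == 0)
                rule = &c->rules[r];    /* first matching rule wins */
        }
        if (rule == NULL)
        {
            out[i++] = c->attr;         /* no rule matched: default attribute */
            continue;
        }
        memset (out + i, rule->attr, mlen);
        i += mlen;
        if (rule->next == CTX_POP && st->depth > 1)
            st->depth--;
        else if (rule->next >= 0 && st->depth < MAX_DEPTH)
            st->stack[st->depth++] = rule->next;
    }
    out[len] = '\0';
}

/* Made-up example definition: shell-like strings that may contain $( ... ),
 * which may contain strings again. Attributes: n = normal, s = string,
 * o = operator. */
enum { CTX_CODE, CTX_STRING, CTX_SUBST };

static const rule_t code_rules[] = { { "\"", 's', CTX_STRING } };
static const rule_t string_rules[] = { { "\"", 's', CTX_POP }, { "$(", 'o', CTX_SUBST } };
static const rule_t subst_rules[] = { { ")", 'o', CTX_POP }, { "\"", 's', CTX_STRING } };

static const context_t shell_ctxs[] = {
    [CTX_CODE] = { 'n', code_rules, 1 },
    [CTX_STRING] = { 's', string_rules, 2 },
    [CTX_SUBST] = { 'n', subst_rules, 2 },
};
```

Starting from a stack holding just `CTX_CODE`, the line `echo "a $(echo "b") c"` comes out as `nnnnnsssoonnnnnsssosss`: the inner quotes open a new string inside the substitution rather than closing the outer one, and the stack is back to depth 1 at the end of the line.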