Editing tm* files (tmPreferences, tmLanguage, tmTheme) just sucks. This post is meant to convince the developer(s?) to deprecate the tm* files and possibly implement a new language settings system. I tried to give it some structure, but since I wrote it over three days it might still be a bit convoluted. Don't hesitate to ask if you can't follow or don't understand something.

The only reason tm* files are used seems to be backwards compatibility with TextMate. However, the reasons against them are fairly numerous:

It's Property List XML(!). I can't really express my hatred of plists. Well, maybe they are easy to parse, but they are just unreasonable to edit by hand, and there are many alternatives, JSON being one of them. (Note: .sublime-snippets don't count here as they are XML but not plists, and <![CDATA[ ]]> makes escaping redundant.)

JSON is already used for .sublime-settings and both key and mouse bindings so there must be a parser in the core.

The current .tmPreferences system is really awkward. With some testing I found out that filenames don't matter, and neither does the "name" key inside. ST only considers the "scope" and "settings" keys which, by the way, forces you to use one file for each scope selector. It would make perfect sense to allow multiple of these entries in one file because that keeps things more centralized.
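To make the point about the two relevant keys concrete, here is a minimal sketch in Python using the standard plistlib module. The sample document is made up for illustration (including the placeholder uuid), and the claim that only "scope" and "settings" matter is my observation from testing, not documented behaviour.

```python
import plistlib

# Hypothetical, simplified .tmPreferences content (DOCTYPE omitted for brevity).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>name</key>
    <string>Symbol List</string>
    <key>scope</key>
    <string>meta.function.python</string>
    <key>settings</key>
    <dict>
        <key>showInSymbolList</key>
        <integer>1</integer>
    </dict>
    <key>uuid</key>
    <string>00000000-0000-0000-0000-000000000000</string>
</dict>
</plist>
"""


def relevant_keys(plist_bytes):
    """Return only the two keys that seem to matter: scope and settings."""
    data = plistlib.loads(plist_bytes)
    return data.get("scope"), data.get("settings", {})


scope, settings = relevant_keys(SAMPLE)
print(scope, settings)
```

Seventeen lines of XML carry one selector and one setting; "name" and "uuid" are simply dropped.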

They create a .cache file on the drive. Why do you need to do this for XML but not for JSON even if it's said to be parsed easily? Probably related to using a lib but should not be necessary.

All the "uuid"s are not necessary anymore. I don't even know if ST considers them at all because documentation on this part is really poor.

...and probably more but I can't think of any right now.

What this reveals is probably that the plists as they are now should be considered as legacy and replaced by something better. Let me express why I think that YAML could be a great way to enhance this even more.

YAML, which you probably know, is a "human-readable data serialization format". I got to know it when I found the SaneSnippets package, which implements snippets with some YAML-like frontmatter and parses the data below the second "---" as plain text to remove the need to escape things. This cannot really be considered "plain YAML" but it points in the right direction.

So, you probably know the AAAPackageDev package, which simplifies the process of developing packages and also includes a variety of syntax definitions for ST-related files (everything that is not tm* or a snippet). Because it uses JSON for the language definitions and converts them into plists, you have to use escape sequences for many characters, and this makes writing regular expressions really awkward when you have to write "\\\\" to match a literal backslash, not to mention all the "". Thus, I thought "why not do that in YAML" and created a still WIP pull request for that. This is what a .YAML-tmLanguage looks like: link (code). Way easier to read, right?
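The backslash doubling is easy to reproduce with Python's json and re modules. This is only a sketch of the escaping problem itself; the "match" key is just an example name and not taken from any particular definition file.

```python
import json
import re

# The JSON text as it would appear in a file; the raw string keeps all four
# backslashes exactly as typed.
json_source = r'{"match": "\\\\"}'

# JSON decoding halves them: the pattern is now two backslashes, which is the
# regular expression for ONE literal backslash.
pattern = json.loads(json_source)["match"]
matches_backslash = re.fullmatch(pattern, "\\") is not None

# Going the other way doubles them again.
round_trip = json.dumps(pattern)

print(len(pattern), matches_backslash, round_trip)
```

In a plain (unquoted) YAML scalar the regular expression can be written as-is, so only the regex-level escaping remains.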

Well, you can use YAML's simplicity for the preferences, too. And you can enhance it while doing so. What I mean specifically is allowing multiple entries (scopes) in one file and possibly renaming some keys so that they convey a proper meaning. Additionally, you could turn all the CamelCase identifiers into under_score-separated ones.

Example for Python (ST2) as follows. Please note that I don't completely understand how exactly all these settings work because of poor or greatly hidden documentation, thus I inserted various comments ("#").

# if you want to define symbols without the base "source.python" scope (which I doubt is of any use)
# you could probably create a new document and don't specify a "scope" or similar
symbols:
- selector: meta.function.python
  exclude: entity.name.function.decorator.python
  # note: "\S" should be used here since Python also accepts unicode characters as identifiers
  transformation: s/def\s+([A-Za-z_][A-Za-z0-9_]*\()(?:(.{0,40}?\))|((.{40}).+?\)))(\:)/$1(?2:$2)(?3:$4…\))/g;

- selector: meta.class.python
  transformation:
    # this is an alternative representation; I don't know what other "transformations" are supported
    match: class\s+([A-Za-z_][A-Za-z0-9_]*.+?\)?)(\:|$)
    replace: $1...

The only hard thing about YAML is highlighting its syntax with regular expressions. This is why I mostly assumed that the user writes their YAML properly with blocks in my definition above. The other downside would be that parsing YAML takes a while, but unless you have something like 30,000 lines it should still be acceptable. AAAPackageDev takes about 70ms to read the syntax definition above on my machine (that is, in pure Python, without the C acceleration).

By the way, all of the above could easily be inserted into the language definition without even having two documents, because "settings" is not yet defined for definitions and the "scope" could be read from "scopeName". That makes things even more centralized.

Of course, everything about this settings system is an example and subject to change, but I would really appreciate it if a discussion about a new settings format for ST got started. I am certainly willing to contribute to this, by sharing ideas or by helping with code somehow, because I really like the extensibility of ST and think it could be made even better. Sublime Text 3 could be a great opportunity to take this step because you would have to update your plugins anyway. A package converter should not be a problem here since the data is practically linear.
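As a rough idea of how linear such a conversion is, here is a sketch in Python that turns an already parsed "symbols" list (as in my example above) back into one plist per selector. Everything here is hypothetical: the KEY_MAP, the function name, the shortened transformation strings, and the assumption that "showInSymbolList" should be set for every entry. The "exclude" key has no counterpart that I know of and is ignored.

```python
import plistlib

# Hypothetical mapping from the proposed under_score keys to the CamelCase
# keys used in today's tmPreferences files.
KEY_MAP = {"transformation": "symbolTransformation"}


def to_tm_preferences(entries):
    """Return one XML plist (as bytes) per entry of a parsed 'symbols' list."""
    documents = []
    for entry in entries:
        settings = {"showInSymbolList": 1}  # assumed default for this sketch
        for key, value in entry.items():
            if key in KEY_MAP:
                settings[KEY_MAP[key]] = value
        plist = {"scope": entry["selector"], "settings": settings}
        documents.append(plistlib.dumps(plist))
    return documents


# Stand-in for the parsed YAML; the transformations are shortened placeholders.
symbols = [
    {"selector": "meta.function.python", "transformation": "s/def\\s+//;"},
    {"selector": "meta.class.python", "transformation": "s/class\\s+//;"},
]
docs = to_tm_preferences(symbols)
print(docs[0].decode("utf-8"))
```

Converting in the other direction would be the same loop with plistlib.loads and the mapping reversed.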

Regards, Fichte

Last edited by FichteFoll on Fri Feb 01, 2013 10:22 pm, edited 2 times in total.

Perhaps, given that Sublime already supports it fully, JSON would be the best approach for this. I agree that XML is an irritating format for such things and YAML has some nice touches, but leveraging existing capabilities is probably the best way unless there's a really good reason to do otherwise; I can't think of a really good reason to avoid JSON for this purpose.

I would argue against discarding regular expressions in favor of a parser if said parser was incapable of being taught how to parse FORTH sources. Forth is not like ANY other language you have seen. For example, there are no restrictions on what a "function" (called a word) can be named. Some word names that confuse most parsers...

' pronounced 'tick'. This is not an opening single quote for a string
['] pronounced 'bracket tick'
1+ 2* etc. These are NOT numbers, they are words.

...and many, many more examples of things that traditional parsers are usually incapable of understanding without extreme difficulty.
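A small Python sketch of why these names are harmless once you split on whitespace first, which is how Forth itself delimits words. This is a simplification and not how a real Forth system decides: there, the dictionary is searched before number conversion is attempted and the number format depends on BASE. The classify function and the NUMBER pattern are just for illustration.

```python
import re

# A token only counts as a number if the WHOLE token is made of digits
# (with an optional leading minus sign).
NUMBER = re.compile(r"-?\d+\Z")


def classify(source):
    """Split on whitespace and label each token as 'number' or 'word'."""
    return [
        (token, "number" if NUMBER.match(token) else "word")
        for token in source.split()
    ]


for token, kind in classify("1 1+ 2* ' dup ['] swap"):
    print(f"{token!r:8} {kind}")
```

Only the bare 1 is labelled as a number here; 1+, 2*, ' and ['] all come out as words.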

The problem is that Forth is one of the only languages (maybe even THE only language) that cannot easily (if at all?) be described in BNF. Forth is not a traditional programming language; it has been described as a programming environment for creating application-oriented languages (Leo Brodie, Thinking Forth).

I.e. Forth is used to create a programming language with a syntax that YOU DEFINE! If the syntax of the code is defined at compile time, how the HELL can any traditional parser parse it? Answer... usually it can't. The only editors I have seen that can even approximate the ability to syntax highlight Forth sources have been ones using REGULAR EXPRESSIONS!

mark4 wrote: I would argue against discarding regular expressions in favor of a parser if said parser was incapable of being taught how to parse FORTH sources.

I didn't suggest a parser (that would just replace cancer with aids), I suggested parsers. Or in other words, some form of plugin api that is more capable than today's very limited regular expressions.