I've been doing some digging and thinking about the Python
toolchain, and given that we have a number of interested Python
programmers buzzing around, I think it's important to get a
discussion going. My idea isn't 100% thought through in all
important details so far, but I'm feeling pretty good about the
general idea, so let's get this rolling.

As far as I can see, the big objective for the python toolchain
is to simplify the life and work of content creators. Immediately,
this would require the existing WML checking and some WML formatting,
but later on this topic could branch out, with WML generation,
editor integrations, and probably more stuff. This is good, since
this means this project will be going for quite some time.

Given this, the more immediate goal will be to preserve the
current feature set of the WML data tooling. This includes, but
isn't limited to:

Syntactic validation of WML files. This is mostly just parsing the
files.

Semantic validation of WML files after being pre-processed. This
mostly consists of questions like "Is this tag allowed to have this child"
or "Are there spelling errors in this value". This probably also
includes some cross-checking on a WML level, if possible.

Reformatting of well-formed WML files before being pre-processed

If we consider wmlscope (and we should, mid-term): semantic
validation before the preprocessor runs, as long as the macros are
sane.

If we're feeling brave, more precise macro analysis might be possible.
This will require some deep meditation with the dragon book, though,
so it's more a nice to have than something we should really plan for.

Thus, given the nature and primitives of WML and its preprocessor,
we need to be quite careful when structuring a solution,
since these are some well-known, but rather tricky, compiler problems.

Now, given that, let's talk about the code base and the course of action
I'd propose.

I don't think wmllint is salvageable. I'm not going to beat around
the bush here. The code consists of two weird, hairy loops with
multiple kinds of state flying around somehow, and I don't even know what else.

wmlindent works; however, I'd need to check whether we can adapt it
to the structure I'll propose in a second.

I have not yet had time to check wmlscope, but I've heard that it
has non-trivial problems, and I doubt it will be compatible.

I'll need to think about wmliterator.

I don't like how wmlparser2 has a dependency on the wesnoth binary. This
will make testing this software a lot harder, and I'd want the new
wml data suite to be pre-processor aware to be able to perform
macro checking. However, I think we can use wmlparser2's syntax tree
as the representation for the fully expanded wml tree and wmlparser2
itself to kickstart development.

Given that, I'd propose to keep the current state of tools around,
but start a new modern, well-tested python3 tool. This tool should
follow a number of fundamental design decisions:

The tool should use subparsers to structure different commands.
There'd be "wmldat lint", "wmldat migrate", "wmldat indent" and
so on. This makes it easier to use, since there's just one entry
point into the system, and it's easy to extend, since we can just
add more sub commands.
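A minimal sketch of what that single entry point could look like using Python's argparse subparsers; the "wmldat" name and the subcommand names come from the proposal above, while the arguments and help strings are invented for illustration:

```python
# Sketch of a single-entry-point CLI using argparse subparsers.
# The subcommand names follow the proposal; everything else is illustrative.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="wmldat")
    sub = parser.add_subparsers(dest="command", required=True)

    lint = sub.add_parser("lint", help="run read-only checks on a WML tree")
    lint.add_argument("paths", nargs="+")

    indent = sub.add_parser("indent", help="reformat a syntactically sane file")
    indent.add_argument("paths", nargs="+")

    migrate = sub.add_parser("migrate", help="rewrite a file in place")
    migrate.add_argument("paths", nargs="+")

    return parser

args = build_parser().parse_args(["lint", "campaign/scenario1.cfg"])
print(args.command, args.paths)
```

Adding a new subcommand is then just another `add_parser` call, which is what makes the single entry point easy to extend.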

We need to be very clear about separating different tasks. Lint
should be a read-only operation which runs checkers on a tree.
Autocorrect could fix some issues with simple fixes. Migrate
assumes a sane file (or enforces it) and changes
things in a single file. Indent assumes a syntactically sane file
and outputs it again.

Internally, the tool should be task- or phase-oriented and very clear about
the state of the files it is looking at. Thus, for example, most
semantic checks would require a task 'fully-expand-input-wml', which
is guaranteed to provide a fully parsed and preprocessed
AST of the WML we're working on.
Other, braver tasks might just depend on 'preprocessor-parsed' or
'lexer-created' or something like that, and deal with mostly
unparsed and unreliable files.
This, overall, would give us great flexibility to reuse existing
code in all kinds of ways, and it would enable us to have many
developers work in parallel on the same code base.
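As a rough illustration of the phase idea (a sketch, not a design commitment): tasks could be registered with explicit dependencies and computed once per run. The task names follow the examples above; the task bodies are stand-ins.

```python
# Tiny task registry: each task declares what it depends on, and
# results are cached so each phase runs at most once per invocation.
TASKS = {}

def task(name, deps=()):
    def register(fn):
        TASKS[name] = (fn, deps)
        return fn
    return register

def run(name, cache=None):
    if cache is None:
        cache = {}
    if name not in cache:
        fn, deps = TASKS[name]
        cache[name] = fn(*(run(d, cache) for d in deps))
    return cache[name]

@task("lexer-created")
def lex_input():
    # stand-in for "tokenize the raw file"
    return ["[unit]", "name=x", "[/unit]"]

@task("fully-expand-input-wml", deps=("lexer-created",))
def expand(tokens):
    # stand-in for "preprocess and parse into an AST"
    return {"tokens": tokens, "ast": "..."}

print(run("fully-expand-input-wml")["tokens"])
```

A semantic checker would then simply declare a dependency on 'fully-expand-input-wml', while wmlindent would depend only on 'lexer-created'.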

Furthermore, I maintain that I'd love to move the project into its
own repository. This makes it easier to find the project, and we can
use Travis to check our code. With tox, we could also go ahead and
maintain Python 2 and Python 3 compatibility and make upgrading easier.
However, I understand that this is hard and opposed due to the current
build chains.

From here, I'd like to see the following happen:

Please discuss this. I don't want to just do something without the
backing of the more senior devs.

We need to decide the repository question. Code can't happen
without a repo.

Hopefully there are more python devs here. We'd need to flesh
out some of the more concrete implementation details. After
that, we'd probably start bootstrapping the task architecture,
add in wmlparser2 as a start and branch into linters, checkers
and all of the good stuff to get something going quickly.

Hey, I'm one of the python devs that's been trying to reach you to coordinate on this. Your goals all seem attainable and I'm willing to help out with them.

I'm fully behind your suggestion of moving this code to another repo. Four reasons off the top of my head for moving to a separate repo:

Focus: A dedicated repo will help to clarify the separation between our code and code that is not impacted by our changes.

Structure: In a dedicated repo we will be free to structure code in a way that best serves our code and doesn't have to be weighed against unrelated code.

Minimalism: I shouldn't have to download several GB of assets to work on code that never touches them.

Attractiveness: You'll have an easier time attracting people to work on that portion of the code, because you can lower the barrier to entry.

With the robust packaging that python offers, installing tools from a separate repo will be easy for any developer to do.

My own background:

I've been working in python for most of my career, not because my companies natively use it but because I enjoy it so much. I've done all my work in python 2 but I'm not at all concerned about moving to 3 if that's what the project needs.

I have a passion for elegant software design and robustness, achieved through well-understood design practices and rigorous testing. I'm certain I can bring a lot of value to the project. I'd like to meet with other python devs in an IRC session if our schedules can align.

Tetha wrote:I don't think wmllint is salvageable. I'm not going to beat around
the bush here. The code consists of two weird, hairy loops with
multiple kinds of state flying around somehow, and I don't even know what else.

Yesterday I took another look at wmllint. Its main problem (IMO) is that its conversion and sanity check functions are simply too long, and as such they cannot be easily maintained.
A good idea may be to split them into smaller functions and/or move them to a separate file.

Tetha wrote:Furthermore, I maintain that I'd love to move the project into its
own repository. This makes it easier to find the project, and we can
use Travis to check our code.

jstitch wrote:I agree on a separate repo for the project, but it should maintain some kind of link with the main wesnoth code, so that content creators use the new tools.

chaverma wrote:I'm fully behind your suggestion of moving this code to another repo.

I disagree with this approach.
From a developer's point of view, having a separate repo is the best solution. If it were only up to me, I'd say "what are we waiting for? Let's move!". But the problem is that these tools are to be used by UMC authors, who may or may not know how to use GitHub, and who may think "so, I need to install these tools. Then I need to install Python. Well, too much work: forget it", and we'd risk having exactly what we don't want: nobody using them but the mainline devs.
Instead, as things currently are, we can tell a UMC author "Hey, you need to run wmlindent". He answers "How do I do it?", and our answer is "Install Python, then double-click on the GUI.pyw file. That's it".
Besides, some tools (like GUI.pyw and trackplacer) assume that they're installed in the wesnoth/data/tools directory. Granted, since I'm maintaining GUI.pyw it won't be much work for me to add a preferences file and ask for the core directory on the first run, or add a preferences dialog, but this point may apply to some other scripts as well.

chaverma wrote:I'm not at all concerned about moving to 3 if that's what the project needs.

Moving has the advantage of getting rid of some inconsistencies in the code. It's something we'll eventually have to do, as only Python 3 is getting new features (like, for instance, the enum data type or PNG support in Tkinter), and some Linux distros are starting to use it as their default Python version.

Tetha wrote:I don't like how wmlparser2 has a dependency on the wesnoth binary.

The problem with a parser-based solution is that we may not be able to keep some of the data contained in the file, like comments - it depends on how the new parser will be implemented, of course. I guess that this is the reason why wmllint uses regexps and string functions instead of a proper parser.
Besides, I've never written a parser so far, but I guess that won't be an issue.

chaverma wrote:I'd like to meet with other python devs in an IRC session if our schedules can align.

In my case, keep in mind that I read the logs, and when I'm not in the channel you can always drop me a PM.

EDIT: I was almost forgetting something. Trackplacer, which is currently broken, uses PyGTK as its GUI toolkit. If someone wants to work on it (I must have some half-baked attempt somewhere), keep in mind that a refactoring will have to use Tkinter/ttk as the GUI toolkit, mainly because this is the default one for Python. It'll still require installing the Pillow library (BTW, I should update wmlscope to warn about installing the more up-to-date Pillow instead of PIL), but for the average user installing a 1.3 MB library is better than downloading and installing, say, a 38.1 MB one (PyQt5), right?

UMC creator here. I don't know how many are in my boat, but I thought I'd wade in. I don't use most of the tools because I find them too difficult to use, not too difficult to install (admittedly, I haven't tried in quite a while). For me, if the devs are in an environment that works better for them and lets them build more effective, more user-friendly tools, I think that's a better route than having tools that not everyone can use anyway.

Surely part of the making them user-friendly could be building some kind of tool to help download and install the tools, right? :p

If we decide to split the Python tools into their own repo, maybe git subtree could be an option.

It allows including a snapshot of another repository inside Wesnoth and merging updates to it. It seems that, apart from merging the subtree from time to time, no changes to the wesnoth repo would be necessary.

I never used it and can't say whether it's the correct solution, but perhaps 8680 (c74d on IRC) knows more about it?

aquileia wrote:If we decide to split the Python tools into their own repo, maybe git subtree could be an option

It allows including a snapshot of another repository inside Wesnoth and merging updates to it. It seems that, apart from merging the subtree from time to time, no changes to the wesnoth repo would be necessary.

I never used it and can't say whether it's the correct solution, but perhaps 8680 (c74d on IRC) knows more about it?

I use subtrees a lot at my job, and they are definitely an option.

When merging from a subtree, at least two commits are generated in the main repo: the merge commit and the commit you are merging from the subtree (though with --squash it can always be reduced to one commit).

chaverma wrote:Hey, I'm one of the python devs that's been trying to reach you to coordinate on this. Your goals all seem attainable and I'm willing to help out with them.

Yup, I noticed you. Sadly, the plan to stumble into you on IRC isn't working out as intended.

Elvish_Hunter wrote:The problem with a parser-based solution is that we may not be able to keep some of the data contained in the file, like comments - it depends on how the new parser will be implemented, of course. I guess that this is the reason why wmllint uses regexps and string functions instead of a proper parser.
Besides, I've never written a parser so far, but I guess that won't be an issue.

This is mostly why I think we need a phase- or task-based approach.

If you need to retain as much of the input file as possible, and potentially rewrite the input file just a little bit, you'll want to hook into the very early lexical phase. Whitespace- and comment-aware lexers are possible. So, for example, wmlindent would just consume the token stream from the lexer, ignore input whitespace tokens, and write the rest back out with correct formatting.
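To make that concrete, here is a toy sketch of an indenter driven purely by a token stream. The trivial line-based tokenizer stands in for a real whitespace- and comment-aware WML lexer; indentation is derived from [tag]/[/tag] nesting alone, and comments pass through untouched.

```python
# Toy wmlindent: consume tokens, discard input whitespace,
# re-emit everything with indentation computed from tag nesting.
def tokens(text):
    # Stand-in lexer: one token per non-blank line, whitespace stripped.
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            yield stripped

def reindent(text, indent="    "):
    out, depth = [], 0
    for tok in tokens(text):
        if tok.startswith("[/"):
            depth -= 1            # closing tag ends a nesting level
        out.append(indent * depth + tok)
        if tok.startswith("[") and not tok.startswith("[/"):
            depth += 1            # opening tag starts a nesting level
    return "\n".join(out)

messy = "[unit]\n  name=konrad\n# a comment\n[abilities]\n[/abilities]\n[/unit]"
print(reindent(messy))
```

A real lexer would of course also have to handle multi-line strings, inline comments, and preprocessor directives, which is exactly why the lexical phase deserves to be its own task.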

On the other hand, if you want to do matching on the tree structure, there's no need to worry about whitespace and comments. You need to be able to find tags and structures of tags, which can be greatly simplified with a strong AST in the background. And from the last few times I've worked on systems like this: you'll end up with checks where you'll love the strong AST, because writing the check is hell even with it.
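For example, a tag-nesting check ("is this tag allowed to have this child?") becomes a short tree walk once an AST exists. The schema, tag names, and nested-tuple AST stand-in here are all invented for illustration:

```python
# Hypothetical schema: which child tags each tag may contain.
ALLOWED = {"unit": {"abilities", "attack"}, "attack": set()}

def check_children(tag, children, errors, path=""):
    """Walk a (tag, children) tree and collect nesting violations."""
    here = f"{path}/{tag}"
    for child_tag, grandchildren in children:
        if child_tag not in ALLOWED.get(tag, set()):
            errors.append(f"{here}: [{child_tag}] not allowed here")
        check_children(child_tag, grandchildren, errors, here)
    return errors

tree = ("unit", [("attack", [("abilities", [])])])
print(check_children(tree[0], tree[1], []))
```

The check itself stays a few lines precisely because the AST, not the checker, carries the structure.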

chaverma wrote:I'd like to meet with other python devs in an IRC session if our schedules can align.

During the week I'm available at around 2000 - 2200 UTC+2. On weekends, I can stretch that somewhat.

Elvish_Hunter wrote:Instead, as things currently are, we can tell a UMC author "Hey, you need to run wmlindent". He answers "How do I do it?", and our answer is "Install Python, then double-click on the GUI.pyw file. That's it".

In combination with UnwiseOwl's comment, this actually seems more like a user story to me: "As a content developer, I want a simple and easy way to set up a development environment with all tools available."

In fact, why'd you even need to run wmllint manually? We could go ahead and beef up GUI.pyw with inotify or equivalent tools to just run wmllint if you save files, so you get immediate feedback about the state of your code.
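As a portable stand-in for inotify, even a simple mtime-polling watcher would be enough for that save-and-check loop. This is a sketch; the on_change callback (which would invoke wmllint) is a hypothetical hook:

```python
import os

class Watcher:
    """Polls file mtimes and fires a callback when a file changes.

    Portable stand-in for inotify; on_change is where wmllint
    (or any other checker) would be run.
    """
    def __init__(self, paths, on_change):
        self.on_change = on_change
        self.mtimes = {p: os.stat(p).st_mtime for p in paths}

    def poll(self):
        changed = [p for p, old in self.mtimes.items()
                   if os.stat(p).st_mtime != old]
        for p in changed:
            self.mtimes[p] = os.stat(p).st_mtime
            self.on_change(p)
        return changed
```

A GUI.pyw integration would just call poll() from its event loop every second or so; on Linux, inotify would replace the polling with real change notifications.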

Tetha wrote:In fact, why'd you even need to run wmllint manually? We could go ahead and beef up GUI.pyw with inotify or equivalent tools to just run wmllint if you save files, so you get immediate feedback about the state of your code.

The problem of distributing <anything> can easily be solved by creating a web service that does the job.

Tetha wrote:In fact, why'd you even need to run wmllint manually? We could go ahead and beef up GUI.pyw with inotify or equivalent tools to just run wmllint if you save files, so you get immediate feedback about the state of your code.

The problem of distributing <anything> can easily be solved by creating a web service that does the job.

Until the internet goes out

99 little bugs in the code, 99 little bugs
take one down, patch it around
-2,147,483,648 little bugs in the code

chaverma wrote:I'm fully behind your suggestion of moving this code to another repo.

I disagree with this approach.
From a developer's point of view, having a separate repo is the best solution. If it were only up to me, I'd say "what are we waiting for? Let's move!". But the problem is that these tools are to be used by UMC authors, who may or may not know how to use GitHub, and who may think "so, I need to install these tools. Then I need to install Python. Well, too much work: forget it", and we'd risk having exactly what we don't want: nobody using them but the mainline devs.

There is a tradeoff your argument doesn't mention: keeping the status quo hinders the tools' progress on correctness, usability, and robustness. All of these contribute to that undesirable end state of "too much work: forget it."

Elvish_Hunter wrote:
Instead, as things currently are, we can tell a UMC author "Hey, you need to run wmlindent". He answers "How do I do it?", and our answer is "Install Python, then double-click on the GUI.pyw file. That's it".

There's no reason we can't store an artifact from the separate repo in the main repo, in a predictable place, with an interface as simple as a double-click. That would meet the same usability standard you've established. Or we could use git subtrees, as aquileia suggests.

One of the main problems with wmllint and wmlscope, in my opinion, has always been that they use their own ways of preprocessing WML - code redundant with the C++ preprocessor, with a lot of bugs and differences. wmllint is unable to spellcheck translatable strings hidden in macros, for instance. Couldn't they call the wesnoth executable as a subprocess to use the C++ preprocessor? wmllint should operate on the --preprocess output.
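Shelling out could look roughly like this. Note that the exact --preprocess invocation and output layout depend on the wesnoth version, so treat the flags here as an assumption to verify before relying on them:

```python
# Sketch: let the C++ preprocessor do the expansion instead of
# re-implementing it in Python. The "--preprocess <file> <outdir>"
# invocation is an assumption about the wesnoth CLI.
import os
import subprocess
import tempfile

def preprocess(wml_path, wesnoth_cmd=("wesnoth",)):
    outdir = tempfile.mkdtemp()
    subprocess.run(
        [*wesnoth_cmd, "--preprocess", wml_path, outdir],
        check=True,  # raise if the preprocessor reports an error
    )
    # Assumes the expanded file keeps the input's base name.
    expanded = os.path.join(outdir, os.path.basename(wml_path))
    with open(expanded) as fh:
        return fh.read()
```

wmllint would then parse the returned text instead of the raw file, sidestepping the macro-expansion divergences entirely, at the cost of requiring a wesnoth binary on the PATH.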

As for trackplacer being broken... I used to be able to use it on various Linux systems, but not on Windows. There was some problem with the track markers not appearing in the UI, IIRC. I reported it to esr once (it must have been on IRC), and he said something about a bad library, judging from the console messages. The problem seemed to be upstream (PyGTK or GTK), not trackplacer itself.