server-side gadgets, as suggested by Trevor, and for inline scripting.

The idea of having a single scripting language that can be used by our
tools/gadgets community is certainly appealing, and the ever-growing
ecosystem around both client-side and server-side JS is as well.

Tim's reply:

I agree that server-side JS would be ideal from a user perspective,
which is why I've researched the various compiler/interpreter options
for it in some detail. I've settled on Lua as my preferred solution
despite its relative unfamiliarity among our users because:

All memory allocation, including stack space, is done via a configurable hook function. Thus memory accounting can easily be implemented (I've done it already). Infinite recursion does not lead to a segfault, and there's no chance of a user script sending a server into swap.

It's fast. The interpreter is fast, and there's a mature JIT compiler which is very easy to integrate. Code compiled with the Lua JIT compiler executes much faster than code written in PHP running under Zend. Execution speed is critical for the citation application, which has a lot of code executed very often.

It's designed to be integrated, so development time is very small. The C interface is well-documented.

It can be embedded in the same address space as PHP while still allowing memory and CPU limits. I've benchmarked PHP -> Lua and Lua -> PHP calls at around 0.5us on my laptop, which is at least an order of magnitude faster than any IPC solution.

Lua is intended to be easy to learn, and easy to use for short scripting tasks. It has a feature set very similar to JavaScript, with first-class functions and prototype-based OOP.
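The overlap Tim describes can be seen in a few lines of JavaScript (a generic illustration, not code from the proposal; Lua mirrors both features with closures and metatable-based prototypes):

```javascript
// First-class functions: a function is a value that can be
// created, returned, and stored like any other.
function makeCounter() {
  var n = 0;
  return function () { return ++n; };  // closure over n
}
var next = makeCounter();              // successive calls yield 1, 2, ...

// Prototype-based OOP: objects delegate to other objects
// instead of instantiating classes.
var animal = {
  speak: function () { return this.name + " speaks"; }
};
var dog = Object.create(animal);       // dog delegates to animal
dog.name = "Rex";                      // dog.speak() -> "Rex speaks"
```

Lua's equivalents are closures and tables with an `__index` metamethod, which is part of why the two feature sets are described as very similar.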

Of the potential JavaScript solutions, Rhino is the one I liked the
most, and spent the most time talking with Brion about. The problem
with it is that it's not feasible to embed it in the same address
space as PHP, and startup time is very slow, so you'd have to
implement it as a standalone daemon listening on a TCP or Unix socket.
That architecture would be complex to develop, would have poor
performance compared to Lua, and would probably make it impossible to
implement memory limits via control of the Java heap size.

Neither SpiderMonkey nor V8 gives you the ability to control memory and
stack usage. They're both difficult to embed due to poor documentation
and the relative scarcity of embedded implementations. The developers
are very focused on the browsers that they primarily support, and less
so on a broader ecosystem of end users.

-- Tim Starling

Trevor's reply:

These are all excellent reasons, and it's clear you've done some solid research, but I can't help but feel that the user experience would be taking a back seat to ease of integration if we use Lua. I feel strongly that a one-language system shouldn't just be a nice-to-have; it should be a requirement, especially if we are talking about introducing yet another language.

...Make the syntax of [any] template language identical to JavaScript....

There's also an abandoned but mostly functional JavaScript interpreter written in PHP out there[1]. I realize that it would not be as fast as running programs in V8 or Rhino, but if we are talking about replacing complex templates with JavaScript code, we are still likely to see a dramatic improvement even with a slower interpreter, because people will be able to express themselves in much simpler ways. An added bonus is that MediaWiki doesn't gain any dependencies, because it's all done in PHP code.

I have used Node.js a lot, and know of modules such as node-sandbox[2] which provide some of the limitation and isolation that is needed. Some JavaScript-to-PHP bridging will need to be added, of course, and it's not 100% clear how best to do that, but I'm sure it's possible.
It's also very likely that JavaScript code running on Node.js will start being used on the server side in our infrastructure in the near future. Chat, real-time collaboration, and other websocket/long-polling communication services are natural uses of Node.js, and nearly impossible to do in PHP, even with a very small number of users.

Hopefully we can come up with something that can take advantage of the rising popularity and familiarity of JavaScript to make our wiki easier to learn and use.

Tim:
.....
I think there's a risk that the citation templates would actually
execute more slowly in an interpreter running on top of PHP, such as
WikiScripts or this JavaScript implementation, than they do currently.
The nature of the code is such that it puts quite a lot of performance
pressure on the executor.

I'm anxious to see benchmarks of the citation templates converted to
run in WikiScripts, because I think they'll be shockingly bad and will
vindicate my approach.

There's no memory limit. It's just a few lines of wrapper JavaScript
and a wall clock timer. It's trivial to write a script that fills up
all memory and sends a server into swap. That's not a failure mode we
want, from a sysadmin perspective.

The stability of the site is the most important consideration. We have
a reach of 400 million people and a metatemplate editor community of
about 10. So I reckon approximately 99.99999% of our users care more
about whether the site is up than whether we use Lua or JavaScript.

-- Tim Starling

Michael Dale:
Some thoughts:

Has a PHP-based interpreter for Lua been written? Would the template
language make MediaWiki incompatible with vanilla PHP-based installs? WMF
would (of course) be running some native embeddable interpreter, but
the idea of a PHP-based fallback seems attractive.

Has a JavaScript Lua interpreter been written? Would browser-based rich
text editors need to include a Lua interpreter of some kind? Would the
existing wikitext backwards compatibility be obtainable as "low-hanging
fruit" per the rich text editor efforts that ~have to~ run in
JavaScript? Has the JavaScript wikitext parser work been compared to its
PHP counterparts? Is there any possibility of crossover development
efforts there?

How do Lua based libraries for CSS DOM / HTML / XML traversal and
manipulation compare to JavaScript based libraries?

Is there really a risk that a Rhino-, V8-, or IonMonkey-based JavaScript
JIT would be slower than the existing PHP-based template system? It may
be that the startup time and memory management of JS are less flexible
in the current ecosystem of tools. Will that hold true into the future?
If Lua has better embeddable performance characteristics right now, does
that mean it's "better suited"?

Do we want to keep the "template editor community" restricted to a small
number of individuals? Can we run tests of any kind to help compare how
easily JS and Lua address the traditional and foreseen needs of server-side
wiki scripting?

While accessibility and "ease of use" characteristics are harder to
evaluate and test than direct embedding time and memory constraints, it
seems performance characteristics should inherently take a back seat to
user accessibility and crossover development, since performance
constraints can be addressed with predictable engineering efforts, while
a less accessible or less well-understood language can adversely affect
contributions and development times in ways that are not easy to
directly address on a mass scale by "just adding hardware".

If WMF needs to allocate X times as much RAM and Y times more CPU, that
seems like a more predictable cost than teaching people Lua. Unless the
development costs of implementing JavaScript wiki script somehow greatly
outweigh all these variable accessibility and cross-development costs
for a larger set of individuals, it seems like JavaScript would be the
preferred solution.

There seems to be so much momentum and network effects around JavaScript
that I would think the debate would be around "how" to implement
JavaScript as the next wiki script language not "if" it should be the
next wiki script language.

peace,
--michael

Tim:
No.

MediaWiki installations on shared hosts and the like can support Lua
by installing the standard interpreter binary (say, by FTP upload) and
shelling out to it. Support for such a scheme is already implemented;
it was done by Fran Rogers in 2008.

No. However, there is an incomplete Lua to JS translator.

I don't see why that would be necessary.

There's no DOM manipulation involved in the target application, so I
don't see why that would be a concern.

No. I said there was a risk that an interpreter written in PHP and
running on top of Zend may be slower than the existing PHP template
system.

I'm not a futurologist. I am, however, tired of waiting for the
perfect solution to magically appear.

No.

The design of the interface between the scripting language and PHP
will have to be done with input from the people who write templates
currently. The feature set of JavaScript and Lua is pretty much
identical, so it's hard to imagine how testing could identify
something that favours one language over the other.

It doesn't matter how much RAM you buy. If there's no limit on how
much RAM a script uses, then it will be able to exhaust all available
resources.

It doesn't matter how many cores you have: script execution will run
on a single core, and users will have to sit around waiting while
execution completes.

That assumes we have enough development time to spend on implementing
the features we need inside some JavaScript compiler. We don't have an
hour of development time to spend for every hour of editor time we
save, because there are more editors than developers.

-- Tim Starling

Erik:

Tim, many thanks for the detailed explanation of why you chose to go
for a Lua prototype implementation. This, as well as the other comments
in this thread, has been hugely valuable to me.

I've looked at the prior wikitech threads and I haven't seen these
specific arguments there, so I'm guessing this (as well as some of the
other considerations mentioned in this thread) would be valuable to
share either on-list or on-wiki. I'd also suggest that detailed
technical discussion take place there so more folks have a chance to
weigh in or write code to prove people wrong. :)

My tentative takeaways so far (which I'm happy to post to a relevant
public thread as well):

1) IMO this would be useful to keep as a possible hacking and
discussion project for New Orleans, depending on the state of the
implementations at that time.

2) There's been agreement in this thread that JS would be preferable
from a user/dev perspective, but it's also clear that Lua is the
closest thing we have so far to a working implementation that can
scale for the particular use case of inline-scripts (which, not to
forget, really is a tough one since we need to have all template code
in a page with hundreds of templates executed with minimal wait time
for the editor on save or preview).

I'd love to see proof-of-concept implementations of inline-scripting
in JS that could scale with acceptable performance/execution
characteristics.

3) Given that it's not entirely clear that Brainfuck<Template
programming or the other way around, it's pretty evident that any
inline scripting solution that meets the real world use cases would be
a huge improvement on current state. That doesn't mean we don't have
an obligation to get things right -- but I'd be thrilled to see
something deployed that gets us 80% of the way there. ;-)

One open question I have, which came up briefly in this discussion:
Are there significant implications of this decision for the
editor/parser work? My understanding is that the visual editor will
never have to execute template code -- that it will only need to be
aware of the template calls and the rendered output as delivered by
the server. Is that correct? Are there cases where the client will
have to execute these inline scripts, either in the context of the
editor, or in some other specific future applications we or others may
want to develop?

I'm fine with a solution that's imperfect for devs, but I want to make
sure we're not accidentally painting ourselves into corners. :)

Thanks,
Erik

Brion:

This is still a bit icky, and won't work at all on some hosts that disable or limit execution/shell-outs. That may be a decision we're willing to make, but it does up the dependency & installation requirements for anybody that actually wants to make use of these templates.

If based on a fully-compatible parser in the client side to do all the rendering, then yes. If communicating with a server-side parser to render out templates, then probably not.

Since we expect to start editor test deployments with client-side JS code this may or may not be something we need to worry about (since I presume it will be a while yet before we have those JS or Lua templates running in production).

*nod* — resource limits are a hard requirement here. It's trivial to write JS code to use up all your RAM:
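For instance, a string-doubling loop in this spirit (a hypothetical sketch; capped at 20 iterations here so it terminates instead of actually exhausting the heap):

```javascript
// Repeatedly double a string. Uncapped, each iteration doubles the
// memory held, and V8 aborts once its heap limit is reached.
var CAP = 20;            // drop the cap to actually exhaust the heap
var str = "x";
for (var i = 0; i < CAP; i++) {
  str = str + str;       // 2^(i+1) characters after this assignment
}
// 20 doublings -> 2^20 = 1048576 characters (~2 MB as UTF-16);
// around 29 doublings puts the string alone in the gigabyte range.
```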

On node.js this crashes the process after just 29 iterations, as V8's heap size is currently locked at a relatively small 1 or 1.9GB and it doesn't take long to reach that; if you're more patient, you could easily stay under that limit and still lock up huge amounts of memory for as long as each script runs.

We need to be able to halt the embedded script after some amount of time (wall clock or opcode ticks, whatever) or on some amount of memory usage.

I know the JS systems have a way to halt on time -- browsers will pop up a "Do you want to stop this script?" dialog if something runs too long -- but don't know offhand how easy it is to limit on memory usage, especially if an IPC or network-based server handles multiple scripts from one process context.

If they're one-off spawned processes then you can of course use ulimit as we do for shelling out to convert, latex etc -- if using a networked server then it may be necessary to do some fancy footwork setting up process-wide limits (and/or setting heap limits explicitly, if possible) and respawning processes if/when they fail.

Just as a note -- SpiderMonkey (the C++-based JS engine in Firefox) at least has support for allocating data from different script domains in separate "compartments" which can at least be monitored separately in about:memory in the latest versions. Whether it's possible to have it actually cap those compartments separately I don't know.

I'm actually not exactly sure how to limit the Lua heap size either though; the documentation doesn't seem very clear on that...

I think running untrusted JavaScript on the client side with no memory
limit would be almost as bad as running it on the server side.

The abstraction of memory allocation is the hard part. If they have
that but not memory limits, we could probably add memory limits.

Yes, and return NULL when the limit is exceeded. Here's my custom
allocator:

Lua is able to tolerate having its allocator function return NULL,
unlike most C programs which develop nasty bugs or crash when malloc()
returns NULL. It does a longjmp() (or throws an exception when
compiled under C++) to return control to lua_pcall(), which then
returns an error message to its caller.

longjmp() safety is not entirely trivial. The calling code has to be
aware of the possibility of a longjmp() so that it doesn't leak memory
or corrupt the state in other ways. It's basically the same as
exception safety in C++, except without the compiler support, and
without the bulk of the participating developers being aware of the issue.

Indeed. :) One would want to do some sandboxing there as well... JS-on-JS sandboxing could be done with an intermediary layer like caja[1] or by separating into an isolated iframe context; in any case that's not a bridge that needs immediate crossing.