nREPL.Next

FYI, this is an old design document that preceded the redesign manifested in nREPL versions 0.2.0 and later.

nREPL has been around for a bit over a year now, with moderate success in attaining its objective of providing a tool-agnostic networked Clojure REPL implementation. Current events are such that now may be a good time to apply some of the lessons learned over that period to maximize nREPL’s applicability and reach.

Warning

Apologies

This process will result in breakage. For this, I apologize, especially to those that have implemented nREPL clients for other platforms/languages:

Martin, Meikel, and anyone else out there that has taken it upon themselves to write an nREPL client (get in touch if you have!): I hope you will come to support the changes in nREPL.Next, and I welcome your input in particular as design discussions move along. If it is any consolation, many of the proposed changes are intended to maximize nREPL’s interoperability even further, including making it even easier to implement non-Clojure nREPL clients.

Background

Before jumping in, please take the time to:

read and understand the current nREPL implementation as documented in its README,

check out this thread from the main Clojure ML where a variety of topics were addressed around the current wire protocol

Problems

Let’s be clear about what doesn’t work about nREPL before making changes to "fix" things.

Its protocol is structurally unsuitable for some tasks, and may be a barrier to client implementation

It has become clear that nREPL’s underlying protocol is no longer suitable.

Recent advancements (in particular, enabling "richer interactions") have exposed some practical shortcomings. Most painfully, the use of strings as the only value type in messages has led to unreasonable complexity and overhead when returning (and perhaps, in the future, sending) binary data, which necessitates base64 encoding and decoding on both sides of a connection, above and beyond the wire protocol itself.
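The cost of squeezing binary data through a string-only protocol is easy to quantify: base64 emits four output bytes for every three input bytes, so every binary payload grows by a third, and both sides pay the encode/decode cost on every message. A quick illustration (the 3 MB figure is arbitrary):

```python
import base64

# Simulate returning ~3 MB of binary data (e.g. an image) through a
# string-only protocol: it must be base64-encoded before transmission.
payload = bytes(range(256)) * (3 * 1024 * 4)   # 3,145,728 raw bytes
encoded = base64.b64encode(payload)

# base64 inflates the payload by one third (4 output bytes per 3 input bytes).
print(len(payload), len(encoded))              # → 3145728 4194304
```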

A solution to this problem should allow for binary data to eventually be effectively streamed to (and from?) the nREPL client.

In addition, the original design process for nREPL largely punted on rigorously soliciting input on protocol design from those with protocol-sensitive use cases and/or limited potential client implementation languages/environments, now clearly a very serious mistake. Specific concerns in this department include:

the current nREPL protocol is unnecessarily line-based, making implementations challenging in contexts where processing lines of text is not a well-supported abstraction

local APIs sometimes make working with fixed-length or defined-length messages easier and/or more efficient

Sessions are tied to connections

"Session" data (i.e. environment that is bound to key vars for the duration of client evaluations) is currently associated with a socket connection to the REPL server. This:

presents problems for various clients that cannot reasonably maintain an open connection (e.g. vimclojure, perhaps others)

makes it impossible (or unreasonably difficult, at least) to "clone" or "migrate" a known session to another connection (as with portal's fork command)

represents a likely (guaranteed?) point of complexity for building an nREPL endpoint for services that do not have socket-connection semantics (e.g. HTTP, STOMP, JMX)

Implementation is hopelessly tied up with sockets

There’s no reason why the core nREPL implementation and its evaluation semantics couldn’t be used with multiple transports. Sockets are likely the base case (certainly from a dependency standpoint), but that’s just a start. The current implementation provides no point of abstraction for implementing or using "alternative" transports. These might reasonably include:

HTTP

JMX

STOMP and other mq-related formats/protocols

Emacs/SLIME/swank

The majority of Clojure programmers use Emacs with SLIME and swank, and they are currently locked out of any environment that uses nREPL just as anyone that uses nREPL-based tooling (Eclipse + Counterclockwise, vimclojure, jark…) is locked out of any environment that uses swank. If nREPL is going to fulfill its objective of providing a common protocol for REPL interoperability across tooling environments, we must find some way to bridge these two worlds.

(The practical upshot is significant: perhaps someday we can dispose of the parochialism of e.g. lein-swank, lein-nrepl, and the command-line-only lein repl. What if all of our tools (including a command-line client) could talk to a service started by a unified lein repl command and invocation?)

I’m open to all options for bridging this divide, and look forward to hearing from the swank and SLIME/elisp wizards among us. That said, it is understood that it is possible that technical or nontechnical factors surrounding the SLIME/swank codebase and development process may prove to be insurmountable at this time.

Unknowns

ClojureScript

Where does ClojureScript’s browser-repl fit in? The execution model of browser-repl is fundamentally different than anything in Clojure-land AFAICT (using polling of the cljs repl to get forms to execute browser-side, if I’m understanding things properly), so perhaps ne’er the twain shall meet.

Versioning

The question of versioning was punted when originally designing nREPL and its protocol. Now that we’re considering a revision cycle, and existing clients will likely just break hard with nREPL.Next and with no sensible indication why, providing a sane versioning mechanism is a priority (if only to allow clients to provide some kind of useful guidance to users). Pointers to known-good approaches and recommended practices are most welcome.

out / err

Given Clojure 1.3+ and its binding conveyance, content sent to *out* and *err* is properly returned to the nREPL client even after a sent expression has been evaluated, as long as that content is being sent from an agent or future (as opposed to a bare JVM Thread, which, without intervention, will just dump data to System/out and System/err).

Is this sufficient? Other REPL implementations (Cake) take pains to multiplex System/out and System/err so that clients are delivered content sent to those writers, regardless of its source. Even given binding conveyance, being able to receive that content is useful, especially when the REPL is deployed remotely: without a way to subscribe to these streams, one must log in some other way just to view e.g. log files.

Here is a link to how Cake supported multi-outstreams for reference. A proxied BufferedOutputStream is created around the outs and errs vars so that rebinding these will cause System/out and System/err to go to a different location for the current thread. This code could be pulled up into Clojure itself to provide multiplexing for all Clojure programs.
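The mechanism Cake uses can be sketched in a few lines: a proxy writer that consults thread-local state on every write, so rebinding redirects output for the current thread only while other threads keep hitting the real stream. This is an illustrative Python analogue, not Cake's actual code:

```python
import io
import sys
import threading

class MultiplexedWriter:
    """Per-thread redirectable writer, in the spirit of Cake's proxied
    output streams: writes go to a thread-local target if one is bound,
    falling back to the process-wide default otherwise."""
    def __init__(self, default):
        self.default = default
        self._local = threading.local()

    def rebind(self, target):
        # rebinding affects only the calling thread
        self._local.target = target

    def _target(self):
        return getattr(self._local, "target", None) or self.default

    def write(self, data):
        return self._target().write(data)

    def flush(self):
        self._target().flush()

mux = MultiplexedWriter(sys.stdout)
captured = io.StringIO()

def worker():
    mux.rebind(captured)        # this thread's output is captured...
    mux.write("from worker\n")

t = threading.Thread(target=worker)
t.start()
t.join()

mux.write("from main\n")        # ...while the main thread still writes to stdout
```

Installing such a proxy as the process-wide System/out equivalent is what lets every write, from any source, reach the connected client.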

in

nREPL provides very limited means for getting stdin data to a context (messages may provide that data in an :in slot in a request, but there is no way for the remote side to request it). Swank provides a model where any attempt to read from *in* prompts the connected tool to supply data for stdin in a later message. nREPL can certainly duplicate this, but is it sufficient? It may not be a general solution (e.g. one can't read just a single character off of stdin, only lines at a time?).
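The swank-style flow described above can be sketched as follows. The message shapes and the "need-input" op name here are hypothetical stand-ins, not part of any existing protocol:

```python
import queue

class RemoteStdin:
    """Sketch of a swank-style stdin model: a server-side read prompts the
    client for input, then blocks until the client replies. The "need-input"
    op name and message shapes are hypothetical."""
    def __init__(self, send_to_client):
        self.send = send_to_client      # callable that delivers a message to the client
        self.pending = queue.Queue()

    def readline(self):
        # any attempt to read prompts the connected tool, then blocks
        self.send({"op": "need-input"})
        return self.pending.get()

    def supply(self, line):
        # invoked when the client's stdin reply arrives
        self.pending.put(line)

sent = []
stdin = RemoteStdin(sent.append)
stdin.supply("hello\n")     # queued ahead of time so readline() won't block here
line = stdin.readline()     # → "hello\n"; sent now holds one need-input message
```

Note that this granularity is inherently line-based; supporting single-character reads would require a different request shape.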

Portal provides a command for sending stdin separately from the eval command, but it does suffer from the problem of not knowing when the remote code is trying to read from *in*. A mechanism like the one swank uses would solve this problem.

Unknown unknowns

Take this opportunity to address any further lingering issues that prevent nREPL from being the canonical network REPL implementation for Clojure tooling and applications.

Strawman proposal

The below is an initial proposal that is the result of discussions with users, tool builders, nREPL client implementors, and implementors of other Clojure network REPLs, but it is provided fundamentally to motivate discussion, enhancements, and/or full counter-proposals. Tear it apart.

Retain the "good parts" of nREPL as it sits today

nREPL 0.0.x got a lot right:

asynchronous evaluation model

message-based protocol

generally easy to implement clients for

simple model: every message just evaluates code. No privileged "commands"; "meta" operations (impacting the REPL server or session itself) are performed by evaluating code that touches well-defined REPL server APIs.

jlb: I agree with the spirit of this point, but only having a single command seems overly limiting to me. For example, how can we send stdin asynchronously with only one command? Or fork and close sessions? The number of commands should be as small as possible, but I don’t think there’s a good reason to keep it to just one. Portal, as an example, has four commands: eval, stdin, fork and close.

generally assumes nothing about its usage and context

e.g. can be used interactively as well as for supporting tooling

functionally dependency-free for the base case of socket-based transport

etc.

More of that, please. The remainder of this strawman is essentially a diff with the existing nREPL "spec" and implementation as a baseline.

New wire protocol

The current nREPL protocol is fundamentally textual, and requires escaping of strings and costly encoding of binary data. An example request message:

2
"id"
"foo"
"code"
"(println 5)"

…which corresponds to this Clojure map:

{:id "foo" :code "(println 5)"}
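An approximation of this encoding in Python makes the escaping requirement concrete (json.dumps stands in for the actual string-quoting rules documented in the README; this is a sketch, not a faithful client):

```python
import json

def encode_message(msg):
    """Approximate the old line-based nREPL wire format: a count of
    key/value pairs, then each key and value as a quoted string per line.
    json.dumps is a stand-in for the real escaping rules."""
    lines = [str(len(msg))]
    for key, value in msg.items():
        lines.append(json.dumps(key))
        lines.append(json.dumps(value))
    return "\n".join(lines) + "\n"

print(encode_message({"id": "foo", "code": "(println 5)"}))
# → 2
#   "id"
#   "foo"
#   "code"
#   "(println 5)"
```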

There are three wire protocol options on the table.

Netstrings

Using netstrings (originally suggested by James Reeves) but retaining the fundamental structure of nREPL messages would lead to this transliteration (linebreaks added for clarity, and should not be taken to add to the byte count as indicated by the netstring header integers):

33:
2:id,
3:foo,
4:code,
11:(println 5),

Put more formally, each message would be expressed as one netstring that consists of 2n netstrings, where n is the number of key/value pairs that are to be found in that message. Each key and value is provided as a separate netstring. The "outer" netstring provides the cumulative size of the message, allowing one to allocate buffers as necessary in environments where that is helpful.
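A minimal Python sketch of this framing (helper names are illustrative, not part of any proposed API):

```python
def netstring(data: bytes) -> bytes:
    # a netstring is <byte count>:<bytes>,
    return str(len(data)).encode() + b":" + data + b","

def encode_message(msg: dict) -> bytes:
    """Encode a message as one outer netstring wrapping 2n inner
    netstrings: one per key and one per value."""
    body = b"".join(netstring(k.encode("utf-8")) + netstring(v.encode("utf-8"))
                    for k, v in msg.items())
    return netstring(body)

print(encode_message({"id": "foo", "code": "(println 5)"}))
# → b'33:2:id,3:foo,4:code,11:(println 5),,'
```

Note the trailing comma belonging to the outer netstring itself; a reader can allocate a 33-byte buffer up front after parsing the outer length prefix.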

Note

It has been suggested that message-size prefixes be padded to a fixed length (e.g. 0000043 instead of 43). This is in conflict with the specification of netstrings (and bencode, discussed later). Is there any value in adopting such a fixed-length prefix?

Beyond this, the existing semantics specified by nREPL (see the protocol discussion in the README) should be retained. In particular, a recent addition there allows for the description of sequences based on values of repeated keys in messages, so this (linebreaks again added for clarity):

66:
2:id,
3:foo,
4:code,
11:(println 5),
7:accepts,
3:png,
7:accepts,
4:jpeg,

Would correspond to this Clojure map containing a vector for the :accepts entry:

{:id "foo"
:code "(println 5)"
:accepts ["png", "jpeg"]}
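A sketch of decoding under the repeated-key convention, collecting duplicates into a vector-like list (again illustrative, not a reference implementation):

```python
def parse_netstrings(data: bytes):
    """Split a byte string into the payloads of consecutive netstrings."""
    items, i = [], 0
    while i < len(data):
        colon = data.index(b":", i)
        length = int(data[i:colon])
        items.append(data[colon + 1:colon + 1 + length])
        i = colon + 1 + length + 1      # skip payload and its trailing comma
    return items

def decode_message(payload: bytes) -> dict:
    """Decode the 2n inner netstrings of a message body; repeated keys
    accumulate into a list, per the repeated-key convention."""
    items = parse_netstrings(payload)
    msg = {}
    for k, v in zip(items[::2], items[1::2]):
        key, value = k.decode("utf-8"), v.decode("utf-8")
        if key in msg:
            prior = msg[key]
            msg[key] = prior + [value] if isinstance(prior, list) else [prior, value]
        else:
            msg[key] = value
    return msg

wire = b"2:id,3:foo,7:accepts,3:png,7:accepts,4:jpeg,"
print(decode_message(wire))
# → {'id': 'foo', 'accepts': ['png', 'jpeg']}
```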

The flexibility provided by a message-based protocol that allows for an open set of slots has proven very useful (e.g. when implementing the "rich interactions" previously mentioned). In general, striking a reasonable balance between representing Clojure-idiomatic data structures and a minimum of encoding overhead/gymnastics is desirable.

All message values must be UTF-8-encoded byte ranges by default. Unencoded binary values must be indicated by including their keys in an unencoded slot that precedes them in the message, so that each message’s content is self-describing outside of any particular environment. For example, a message carrying raw image bytes under some key would list that key in its unencoded slot, with the unencoded entry appearing before the binary entry it describes.

bencode

The netstring-based protocol described above, especially given the semantics of sequences, suggests that using Bencode (of which netstrings are essentially a part) would be far preferable insofar as we would be able to express arbitrary compositions of maps and sequences/vectors. Necessary additions to Bencode would be:

implication of UTF-8 (we do not want to get into variable character encodings, just not worth it AFAICT),

…therefore, a continued requirement for an unencoded slot to specify values that should not be decoded as UTF-8

the addition of the prefixed cumulative message length (making the entire message a netstring) as discussed earlier so as to benefit those that need to allocate read buffers efficiently.

Note

Given bencode, is the last of these items necessary? i.e. does knowing the cumulative size of the entire message provide any benefit to potential authors of clients where buffer allocation is a concern?

Here is the :accepts example from above in bencode format (with newlines added for readability)

d
2:id
3:foo
4:code
11:(println 5)
7:accepts
l3:png4:jpege
e
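A minimal bencoder for the value types nREPL messages would need, as a sketch (note that canonical bencode requires dictionary keys in sorted byte order; this sketch keeps insertion order to match the example above):

```python
def bencode(value) -> bytes:
    """Minimal bencoder covering strings, byte strings, lists, and dicts.
    Canonical bencode sorts dict keys; skipped here for readability."""
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return str(len(raw)).encode() + b":" + raw
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in value.items()) + b"e"
    raise TypeError(f"cannot bencode {type(value)}")

msg = {"id": "foo", "code": "(println 5)", "accepts": ["png", "jpeg"]}
print(bencode(msg))
# → b'd2:id3:foo4:code11:(println 5)7:acceptsl3:png4:jpegee'
```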

bencode is particularly attractive because it appears to be very widespread, at least based on the existence of more than a dozen implementations for various languages as noted here.

BSON - Binary JSON

The BSON spec is particularly attractive because it appears to be very widespread, at least based on the existence of more than a dozen implementations for various languages as noted on the protocol site.

Warning

Given the limits of e.g. HTTP, do we need to limit request and response values to scalars and lists/vectors of scalars?

Tagged netstrings

Tagged netstrings are interesting as well, especially insofar as they would allow us to include decoding / "media type" information for values going to and fro. This information would be conveyed via a type tag (again, expressed as a netstring, to allow more than one character). The type tag could correspond to the accepts tags, and would eliminate the :unencoded field, since the type tag would define the encoding. Similar to Clojure keywords, "jpeg" would be a global type tag, while "my-prefix/datatype" would be a qualified one for custom data.

Identify and retain/manage "sessions" across connections

Currently, when a client disconnects from an nREPL server, its "session" (essentially, all dynamic scope associated with that REPL) disappears. Lifting that state up into an atom that client requests can reference across connections (and therefore on top of transports where connections are generally not persistent) is desirable. (portal.server implemented something similar, using agents to serialize evaluations in addition to retaining sessions across connections.)

This will require some implementation details re: holding those agents, identifying them using e.g. an opaque string ID, and allowing for clients to:

1) Specify their "session" ID for any particular message (which defines the session/dynamic scope within which an evaluation will occur).

2) Additionally specify a message ID (as nREPL requires now), so that responses can be paired back up with prior requests.

3) (Optionally) "clone" an existing session, which would create a copy of the named existing session’s data for a new session.

After each evaluation, the final values of all dynamic vars would be `set!`ed back into the originating session atom.
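The session-store idea in points 1–3 above can be sketched as follows; the class, method, and var names are illustrative only, with a Python dict standing in for the session atom and its dynamic-var bindings:

```python
import copy
import uuid

class SessionStore:
    """Sessions as server-held state keyed by opaque string IDs,
    decoupled from any connection (a sketch of the proposal above)."""
    def __init__(self):
        self.sessions = {}

    def create(self, bindings=None) -> str:
        session_id = uuid.uuid4().hex          # opaque ID handed to the client
        self.sessions[session_id] = dict(bindings or {})
        return session_id

    def clone(self, session_id: str) -> str:
        # "clone"/fork: a new session seeded with a copy of an existing one's data
        return self.create(copy.deepcopy(self.sessions[session_id]))

    def evaluate(self, session_id: str, final_bindings: dict):
        # after each evaluation, final dynamic-var values are stored back
        self.sessions[session_id].update(final_bindings)

store = SessionStore()
s1 = store.create({"*print-length*": None})
store.evaluate(s1, {"*print-length*": 5})
s2 = store.clone(s1)                           # diverges independently from here on
store.evaluate(s2, {"*print-length*": 10})
print(store.sessions[s1]["*print-length*"],    # → 5
      store.sessions[s2]["*print-length*"])    # → 10
```

Because the store, not the socket, owns the state, any transport (HTTP, STOMP, a fresh socket) can resume a session simply by presenting its ID.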

Serializing evaluations / session usage

Some clients and/or some use cases may prefer to have guarantees that each evaluation performed against a given session will be done in order, blocking so as to ensure exclusive usage of the session. This implies a third identifier (aside from message ID and session ID) that specifies an "evaluation queue" that would effectively serialize handling of incoming messages.

To illustrate the effect, consider these two requests (shown as Clojure maps; the wire protocol is irrelevant here):

{:id "1" :code "(set! *print-length* 5)"}
{:id "2" :code "(set! *print-length* 10)"}

Without a notion of evaluation queues (or, if they are provided but none are specified in the requests), the value of *print-length* after both messages are processed can be either 5 or 10; evaluation order is undefined.

If both requests specify the same evaluation queue, then message 1 will always be evaluated completely before message 2, and *print-length* is guaranteed to be 10 (leaving aside any error conditions or other unexpected circumstances).
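One way to realize this serialization is a lock (or single-threaded executor) per queue ID; the "queue-id" slot name below is hypothetical, and Python assignments stand in for the (set! …) forms:

```python
import threading
from collections import defaultdict

class QueueSerializer:
    """Per-queue serialization sketch: messages naming the same queue ID run
    strictly one at a time, in submission order; messages without one run
    with no ordering guarantee. The "queue-id" slot name is hypothetical."""
    def __init__(self):
        self.locks = defaultdict(threading.Lock)

    def evaluate(self, message, run):
        queue_id = message.get("queue-id")
        if queue_id is None:
            return run(message)           # no ordering guarantee
        with self.locks[queue_id]:        # exclusive use of the queue
            return run(message)

env = {}
def run(message):
    # stand-in for evaluation: Python assignments instead of (set! ...) forms
    exec(message["code"], env)

serializer = QueueSerializer()
serializer.evaluate({"queue-id": "q1", "code": "print_length = 5"}, run)
serializer.evaluate({"queue-id": "q1", "code": "print_length = 10"}, run)
print(env["print_length"])                # → 10, always, for a shared queue
```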

Warning

The error conditions associated with agents are significantly more complicated than nREPL’s current 1:1 relationship between connection and session, and deserve some consideration. In particular:

How do we handle aging and disposal of environments?

What does it look like to a client when they attempt to evaluate with an already-disposed environment?

Insofar as the aim here is to enable e.g. nREPL over HTTP/JMX/STOMP, how does session ID (queue ID?) and its semantics mesh with the various session identifiers and semantics present in each of those protocols and the systems they are usually hooked up to?

Looking at how e.g. web servers manage in-memory sessions would be instructive (timeouts, max-sessions, local policies), as many of the issues are fundamentally the same.