I’m working on a localization system called Fluent. Part of Fluent is our localization DSL called FTL.

So far, the FTL parser I've been working on defines the AST with String properties and copies data from the source string into the AST.

One of the ideas we have is to design the parser to be copy-free, so that the AST only holds &str slices of the original source string.

The issue I ran into is that in some cases the copying parser alters the data as it reads it, so the value stored in the AST is not a contiguous slice of the source. Two examples are comments and escape characters in strings.
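To make the escape case concrete, here is a minimal sketch. The `unescape` helper is hypothetical and not part of the Fluent parser, and it assumes a backslash simply escapes the next character. Text without escapes can stay a borrowed slice of the source, but as soon as a backslash has to be dropped the result no longer exists anywhere in the source and has to be allocated:

```rust
use std::borrow::Cow;

/// Hypothetical helper (an assumption for illustration): returns a borrowed
/// slice when `src` contains no backslash, and an owned copy with the
/// backslashes removed otherwise.
fn unescape(src: &str) -> Cow<'_, str> {
    if !src.contains('\\') {
        // Nothing to alter, so the AST could point straight into the source.
        return Cow::Borrowed(src);
    }
    let mut out = String::with_capacity(src.len());
    let mut chars = src.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            // Keep the escaped character and drop the backslash.
            // A trailing lone backslash is silently dropped in this sketch.
            if let Some(next) = chars.next() {
                out.push(next);
            }
        } else {
            out.push(c);
        }
    }
    Cow::Owned(out)
}
```

With this sketch, `unescape("Value ")` stays borrowed, while `unescape(" but this is a regular character \\{")` has to return an owned String. Multiline comments have the same problem: the joined comment text, without the leading `# ` on each line, is not a single slice of the source.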

Example:

# This is
# a multiline comment
# in FTL
key = Value { $placeable } but this is a regular character \{

In the current parser, the result AST will look more or less like this:

Why do you want a copy-free parser for Fluent though? I would expect that a carefully designed AST could be more compact and efficient to process, and it looks like you don’t need IDE capabilities (which are the main motivation behind the libsyntax2 design).

It seems to me that what you actually want is an efficient data structure: avoiding copies can sometimes lead to efficiency, but not always. For Fluent, I would think something like this makes sense?

struct FluentFile {
    // A single string holding a concatenation of all literals, which gives a single allocation
    // and interning.
    strings: String,
    // An AST represented as a struct of arrays, with `u32` indices.
    messages: Vec<Message>,
    values: Vec<Values>,
    text_elements: Vec<TextElement>,
}
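
A rough sketch of how such a layout could be filled and read back is below. The `Span` type, the fields of `Message` and `TextElement`, and all the methods are assumptions made for illustration; the `values` array is left out for brevity, and this version only appends to `strings` without deduplicating, so it does not do any interning.

```rust
/// Hypothetical span into `FluentFile::strings`: byte offset and length.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    start: u32,
    len: u32,
}

/// Hypothetical message node: an identifier plus a range of text elements.
#[derive(Debug)]
struct Message {
    id: Span,
    first_element: u32,
    element_count: u32,
}

/// Hypothetical text element: a span of already unescaped text.
#[derive(Debug)]
struct TextElement {
    text: Span,
}

#[derive(Default)]
struct FluentFile {
    strings: String,
    messages: Vec<Message>,
    text_elements: Vec<TextElement>,
}

impl FluentFile {
    /// Appends `s` to the shared buffer and returns its span.
    fn push_str(&mut self, s: &str) -> Span {
        let start = self.strings.len() as u32;
        self.strings.push_str(s);
        Span { start, len: s.len() as u32 }
    }

    /// Looks up the text a span refers to.
    fn resolve(&self, span: Span) -> &str {
        let start = span.start as usize;
        &self.strings[start..start + span.len as usize]
    }

    /// Stores a message and its text elements, returning the message index.
    fn add_message(&mut self, id: &str, elements: &[&str]) -> u32 {
        let id = self.push_str(id);
        let first_element = self.text_elements.len() as u32;
        for e in elements {
            let text = self.push_str(e);
            self.text_elements.push(TextElement { text });
        }
        let element_count = elements.len() as u32;
        self.messages.push(Message { id, first_element, element_count });
        (self.messages.len() - 1) as u32
    }

    /// Concatenates the text elements of the message at `index`.
    fn message_text(&self, index: u32) -> String {
        let m = &self.messages[index as usize];
        let start = m.first_element as usize;
        let end = start + m.element_count as usize;
        self.text_elements[start..end]
            .iter()
            .map(|e| self.resolve(e.text))
            .collect()
    }
}
```

Because the parser writes the final text into `strings` anyway, unescaping `\{` or joining comment lines costs nothing extra here: the nodes only store `u32` offsets, and the whole file needs a handful of allocations no matter how many messages it contains.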