Command Line Parsing in SBT

I’ve been working with Scala for the last couple of years and like many (if not most) Scala developers, I’ve had my ups and downs with SBT. SBT is a powerful tool but it’s easy to lose your footing.

During our most recent hackweek at Nitro I worked on an SBT plugin with some colleagues to help us create and manage our build pipelines in Jenkins. This was the first time I needed to handle proper command line input in SBT.

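In this post, we’ll look at two ways of handling command line input in SBT: the “SBT way”, which builds on SBT’s own Parser[T] combinators, and a simpler approach that hands the arguments over to scopt.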

I would definitely recommend the second approach if you are looking to do this yourself as I believe it is simpler, more pragmatic and more robust to change.

mkdir

To make the discussion more concrete, we’ll use an example that should be familiar to all developers. mkdir is among the first Unix commands that you come across when you start using the command line. It has a small but non-trivial set of command line arguments, making it a good example to drive our discussion.

sbt mkdir

InputTask and Command are quite similar, and either could be used for sbt mkdir no matter which approach we take to parsing user input. The prevailing advice is to use InputTask as a first choice, as Command is lower level and allows you to actually modify the state of the build [1].

Here’s a simplified definition of InputTask [2] and the signature of one of the constructors for Command [3]:
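As a rough sketch of the two shapes being compared, the snippet below uses empty stand-in traits (State, Parser, Task) so that it compiles without SBT on the classpath. The stand-ins and the dropped help-text parameter are my simplifications, not SBT's actual source.

```scala
// Stand-in types so the sketch compiles on its own (assumption: in a real
// build these all come from sbt itself).
trait State
trait Parser[+T]
trait Task[+T]
trait Command

// Simplified InputTask: a function from the build State to a parser that
// yields the Task to run.
final class InputTask[T](val parser: State => Parser[Task[T]])

object Command {
  // Simplified Command constructor (help text omitted): a name, a parser
  // factory, and an effect that returns the next State.
  def apply[T](name: String)(parser: State => Parser[T])(effect: (State, T) => State): Command =
    new Command {}
}
```

Both take a State => Parser[...] function. The Command variant additionally returns a new State from its effect, which is what makes it the lower-level of the two.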

The point of similarity between these two is the argument parser: State => Parser[_]. The Parser[_] returned by this function is where we get access to the user’s command line input.

Approach 1: The SBT way

Parser[T] is the tool that SBT provides to, as you might have guessed, parse command line input [4]. Parser[T] defines the low-level methods needed for directly implementing new parsers. Generally, you won’t need to do this. Instead, SBT provides a number of parsers for common scenarios in sbt.complete.Parsers [5]. You will generally combine these together to build more useful Parser[T] instances using the combinators in sbt.complete.RichParser, which implicitly enriches Parser[T].

If you decide that you need to use these parsers in your SBT project, you will need to familiarise yourself with both sbt.complete.Parser and more importantly sbt.complete.Parsers.

Step 1: Add an InputTask or Command to your build definition

Here is what an InputTask and a Command would look like for sbt mkdir in our build.sbt file.

import sbt.complete.Parser // defines the basic Parser[T] infrastructure
import sbt.complete.DefaultParsers._ // provides built-in parsers for common types
import Mkdir.{run, MkdirCommand} // Let's assume for now that this exists!

lazy val mkdirParser: Parser[MkdirCommand] = ...

Here’s sbt mkdir as an InputTask:

lazy val mkdir = inputKey[Unit]("make directories with the help of sbt parsers")

mkdir := {
  // The ".parsed" macro is available in InputTasks and
  // it applies the given parser to the user's command line input.
  val mkdirCommand: MkdirCommand = mkdirParser.parsed
  // implementation of mkdir
  Mkdir.run(mkdirCommand)
}

And here it is as a Command:

def mkdirCmd = Command("mkdir")(_ => mkdirParser) { (state, mkdirCommand: MkdirCommand) =>
  Mkdir.run(mkdirCommand)
  state
}

// Add the Command to the list of commands in the project settings
commands ++= Seq(mkdirCmd)

So far these are both quite straightforward and you’re free to choose which suits your preference and situation better. Whichever way you go, you will need a Parser[MkdirCommand] to handle the input.

Step 2: Define MkdirCommand to collect the input

This is the simple data structure used to collect the information we want from sbt mkdir’s command line arguments.
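A minimal sketch of what such a structure might look like is below. The field names and defaults are my assumptions, and Mode is left as an empty placeholder trait here so that the snippet stands on its own.

```scala
import java.io.File

// Placeholder for the mode representation (assumption: the real type would
// distinguish symbolic and absolute modes).
sealed trait Mode

// Hypothetical shape: one field per piece of information that
// `mkdir [-pv] [-m mode] directory_name ...` can carry.
final case class MkdirCommand(
  directories: Seq[File],
  mode: Option[Mode] = None,
  createIntermediate: Boolean = false, // -p
  verbose: Boolean = false             // -v
)
```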

Step 3: Grammar Time

When working with parsers, it’s really helpful to define a grammar for the input you want to parse. This might sound daunting but it’s really not so bad. Even if it’s not completely formal, it will prove invaluable while we are building up our Parser[T] from the ground up.

Here’s a possible BNF-like grammar [6] for mkdir. This borrows from man chmod, which helpfully provides a grammar for the “symbolic mode” used in mkdir.

Step 4: Construct Parser[MkdirCommand]

Now that we have our grammar in place, writing the parsers is (theoretically) straightforward.

The approach will be to create parsers for the “terminals” in the grammar (those values that are only found on the right-hand side of the rules) using SBT’s built-in parsers.

Then we will successively build larger and larger parsers by combining smaller ones until we get a Parser[MkdirCommand] that handles the entire sbt mkdir input defined by our grammar.

Let’s look at some examples.

Parsing directories

Without a parser for handling directory names, mkdir wouldn’t be much use so let’s start here. This is the part of the grammar we’re interested in:

directories ::= [directory_name ...] directory_name

We need a Parser[Seq[File]] that extracts the list of directories to create from the user input. Let’s start from the bottom and work our way up.

First, we define val directory: Parser[File] that uses the built-in StringBasic parser to match a String in the user input. We then map over this Parser[String] to transform it into a Parser[File], the type used to represent directories in MkdirCommand. This parser is equivalent to the directory_name terminal in our grammar.

Next, mkdir accepts a sequence of directories so we need to define val directories: Parser[Seq[File]] that combines our directory parser with the built-in Space parser and then repeats this combined parser one or more times to transform a space-separated list of strings into a Seq[File].
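Put together, the two parsers described here might look roughly like the sketch below. It assumes the sbt completion library is on the classpath, and the wrapping object DirectoryParsers is just a hypothetical container so that the snippet compiles as a regular Scala file.

```scala
import java.io.File
import sbt.complete.Parser
import sbt.complete.DefaultParsers._

// Hypothetical container object; in build.sbt these could be top-level vals.
object DirectoryParsers {
  // directory_name: a single (optionally quoted) string, wrapped in a File.
  val directory: Parser[File] = StringBasic.map(name => new File(name))

  // directories: one or more directory names, each preceded by whitespace.
  val directories: Parser[Seq[File]] = (Space ~> directory).+
}
```

Note that StringBasic will happily accept something like -p as a directory name, so this parser on its own does not distinguish flags from directories.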

Space - this is a built-in parser that matches one or more whitespace characters

~> - this is used to combine parsers in sequence, returning the value to the right of ~>, in this case the File representing the directory.

+ - match the parser one or more times

Parsing modes

Next, let’s build a parser for another part of our grammar - the mode option -m mode. The value of mode can be either a “symbolic mode” or an “absolute mode”, both defined in chmod’s man pages.

First, we define some simple data structures to help us represent these modes and their components in our code.
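One possible set of definitions is sketched below. Only the type names Op, Permission, SymbolicMode, AbsoluteMode and Mode come from this post. Who, Action, Clause and all of the field names are assumptions of mine, loosely following the terms used in man chmod.

```scala
// Components of a symbolic mode such as "u+x,g-w" (hypothetical shapes).
final case class Who(symbol: Char)        // one of a, u, g, o
final case class Op(symbol: Char)         // one of +, -, =
final case class Permission(symbol: Char) // e.g. r, w, x
final case class Action(op: Op, permissions: Seq[Permission])
final case class Clause(who: Seq[Who], actions: Seq[Action])

// A mode is either absolute (an octal number) or symbolic.
sealed trait Mode
final case class AbsoluteMode(bits: Int) extends Mode
final case class SymbolicMode(clauses: Seq[Clause]) extends Mode
```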

Let’s tackle “absolute modes” first. The approach is the same as before - we start with a terminal parser val octalDigit: Parser[Char] and build larger parsers from it using parser combinators. One thing to point out in this example is the use of def chars(legal: String): Parser[Char]. This comes from sbt.complete.Parser and parses a single Char if it is found in the provided string of legal characters.
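A minimal sketch of such a parser follows, assuming the sbt completion library is available. AbsoluteMode is redefined locally with a hypothetical bits field so that the snippet stands alone, and the container object is also hypothetical.

```scala
import sbt.complete.Parser
import sbt.complete.DefaultParsers._

// Hypothetical container object so the snippet compiles as a Scala file.
object AbsoluteModeSketch {
  final case class AbsoluteMode(bits: Int)

  // Terminal: a single octal digit.
  val octalDigit: Parser[Char] = chars("01234567")

  // One or more octal digits, converted to an Int using base 8.
  // Assumption: this sketch does not limit the number of digits, so very
  // long inputs would overflow in Integer.parseInt.
  val absoluteMode: Parser[AbsoluteMode] =
    octalDigit.+.map(digits => AbsoluteMode(Integer.parseInt(digits.mkString, 8)))
}
```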

Next up is the “symbolic mode” parser. The structure of this is more complicated but thankfully we have our grammar to guide us. Again, the process is the same - start at the terminals and work your way up.
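The sketch below shows one way this could be built up. It assumes a simplified grammar modelled on man chmod (a mode is a comma-separated list of clauses, a clause is zero or more who symbols followed by one or more actions, and an action is an op followed by zero or more permissions). The case classes and their fields are hypothetical, and they are defined inside the object so that the local Op takes precedence over the Op parser exported by DefaultParsers.

```scala
import sbt.complete.Parser
import sbt.complete.DefaultParsers._

object SymbolicModeSketch {
  final case class Who(symbol: Char)
  final case class Op(symbol: Char)
  final case class Permission(symbol: Char)
  final case class Action(op: Op, permissions: Seq[Permission])
  final case class Clause(who: Seq[Who], actions: Seq[Action])
  final case class SymbolicMode(clauses: Seq[Clause])

  // Terminals of the grammar.
  val who: Parser[Who] = chars("augo").map(c => Who(c))
  val op: Parser[Op] = chars("+-=").map(c => Op(c))
  val permission: Parser[Permission] = chars("rstwxXugo").map(c => Permission(c))

  // action ::= op [perm ...]
  val permissions: Parser[Seq[Permission]] = permission.*
  val action: Parser[Action] =
    (op ~ permissions).map { case (o, ps) => Action(o, ps) }

  // clause ::= [who ...] action [action ...]
  val clause: Parser[Clause] =
    (who.* ~ action.+).map { case (ws, as) => Clause(ws, as) }

  // mode ::= clause [, clause ...]
  // Simplification: the separator is optional after every clause, so this
  // sketch also accepts a trailing comma.
  val comma: Parser[Char] = literal(',')
  val symbolicMode: Parser[SymbolicMode] =
    (clause <~ comma.?).+.map(clauses => SymbolicMode(clauses))
}
```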

~ - similar to ~>, this allows us to match parsers in sequence but in this case, it returns the successful result from both sides of ~. In the example above, (op ~ permissions) has the type Parser[(Op, Seq[Permission])]

<~ - matches two parsers in sequence like ~ and ~> but this time, it returns the value on the left

? - optionally match the parser

Finally, we need to combine val symbolicMode: Parser[SymbolicMode] and val absoluteMode: Parser[AbsoluteMode] into a Parser[Mode] that looks for the -m option and then matches either an absolute mode or a symbolic mode:
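As a sketch of that combination, the helper below is written generically over the result type M so that it compiles without the mode types. In the real build, M would be Mode and the two arguments would be absoluteMode and symbolicMode. The name modeOption and its container object are hypothetical.

```scala
import sbt.complete.Parser
import sbt.complete.DefaultParsers._

object ModeOptionSketch {
  // Matches " -m <mode>" and returns whichever of the two mode parsers
  // succeeds; the leading whitespace and the "-m" literal are discarded.
  def modeOption[M](absolute: Parser[M], symbolic: Parser[M]): Parser[M] =
    Space ~> literal("-m") ~> Space ~> (absolute | symbolic)
}
```

Here | tries both alternatives, so the order of the two arguments should not matter as long as the parsers accept disjoint input (digits for absolute modes, letters and operators for symbolic ones).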

Parsing the entire command

We’re not done yet. To complete our Parser[MkdirCommand] we still need to define a parser for the remaining boolean flags (-p and -v) and then combine this with the parsers defined above into our mkdir parser Parser[MkdirCommand].

The approach is the same as we have seen in the last two examples so it won’t add much to show this here. Hopefully this has given you a good feeling for parsing user input the “SBT way”.

If you are interested in seeing this explicitly or trying out the full Parser[MkdirCommand], the full source code for these examples is here.

Thoughts on the SBT way

SBT’s parser infrastructure is powerful and working with grammars and parsers can be fun. However, I would also like to make clear that it is not without its problems. Here are some caveats that I would like to add.

Constructing parsers like this is quite work-intensive - you may need to write a lot of Parser[T]s to get what you want done.

It can be difficult to express exactly what you want in terms of Parser[T]s. Writing parsers like this is quite low-level.

You cannot just rely on the documentation - you will need to read the source to get things done.

It has a complex symbolic syntax which is not to everybody’s taste.

There are some rough edges to SBT’s parsers. For example, you can use multiple Parser[T] instances in an InputTask but they are evaluated in reverse order. This is considered a bug.

It is easy to break your Parser[T] with a small change in any of the component Parser[T]s. This fragility raises questions about the maintainability of your SBT project. If you have to handle a large number of command line arguments, you may find this approach frustrating.

Approach 2: scopt

Now I’d like to point out an alternative and more pragmatic approach to parsing command line arguments in SBT. This approach is simpler, more accessible and likely more familiar to developers. This is the approach that I would take next time this problem comes up.

The basic idea is to use the spaceDelimited: Parser[Seq[String]] parser to split the entire input string into space-separated strings. Then you can use the simple and widely-used command line argument parsing library scopt to handle the parsing for you.

This removes the need to maintain a complex hierarchy of SBT parsers, improving maintainability and simplifying your code. Instead, you have a single flat scopt.OptionParser that defines your parameters and how to handle them, and looks after the low-level parsing details for you.

Let’s see what sbt mkdir might look like with this approach.

Step 1: Define MkdirConfig

Firstly we define a data structure with sensible defaults to collect the information from the input. This is much the same as with the previous approach.
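Here is a rough sketch of the config together with a scopt parser that fills it in. It assumes scopt is on the build's classpath (the library is published as "com.github.scopt" %% "scopt"). The field names, the long option names and the MkdirOptions object are hypothetical, and the mode is kept as a plain String to keep the sketch short.

```scala
import java.io.File

// Hypothetical config with defaults for everything.
final case class MkdirConfig(
  directories: Seq[File] = Seq.empty,
  mode: Option[String] = None,
  createIntermediate: Boolean = false,
  verbose: Boolean = false
)

object MkdirOptions {
  // One flat parser: each option says how it updates the config.
  private val parser = new scopt.OptionParser[MkdirConfig]("mkdir") {
    opt[Unit]('p', "parents")
      .action((_, config) => config.copy(createIntermediate = true))
      .text("create intermediate directories as required")
    opt[Unit]('v', "verbose")
      .action((_, config) => config.copy(verbose = true))
      .text("be verbose when creating directories")
    opt[String]('m', "mode")
      .action((mode, config) => config.copy(mode = Some(mode)))
      .text("file mode for the created directories")
    arg[File]("<directory>...")
      .unbounded()
      .action((dir, config) => config.copy(directories = config.directories :+ dir))
      .text("directories to create")
  }

  // Returns None (after scopt has reported the error) if the input is invalid.
  def parse(args: Seq[String]): Option[MkdirConfig] =
    parser.parse(args, MkdirConfig())
}
```

In build.sbt, the body of the mkdir InputTask would then reduce to reading the arguments with spaceDelimited("<args>").parsed and passing the resulting Seq[String] to MkdirOptions.parse.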

You’ll notice that we’re not completely doing away with SBT’s built-in parsers in this approach. We’re just limiting their use to the simple spaceDelimited parser that splits the user input on whitespace. Once we’ve done this, scopt looks after all of the details for us.

Thoughts on the scopt way

I much prefer this second approach to handling command line arguments in SBT for a number of reasons:

It’s simple and pragmatic - you don’t want your parsing code to take up lots of space.

No need to make low-level parsing decisions.

You don’t need to “look under the hood”.

The OptionParser is more robust to modification as the individual parsing components are independent. This helps with maintainability.