.NET 4.0 – System.Shell.CommandLine Parsing – Part 1

Update (2/22/2009, 5:29 PM): Thanks to all the readers for the feedback on this post so far. Over the weekend I exchanged some mail with the BCL team, and it turns out the bits for this feature mistakenly made it into the CTP. Allow me to quote Justin's mail:

Hi Bart,

I saw your blog post on System.Shell.CommandLine. We’re actually not planning to include this in .NET 4; it was mistakenly public in the CTP. The design wasn’t something we were happy with so it has been removed and will not be available in the next preview release. However, we are planning to release a much better designed command line parsing library on CodePlex later this year. When it is available, we’ll be sure to announce it on the BCL team blog. If you could let your readers know, I’d appreciate it.

Thanks,
Justin

So, stay tuned for more information on this upcoming library. As you can see, some of the concerns raised through the feedback are well known to the team, hence their investment in a better library: design, implementation, testing, documentation and shipping. I'm sorry for the inconvenience this post may have caused you. The next parts of this series will be archived for history until the better one arrives :-).

Introduction

Command-line parsing isn’t a trivial thing and the wheel has been reinvented many times to make this job easier. Starting with .NET 4.0 though, developers will have built-in support for command-line parsing in the framework. In this series of posts I’ll dive into the details of this hidden treasure in .NET Framework 4.0. You can actually start playing with it today, by downloading the Visual Studio 2010 and .NET Framework 4.0 CTP.

So, what’s the deal? Why focus on something seemingly old-fashioned like command-line parsing? It turns out quite a few programs want to support command-line arguments of some sort, no matter whether the tool is intrinsically command-line driven (e.g. console applications such as compilers) or comes with a GUI: they all have a Main method. And that’s where the pain starts. All you get from Main’s arguments is an array of strings, and you’re on your own from there on. The very first thing you’re likely to do is some kind of parsing to turn the arguments into rich objects. That is far from trivial as soon as you want even a little bit of flexibility at the command line:

is the argument required or optional?

positional (like copy <first> <last>) or named (like csc /t:library)?

what’s the type of the argument?

how to validate an argument’s value?

support for “parameter set”?

etc

Lots of things to think about. Luckily, an entire team has been thinking about this for quite a while: Windows PowerShell. As you’ll see, the core concepts of command-line parsing in the .NET Framework are based on the techniques applied to PowerShell cmdlets. Not only does that make command-line experiences consistent, it also allows developers to transfer their knowledge from one domain to the other, and even to port command-line parsing from one side to the other.

When to use what? Windows PowerShell is definitely the way to go for automation, so if you find yourself writing command-line tools that are applicable in such a scenario, PowerShell should be a no-brainer. In addition, the use of PowerShell gives you a lot of infrastructure to party on with regard to pipelining, types, error handling, etc. But if you find yourself in a scenario where it just feels right to write a .NET console application or to add command-line support to any kind of application, System.Shell.CommandLine should be your next big friend…

System.Shell.CommandLine

The new namespace for the command-line parsing functionality is System.Shell.CommandLine and lives in the System.Core.dll assembly, so it will be included by default in new projects. It contains quite a few types:

The main entry points to the API are CommandLineParser and AttributeCommandLineParser. What’s the difference between these two? The first one, CommandLineParser, is the simpler one to use. It’s very basic and imperative in use: you call a few methods to add parameters to the parser, invoke the parser and get the detected values (or an exception if the command line was invalid) back through a dictionary-like lookup. The second one, AttributeCommandLineParser, is declarative in nature. You declare a class with properties that get annotated with metadata indicating the corresponding parameter’s behavior. Next, you pass a new instance of that type to the parser, and given the command line it will populate the properties with the values found. All of this can be done in conjunction with attribute-driven validation and even transformations on parameters (like turning short file names into full paths).

To get started with the basics, I’ll dive into the CommandLineParser class today. In the next episodes we’ll look at the AttributeCommandLineParser in all its glory.

As you can see, this class isn’t that big. I should stress that CommandLineParser is the less powerful of the two, but it is still applicable in a lot of cases where you don’t require automatic mapping onto an object, data validation and/or transformation, etc.

What’s better than looking at a simple example? Say we want to rewrite chkdsk with support for a few of its arguments:

Let’s tweak it a little though to reduce the number of parameters (so we can get to the essence) and make one required:

CHKDSK volume [-F] [-L:size]

This gives us a chance to show a required parameter (volume), an optional “flag” parameter (-F) and one that takes a value (-L). The basic steps to parse this are:

Create a CommandLineParser instance.

Add parameters to it using the AddParameter methods.

Call Parse, either feeding in a string or passing no arguments to parse the process’s command line.
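To make the shape of these three steps concrete, here is a minimal, self-contained sketch for the simplified chkdsk syntax. Note that MiniParser and all of its members (AddParameter with these four arguments, GetFlag, GetString, GetInt32) are hypothetical stand-ins written for illustration; they are not the CTP's CommandLineParser and make no claim about its real signatures. The sketch only mirrors the ideas: required versus optional, positional versus name-required, and flags versus valued parameters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration only; NOT System.Shell.CommandLine.
class MiniParser
{
    private class Param
    {
        public string Name; public bool IsFlag; public bool Required; public bool NameRequired;
    }

    private readonly List<Param> _params = new List<Param>();
    // Lookups ignore case, so "f" and "F" retrieve the same parameter in this sketch.
    private readonly Dictionary<string, string> _values =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public void AddParameter(string name, bool isFlag, bool required, bool nameRequired)
    {
        _params.Add(new Param { Name = name, IsFlag = isFlag, Required = required, NameRequired = nameRequired });
    }

    public void Parse(string[] args)
    {
        _values.Clear();
        foreach (string arg in args)
        {
            if (arg.StartsWith("-"))
            {
                // Named use: "-F" or "-L:1024".
                string body = arg.Substring(1);
                int colon = body.IndexOf(':');
                string name = colon < 0 ? body : body.Substring(0, colon);
                Param p = _params.Find(x => string.Equals(x.Name, name, StringComparison.OrdinalIgnoreCase));
                if (p == null) throw new ArgumentException("Unknown parameter: " + name);
                if (p.IsFlag) _values[p.Name] = "true";
                else if (colon < 0) throw new ArgumentException("Missing value for: " + name);
                else _values[p.Name] = body.Substring(colon + 1);
            }
            else
            {
                // Positional use: bind to the first unfilled parameter that allows it.
                Param p = _params.Find(x => !x.NameRequired && !_values.ContainsKey(x.Name));
                if (p == null) throw new ArgumentException("Unexpected argument: " + arg);
                _values[p.Name] = arg;
            }
        }
        foreach (Param p in _params)
            if (p.Required && !_values.ContainsKey(p.Name))
                throw new ArgumentException("Missing required parameter: " + p.Name);
    }

    // Flags are never null: absent means false, present means true.
    public bool GetFlag(string name) { return _values.ContainsKey(name); }

    public string GetString(string name)
    {
        string v;
        return _values.TryGetValue(name, out v) ? v : null;
    }

    // Valued parameters come back as nullable: null means "not specified".
    public int? GetInt32(string name)
    {
        string v = GetString(name);
        return v == null ? (int?)null : int.Parse(v);
    }
}

static class Program
{
    static void Main()
    {
        var parser = new MiniParser();                          // 1. create the parser
        parser.AddParameter("volume", false, true, false);      // 2. required, positional allowed
        parser.AddParameter("F", true, false, true);            //    optional flag, name required
        parser.AddParameter("L", false, false, true);           //    optional value, name required
        parser.Parse(new[] { "C:", "-F", "-L:1024" });          // 3. parse

        Console.WriteLine("volume={0}, fix={1}, logSize={2}",
            parser.GetString("volume"), parser.GetFlag("F"), parser.GetInt32("L"));
    }
}
```

In this sketch, leaving out the volume makes Parse throw, which is the spot where a real tool would print its error message and usage text.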

A quick journey through the code. First, we new up the CommandLineParser object. Nothing special here. Next, three parameters are added. I’m using the most specific overloads to specify everything up to a parameter description (ignoring localization, lazy as I am…). The supported types for arguments are String, Boolean, Int32 and Double. Names are not case sensitive for the user, but they do matter in the code: I’m referring to the fix parameter as “F” here, so later on I’ll have to use a capital “F” again. Whether or not a parameter is required should be self-explanatory. The name requirement might be less obvious, but essentially it boils down to either allowing positional use of the parameter or requiring the parameter to be paired with its name all the time. For the “flag” -F it makes sense that the name is required. For -L it means the value can only be specified like “-L:size”, with the “L” spelled out explicitly. The last argument of AddParameter takes the help string.

Once we have declared the parameters (in an imperative way, since that’s how CommandLineParser works; see the next post for a more declarative, metadata-driven way), we can invoke the parser. Just calling Parse without an argument will use Environment.CommandLine as the input. Alternatively, we could feed in a string ourselves. When the user violates the contract (omitting a required parameter, specifying a value of the wrong type for a valued parameter, etc.), an exception of type ParameterParsingException is thrown. You can take a look at the Message property for detailed error info (as expected), but I’m also printing out the auto-generated syntax report by using GetHelp. If the user made a mistake, this causes something like this to appear:

Other types of errors will be handled in a later episode (like duplication of parameters, validation errors, and such). (Note: I’ve called my assembly mchkdsk for “managed chkdsk”, but feel free to find other explanations for the “m” prefix given the dysfunctional nature of the thing…)

Finally, we retrieve the parameters using type-specific Get*ParameterValue methods. As Boolean parameters represent a flag, they’re not retrieved as nullable (the absence of the parameter means false, presence means true), but all other parameter types are nullable (string obviously always is). And once we have obtained the parameter values, we get into the program’s logic, which is just some dummy code as you can imagine. Below I’m executing mchkdsk C: -F -L:1024 as an example output:

Some quick notes:

Named parameters are prefixed with a dash ‘-’.

String parameters with spaces can be surrounded by double quotes; ‘\’ acts as the escape for quotes in between quotes.

Named parameters that take a value (i.e. non-Boolean) can have an optional colon ‘:’ between the name and the value, as well as spaces (in BNF, something along the lines of <name>‘:’<space>*<value> or <name><space>+<value>).

Remaining arguments are supported on an opt-in basis (see AllowRemainingArguments). If not opted-in, an exception will be thrown if arguments other than recognized ones are found. Otherwise, you can find them in the GetRemainingArguments array result.
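The quoting and name/value rules in these notes can be sketched in a few lines. The code below is an illustrative reimplementation under my reading of the notes, not the library's actual tokenizer: the class QuickNotesSketch and the methods Tokenize and ReadNamedValue are hypothetical names, and details such as how a backslash outside quotes is treated are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of System.Shell.CommandLine.
static class QuickNotesSketch
{
    // Splits on whitespace. Double quotes group a token that contains spaces,
    // and \" inside quotes yields a literal quote character.
    public static List<string> Tokenize(string commandLine)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false, hasToken = false;
        for (int i = 0; i < commandLine.Length; i++)
        {
            char c = commandLine[i];
            if (inQuotes && c == '\\' && i + 1 < commandLine.Length && commandLine[i + 1] == '"')
            {
                current.Append('"');
                i++; // skip the escaped quote
            }
            else if (c == '"')
            {
                inQuotes = !inQuotes;
                hasToken = true; // "" still counts as an (empty) token
            }
            else if (char.IsWhiteSpace(c) && !inQuotes)
            {
                if (hasToken) { tokens.Add(current.ToString()); current.Clear(); hasToken = false; }
            }
            else
            {
                current.Append(c);
                hasToken = true;
            }
        }
        if (hasToken) tokens.Add(current.ToString());
        return tokens;
    }

    // Reads the value of the named parameter at tokens[index].
    // Accepts "-L:1024", "-L: 1024" and "-L 1024"; advances index when the
    // value lives in the following token.
    public static string ReadNamedValue(IList<string> tokens, ref int index)
    {
        string token = tokens[index];
        int colon = token.IndexOf(':');
        if (colon >= 0 && colon < token.Length - 1)
            return token.Substring(colon + 1);   // <name>':'<value>
        index++;                                  // <name>':'<space>+<value> or <name><space>+<value>
        if (index >= tokens.Count)
            throw new ArgumentException("Missing value after " + token);
        return tokens[index];
    }
}

static class Demo
{
    static void Main()
    {
        List<string> tokens = QuickNotesSketch.Tokenize("-name \"say \\\"hi\\\"\" -L: 1024");
        int i = 0;
        Console.WriteLine(QuickNotesSketch.ReadNamedValue(tokens, ref i)); // value of -name
        i++;
        Console.WriteLine(QuickNotesSketch.ReadNamedValue(tokens, ref i)); // value of -L
    }
}
```

A real parser would also need to know which names are Boolean flags, so that a flag is not mistaken for a name that consumes the next token as its value.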

That was easy, no? Next time, the more die-hard way :-). Enjoy the weekend!
