#perl #programming

Menu

Tag Archives: getopt

Post navigation

About this mini-article series. For each of the past 24 23 days, I have reviewed a module that parses command-line options (such module is usually under the Getopt::* namespace). First article is here.

This series was born out of my experimentations with option parsing and tab completion, and more broadly of my interest in doing CLI with Perl. Aside from writing this series, I've also released numerous modules related to option parsing, some of them are purely experimental in nature and some already used in production.

It has been interesting evaluating the various modules: the sometimes unconventional or seemingly odd approach that they take, or the specific features that they offer. Not all of them are worth using, but at least they provide perspectives and some lessons for us to learn.

Of course, not all modules got reviewed. There are simply far more than 24 modules (lcpan tells me that there are 180 packages in the Getopt:: namespace alone, with 94 distributions having the name Getopt-*). I tried to cover at least the must-know ones, core ones, and the popular ones. Other than that, frankly the selection is pretty much random. I picked what's interesting to me or what I can make some points about, whether they are negative or positive points.

I have skipped many modules that are just yet another Getopt::Long wrapper which adds per-option usage or some other features found in Getopt::Long::Descriptive (GLD). Not that they are worse than GLD, for some reason or another they just didn't get adopted widely or at all. A couple examples of these: Getopt::Helpful, Getopt::Fancy.

Modules which use Moose, except MooseX::Getopt, automatically get skipped by me because their applicability is severely limited by the high number of dependencies and high startup overhead (200-500ms or even more on slower computers). These include: Getopt::Flex, Getopt::Alt, Getopt::Chain.

Some others are simply too weird or high in "WTF number", but I won't name names here.

Except for App::Cmd and App::Spec, I haven't really touched CLI frameworks in general. There are no shortages of CLI frameworks on CPAN too, perhaps for another series?

In general, I'd say that you should probably try to stick with Getopt::Long first. As far as option parsing is concerned, it's packed with features already, and it has the advantage of being a core module. But as soon as you want: automatic autohelp/automessage generation, subcommand, tab completion then you should begin looking elsewhere.

Unfortunately except for evaluating Perl ports of some option parsing libraries (like Smart::Options, Getopt::ArgParse, Getopt::Kingpin), I haven't got the chance to deeply look into how option parsing is done in other languages. Among the other languages is Perl's own sister Perl 6, which offers built-in command-line option parsing. This endeavor of researching option parsing in other languages could potentially offer more lessons and perspectives.

I hope this series is of use to some people. Merry christmas and happy holidays to everybody.

About this mini-article series. Each day for 24 days, I will be reviewing a module that parses command-line options (such module is usually under the Getopt::* namespace). First article is here.

Getopt::Complete (GC) is a module written by Scott Smith (SAKOHT) in 2009 and also co-maintained by Nathan Nutter (NNUTTER). Last release is in 2011. So far it registers one CPAN distribution depending on it, although it's written by Scott himself.

Shell tab completion is a topic which I have been interested in since around 2012. I've released numerous modules related to completion, including two option parsing modules Getopt::Long::Complete (GLC) and Getopt::Long::More (GLM) which sports completion as (one of) its selling point, so it's natural that I want to compare them to Getopt::Complete. Throughout the article I'll be repeatedly doing those comparisons, and I hope it's not becoming too annoying.

Interface

GC, like GLC and GLM, is a Getopt::Long (GL) wrapper that adds tab completion feature. To let the module detect tab completion mode and return completion answer as soon as possible, GC offers this interface:

That is, it accepts the options specification as import arguments. This looks simple but presents its own inconveniences.

The second thing you'll notice that the options specification are different than GL. While GLC and GLM choose to use an interface that is backward-compatible with GL, GC focuses on tab completion. The values of the pairs in the options specification is not a variable reference/coderef as you would expect in GL, but solely completion specification: it's either undef (meaning the option does not require argument), a string (meaning a completion type/routine to use, e.g. files to complete from filenames, commands to complete from program names in PATH, and so on. The options values themselves are collected in %ARGS.

Thus, compared to GLC and GLM, specifying completion routines is simpler in GC (but I also wrote Shell::Completer to provide the same level of convenience with more flexibility).

Activating Completion

To activate completion in bash, you need to declare this shell function first:

This is different than the way you activate completion for GLC- or GLM-based scripts:

% complete -C myapp myapp

External programs receive raw COMP_LINE and COMP_POINT environment variables from bash when doing tab completion, while shell functions are provided with the already-parsed command-line COMP_WORDS array variable and COMP_CWORD. GC wants to avoid parsing the command-line on its own, so the _getopt_complete function is used to give the Perl program parsed command-line arguments in @ARGV, and COMP_CWORD in another environment variable.

Using command-line that is already parsed by bash in COMP_WORDS has its pros as well as cons, due to the way that bash parses command-line for COMP_WORDS. So I cannot say which way is better, but what I can say is parsing COMP_LINE ourselves is more flexible.

Completion behavior and bugs

When you press tab after the command:

% myapp <tab>

GC offers only completion from the <> specification. In the above example, it only offers list of directories as answer. On the other hand, GLC and GLM also shows the list of available option names. With GC, to list the available options, you have to do:

% myapp -<tab>

I also cannot say that GLC's and GLM's way is better, but it certainly makes the CLI program more discoverable. By just pressing Tab, a user (especially a new user) can know more about what's possible.

GC has still a few problems. First of all, it cannot complete "–opt=" when COMP_WORDBREAKS contains "=". I have put workarounds for this issue in GLC and GLM. Second, it cannot handle filenames/directory names with spaces, or quotes, and probably other special characters too.

Third, GLC and GLM through Complete::Util offers some matching algorithms aside from simple prefix matching, for extra convenience. This is not offered by GC.

About this mini-article series. Each day for 24 days, I will be reviewing a module that parses command-line options (such module is usually under the Getopt::* namespace). First article is here.

Getopt::Kingpin is a port of Go's kingpin library, written by Masaaki Takasago (TAKASAGO) in 2016. It offers the usual "nowadays standard" features like: short and long options with short option bundling, automatic help/usage message generation, specifying that an option is required, default value, and subcommands. Two extra features are: specifying that an option can be set via environment variable of a certain name, and built-in completion (which is a feature from the original library but doesn't seem to be implemented yet in the Perl port). The Go library also allows templating of help message, and this is not yet supported by Getopt::Kingpin.

Like Smart::Options (reviewed a couple of days ago), kingpin is using the so-called "fluent style" interface, a.k.a. chained methods, which I find annoying to type in Perl due to the method call operator in Perl being -> instead of a single dot. Although fortunately the chained methods interface is slightly less annoying than in Smart::Options.

After looking at the 3 ports of option parsing libraries (the abovementioned two plus Getopt::ArgParse reviewed yesterday) it indeed seems that subcommand support is becoming a standard thing. Which makes me think about whether Getopt::Long should also add such feature, or whether we should promote some other option parsing library as the "best practice" when one wants to do subcommands. So far, I'm not seeing any single best candidate for "Getopt::Long + subcommand support".

About this mini-article series. Each day for 24 days, I will be reviewing a module that parses command-line options (such module is usually under the Getopt::* namespace). First article is here.

In contrast to in Perl, where the core modules Getopt::Std and Getopt::Long stand the test of time and remain the most popular ways people parse command-line options with in their Perl CLI scripts, in Python we encounter several churns of recommended standard modules.

First there is getopt, "C-style parser for command line options". To use getopt, you pass a string containing list of short options a la Getopt::Std, e.g. "ho:v" (meaning -o takes argument while h and v are flag switches), and also an array containing long options, e.g. ["help", "output="] (meaning –output takes argument while –help does not). But, instead of supplying references to variables to set, or coderefs (remember, specifying anonymous function is inconvenient in Python) like in Getopt::Long, in getopt programmers are asked to do a manual if-then-else and a loop (see the linked documentation for example). This is also quite similar in interface to the GetoptLong class in Ruby.

No doubt, this style of programming feels manual and tedious. Thus came optparse which is more OO and supposedly more Pythonic. Instead of passing a whole list of options at once, you now add one option (object) at a time using add_option method, along with more information for each option: usage/help message, type, whether the option is required, number of arguments expected, default value, and perhaps some callback. optparse's capability is equivalent to Getopt::Long or Getopt::Long::Descriptive, except that optparse makes some design choices, for example it is decidedly Unix-oriented, allowing only – or — as the option prefix (while Getopt::Long allows you to configure this). The documentation is quite probably the nicest aspect of this module: it does not assume much knowledge (like familiarity with Unix or CLI) from the readers and explains at length what an option is and how should one design a CLI program with regards to accepting options. I realized that "required option" is indeed an oxymoron from reading it!

But, as with Getopt::Long, optparse does not have the concept of subcommands. Thus arrived argparse. It is basically like optparse in appearance, except it has some extra features like the ability to specify positional arguments (in Getopt::Long, this is handled by the <> option specification) and support nested subcommands with the use of subparsers. Interestingly, argparse supports reading arguments from a file just like Getopt::ArgvFile, and this is the only form of "config file" it supports.

As things are right now, argparse becomes part of the standard library (a.k.a. core modules, in Perl parlance) while optparse is now deprecated and might be removed. However, getopt remains.

There is a Perl port of argparse on CPAN called Getopt::ArgParse, created by M ytraM (MYTRAM) in 2013 and last updated in 2015. It is not feature-by-feature equivalent to its Python original, because of language differences and because argparse still accumulates features over time. You get some basic features like autohelp/autousage message, default value, setting an option as required, setting number of expected arguments, as well as subparsers for subcommand support (although not yet nested in Getopt::ArgParse). The type/validation feature is weak or almost nonexistent; perhaps a custom validation routine should be allowed to be specified or more can be explored here.

What's rather disappointing from this port is its use of Getopt::Long (I was expecting a full port so option parsing should be done by itself) and Moo, significantly adding dependencies.

There is mention of configuration file in the documentation, but actually there is no explicit support of configuration file. Not even using "option file prefix" ala argparse or Getopt::ArgvFile.

About this mini-article series. Each day for 24 days, I will be reviewing a module that parses command-line options (such module is usually under the Getopt::* namespace). First article is here.

In the next few days, I'll be reviewing Perl ports of some popular option parsing modules from other languages. Today: Smart::Options.

Summary

Smart::Options is written by KAN Fushihara (MIKIHOSHI) and is a Perl port*) of node's optimist package, which in turns uses minimist as the option parsing engine and adds some stuffs, mainly the ability to generate usage/help message. Ironically, optimist is now deprecated in favor of yargs which is roughly the same as optimist but does its own parsing and adds features like bash completion.

You can get an idea of the sheer number of packages in npm, the CPAN equivalent in the node ecosystem, by looking at these numbers: compared to Getopt::Long's 1127 dependents, minimist has 5768 dependents. And it isn't even the most popular option parsing package. The most popular one on npm is currently commander (from the legendary TJ Holowaychuk) which has 12252 dependents! yargs has 4073, optimist has 3546 (remember that optimist has been declared as deprecated), and nomnom (another deprecated option parsing package) still has 510.

Currently there is no CPAN distribution depending on Smart::Options.

commander itself resembles Getopt::Long::Descriptive a bit more in its interface. I didn't find any Perl port of commander on CPAN though.

But I digress. Let's go back Smart::Options and optimist. As I said earlier, optimist is roughly equivalent to Getopt::Long::Descriptive. Except for one main difference: you are not required to specify any specification. Without any specification, the library will simply accept any option and put it in a hash. But remember that without specification, you cannot check for an unknown option or get auto-abbreviation.

Bundling of short one-letter options is supported, but if you don't provide specification the library cannot differentiate which short options require value and which ones don't: the library will simply assume that all short options are just flags which don't take value.

Another difference is the usage of OO and method chaining.

Usage

Here's how one would use Smart::Options in the simplest way (without any specification):

As you can see, the command-line arguments will be put in the _ key. And unlike Getopt::Long, it does not modify @ARGV.

One nitpick: the argv() or the parse() function (or method) can accept a list to parse options from array other than @ARGV, but since it accepts a list instead of arrayref, when you pass a zero-length array it will assume that you don't pass any array and so still defaults to @ARGV. This can be remedied, e.g., by accepting an arrayref instead.

Without options specification, it's not possible to declare an option to be required, repeatable, or as a flag. So let's add some specification:

BTW, some option parsing modules, including Smart::Options, still complain about missing –foo when we instruct it to show help message (–help), like shown above. I think this behavior is a bug and should be fixed.

Other features

*) I said earlier that Smart::Options is a port of optimist. It is actually more accurately a blend between optimist and Kan's older module opts. So beyond optimist, Smart::Options adds some more (quite substantial) features, which do not exist even in yargs or commander.

Validation. Like in Getopt::Long, you can add some validation. You can declare an option to accept Bool, Int, Num, Str, ArrayRef (this is similar to Getopt::Long's @ destination type to make option repeatable), HashRef (if say foo is declared as a hashref, you can specify –foo.key1 or –foo.key2 in the command-line and so on), or Config.

Configuration file. The last type, Config, is actually supposed to let you specify a filename to make the module reads an INI-like configuration file. But perhaps this configuration is misplaced and conflated, as this is not a type/validation configuration, and it is not per-option but global.

Coercion. This can be used to convert an option value which is scalar/string to, say, Path::Tiny instance.

Subcomands. This lets you support (nested) subcommands by adding a nested Smart::Options object inside another, like in Getopt::Long::Subcommand. For example:

DSL. If you don't like the chained methods syntax, there's Smart::Options::Declare which offers an alternative interface to declare an option one by one much like Moose's has. Although it doesn't seem to support declaring subcommands yet.

Performance

The startup overhead of Smart::Options is roughly the same as Getopt::Long::Descriptive, while the memory usage is higher.

Also to be noted is that Smart::Options does not use Getopt::Long but does its own parsing.

Verdict

I find optimist and yargs themselves don't offer any new feature not already existing in Getopt::Long or Getopt::Long::Descriptive (the completion feature can be done with shcompgen). But Smart::Options does offer some extra features like subcommand support and reading of configuration file. On the other hand, you lose some of Getopt::Long's features like: auto-abbreviation and custom handler (in Getopt::Long, you can assign a coderef to an option which can do anything, like printing a message early and exiting, or setting other variable or multiple variables, or whatever).

My problem with this module is the interface: method chaining has its uses (for example I find it convenient in some JSON module or in jQuery) but here it just distracts and make options specification visually convoluted. On the other hand, the DSL alternative interface is not complete (yet).

I personally would still reach for my Perinci::CmdLine most of the time. But I will prefer Smart::Options over App::Options (which is also covered in this mini-article series).

About this mini-article series. Each day for 24 days, I will be reviewing a module that parses command-line options (such module is usually under the Getopt::* namespace). First article is here.

In the previous article I discussed App::Cmd, which is a nice, simple CLI framework that supports subcommands by requiring you to write a subcommand class for each subcommand you want to add. And it also lets you specify Getopt::Long::Descriptive command-line options directly so you can be as custom as Getopt::Long::Descriptive lets you to be. However, many high-level features are missing.

There exists many more CLI frameworks on CPAN, like there are option parsing libraries, some closer to App::Cmd (except, say, being Moo- or Moose-specific) while others try to provide more said features.

App::Spec is one module. It is closer in features to my Perinci::CmdLine with the main difference being that App::Spec is OO (although it uses a single class and different methods to support subcommands instead of a separate class for each subcommand) while Perinci::CmdLine is decidedly not. Here are the features that it supports (or want to support, as it’s not quite polished or finished yet): a specification for CLI app (summary/description, list of subcommands (possibly nested), and parameters/options for each subcommand), extra validation, automatic help/usage message generation, and shell tab completion. App::Spec is relatively new (2016) and written by Tina Muller (TINITA). No applications on CPAN are using it right now. There is actually an App::Spec article on this year‘s Perl Advent Calendar so I’ll just direct you to reading the article instead of describing it myself.

What’s good about App::Spec is that it does not use Moo or Moose, so you can use it for applications you want to be light. It’s also not too heavy on the OO side. It provides shell tab completion out of the box; we need more frameworks like this because tab completion is one of pillars of usability on the CLI. I hope the completion feature improves in the future.

What I find in App::Spec not really to my liking includes: low-level (manual) mapping to Getopt::Long specification format (I prefer an automatic mapper like in MooseX::Getopt or my Perinci::CmdLine), splitting options into “options” and “parameters” (unnecessary, they’re all options to me, “options” just happen to be boolean switches while “parameters” have values like string or whatever).

About this mini-article series. Each day for 24 days, I will be reviewing a module that parses command-line options (such module is usually under the Getopt::* namespace). First article is here.

Traditionally, option parsing modules in Perl like Getopt::Std and Getopt::Long
do not have the concept of subcommands. The rising popularity of CLI programs with subcommands, specifically git and other post-CVS version control tools, has prompted the option parsing libraries to include the concept too, like in node’s commander or Python’s argparse. In Perl, there are not many option parsing libraries that offer this feature (although ports of other languages’ libraries including argparse exist on CPAN, which I’ll cover in the following days). That said, you can also use a higher-level library like CLI frameworks that support subcommands.

App::Cmd is one such module: it is an OO CLI application framework which happens to use Getopt::Long::Descriptive as the command-line options parser. Both modules are written by Ricardo “Rik” Signes (RJBS). App::Cmd was first released in 2006 and is still being updated. Around 60 CPAN distributions use App::Cmd (90 if we also count users of MooseX::App::Cmd), making it possibly the most popular CLI application framework on CPAN. Toby Inkster (TOBYINK) even once called it “the PSGI of the command-line world”, although I don’t think that analogy is appropriate. A very popular CLI application, dzil (Dist::Zilla), also by Rik, uses App::Cmd.

As mentioned, App::Cmd is meant to be used to write CLI application which has subcommands (or commands, as it call them) like ‘git’ or ‘dzil’ (with ‘git clone’ or ‘dzil build’ as examples of command with subcommand). To use App::Cmd in your application, you need to create a single application class and then one class for each subcommand you want to support. App::Cmd does not use Moo or Moose, making it more universally usable. Of course, your application or command classes can be Moo-based or Moose-based, as demonstrated by MooseX::App::Cmd and dzil. The CLI script itself is reduced to something like:

use YourApp; # your application class
YourApp->run;

Accepting and processing command-line options is pretty direct, if not low-level:

You provide a method opt_spec in your command class to specify which command-line options your subcommand will accept. The value returned by this method will be passed directly to Getopt::Long::Descriptive. The parse result will be $opt object (which is something you normally get from the Getopt::Long::Descriptive’s describe_options function) as well as $args (the remaining command-line arguments from @ARGV after the options has been stripped from).

You can also provide the usage description to be passed to Getopt::Long::Descriptive’s describe_options via the usage_desc method. And an additional method validate_args to further validate $opt and $args if needed. The main method in a command class is execute, which is fed $opt and $args.

So you can see this does not differ much from a “traditional” CLI using Getopt::Long. You still provide the command-line options specification manually. This differs from other CLI frameworks like MooseX::Getopt or my Perinci::CmdLine which try to be more DRY by directly setting object attributes or function arguments from command-line options.

Another thing to note is that no configuration file support is baked in: you need to read and parse configuration files yourself. So basically what App::Cmd provides is the structure. For getting many higher-level CLI features, you need to do on your own. App::Cmd is mature and widely used, but you might also want to take a look at some other CLI frameworks that do more stuffs for you.