Question

I'm creating a command-line parser and want to support option bundling. However, I'm not sure how to handle the ambiguities and conflicts that can arise. Consider the following three cases:

1.

-I accepts a string

"-Iinclude" -> Would be parsed as "-I include"

2.

-I accepts a string
-n accepts an integer

"-Iincluden10" -> Would be parsed as "-I include -n 10" because the 'cluden10' after the first occurrence of 'n' cannot be parsed as an integer.

3.

-I accepts a string
-n accepts an integer
-c accepts a string

"-Iin10clude" -> ??? What now ???

How do I handle the last string? There are multiple ways of parsing it, so do I just throw an error informing the user about the ambiguity or do I choose to parse the string that yields the most, i.e. as "-I i -n 10 -c lude"?

I could not find any detailed conventions online, but personally, I'd flag this as an ambiguity error.


Solution

As far as I know, there is no standard for command-line parameter parsing, nor even a cross-platform consensus. So the best we can do is appeal to common sense and the principle of least astonishment.

The Posix standard suggests some guidelines for parsing command-line parameters. They are just guidelines; as the linked section indicates, some standard shell utilities don't conform. And while Gnu utilities are expected to conform to the Posix guidelines, they also typically deviate in some respects, including the use of "long" parameters.

In any event, what Posix says about grouping is:

One or more options without option-arguments, followed by at most one option that takes an option-argument, should be accepted when grouped behind one '-' delimiter.

Note that Posix options are all single character options. Note also that the guideline is clear that only the last option in an option group is permitted to be an option which might accept an argument.
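Applied to the question's third case, this guideline leaves no room for ambiguity: the first option that takes an argument consumes the rest of the word. A minimal sketch of that rule, where the function name parse_group and its parameters are hypothetical and chosen only for illustration:

```python
def parse_group(word, flags, takes_arg):
    """Parse one word such as '-vxIdir' under the grouping guideline.

    flags     -- set of option letters that take no argument (assumed input)
    takes_arg -- set of option letters that require an argument (assumed input)

    Returns (options, needs_next): options is a list of (letter, argument)
    pairs, and needs_next is True when the last option's argument must be
    taken from the following word.
    """
    if len(word) < 2 or not word.startswith("-"):
        raise ValueError(f"not an option group: {word!r}")
    body = word[1:]
    options = []
    for i, letter in enumerate(body):
        if letter in takes_arg:
            # Everything after this letter is its argument, whatever it contains.
            rest = body[i + 1:]
            options.append((letter, rest if rest else None))
            return options, not rest
        if letter not in flags:
            raise ValueError(f"unknown option: -{letter}")
        options.append((letter, None))
    return options, False


print(parse_group("-Iin10clude", set(), {"I", "n", "c"}))
print(parse_group("-vxI", {"v", "x"}, {"I"}))
```

Under these assumptions "-Iin10clude" is simply -I with the argument "in10clude"; the letters n and c inside the argument are never considered as options.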

With respect to Gnu-style long options, I don't know of a standard other than the behaviour of the getopt_long function. It implements Posix style for single-character options, including the above-mentioned grouped option syntax: a single-character option which takes an argument may either be immediately followed by the argument, or sit at the end of an option group (possibly a group of one) with the argument as the following word.

For long options, grouping is not allowed, regardless of whether the option accepts arguments. If the option does accept arguments, two styles are allowed: either the option is immediately followed by an = and then the argument, or the argument is the following word.

In Gnu style, long options cannot be confused with single-character options, because the long options must be specified with two dashes (--).
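These conventions can be tried out quickly with Python's standard getopt module, which follows the conventions of the Unix getopt() function and also accepts Gnu-like long options. This is only an illustration in Python, not the C getopt_long function itself, and the option names (-v, -I, -n, --count) are made up for the example:

```python
import getopt

# Hypothetical command line: a grouped flag plus attached argument,
# a detached argument, and both long-option argument styles.
argv = ["-vIinclude", "-n", "10", "--count=3", "--count", "4", "file.txt"]

# "vI:n:" -> -v takes no argument; -I and -n each require one.
# ["count="] -> --count requires an argument.
opts, rest = getopt.getopt(argv, "vI:n:", ["count="])

print(opts)
# [('-v', ''), ('-I', 'include'), ('-n', '10'), ('--count', '3'), ('--count', '4')]
print(rest)
# ['file.txt']
```

Note that "-vIinclude" is read as the flag -v followed by -I with the argument "include": once an option that takes an argument is seen, the remainder of the word belongs to it.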

By contrast, many Tcl/Tk-based utilities (and some other command-line parsers) allow long options with a single -, but do not allow option grouping.

In all of these styles, options are divided into two disjoint sets: those that take arguments, and those that do not.

None of these systems is ambiguous, although a random mix of styles, as you seem to be proposing, would be. Even with formal disambiguation rules, ambiguity is dangerous, particularly in console applications where the effect of a command can be irreversible. Furthermore, contextual disambiguation can change the meaning of a command line, even silently, if the set of available options is extended in the future, which would be a source of hard-to-predict errors in scripts.
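To make the risk concrete, here is a rough sketch that enumerates every way a bundled word could be split if any option taking an argument were allowed anywhere in the group, as in the question. The function parses and the spec dictionary are hypothetical, and the sketch assumes every option takes a non-empty argument and that integer arguments consist of digits only:

```python
def parses(s, spec):
    """Yield every split of s (the word without its leading '-') into
    (option, argument) pairs. spec maps option letters to int or str."""
    if not s:
        yield []
        return
    opt, rest = s[0], s[1:]
    if opt not in spec:
        return
    # Try every non-empty prefix of the remainder as this option's argument.
    for end in range(1, len(rest) + 1):
        arg = rest[:end]
        if spec[opt] is int and not arg.isdigit():
            continue
        for tail in parses(rest[end:], spec):
            yield [(opt, arg)] + tail


old_spec = {"I": str, "n": int}
new_spec = {"I": str, "n": int, "c": str}  # a later release adds -c

print(list(parses("Iin10clude", old_spec)))
# [[('I', 'in10clude')]]
for p in parses("Iin10clude", new_spec):
    print(p)
# [('I', 'i'), ('n', '10'), ('c', 'lude')]
# [('I', 'in10'), ('c', 'lude')]
# [('I', 'in10clude')]
```

With only -I and -n defined, "-Iin10clude" has a single reading; once -c is added, the same word has three, so any rule that picks among them may change what an existing script does. Under the same assumptions, even the question's second case, "-Iincluden10", already has two readings: "-I includen10" and "-I include -n 10".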

Consequently, I'd recommend sticking to a simple existing practice such as Gnu, and not trying too hard to interpret command lines which do not conform.

Licensed under: CC-BY-SA with attribution