getopt Example - How to Access & Parse Command Line Arguments

This article shows how a 'C' program accesses its command line arguments and how the getopt() function can be used to parse them.

You have probably used a Linux command with some arguments many times.

For example :

Code:

cp -r /home/user/Desktop/abc /home/abc

Code:

rm -rf abc

In the above examples, 'cp' and 'rm' are the names of binaries (written in 'C'), while everything after them is the list of command line arguments passed to the respective 'C' program.

Ever wondered how these command line arguments are handled in the code? Let's discuss it here.

Accessing command line arguments

You have probably seen the main() function declared with one of these signatures :

Code:

int main(int argc, char *argv[])
OR
int main(int argc, char **argv)

The first parameter in the above declaration, argc, is the number of command line arguments (including the name of the binary). The second parameter, argv, is an array of pointers, each holding the base address of one argument string.

So, if we know how to walk the array of pointers 'argv', we can easily access all the command line arguments. Let's understand this concept through an example. Here is a piece of code explaining this :

In the above piece of code, we iterate over each argument and print it. We keep on iterating until all the arguments are printed.

Here is the output :

Code:

$ ./cmdline
Argument number [0] is [./cmdline]

Since the only command line argument was './cmdline', our program printed just one argument. This shows that the name of the binary is included in the list of command line arguments to main().

Let's give some more arguments :

Code:

$ ./cmdline testarg1 testarg2 testarg3
Argument number [0] is [./cmdline]
Argument number [1] is [testarg1]
Argument number [2] is [testarg2]
Argument number [3] is [testarg3]

Here we gave 3 more arguments in addition to the name of the binary, and our program printed all of them.

Now, let us try once more with a more familiar style of arguments (like the ones we give to Linux commands) :

Code:

$ ./cmdline -a abc -r -d pqr
Argument number [0] is [./cmdline]
Argument number [1] is [-a]
Argument number [2] is [abc]
Argument number [3] is [-r]
Argument number [4] is [-d]
Argument number [5] is [pqr]

The output was as expected.

So, now we have a basic idea of how to access command line arguments inside the code.

Parsing the command line arguments

Now that you understand how to access the command line arguments, can you think of a parsing mechanism?

Let's try the most basic one. Suppose we want to write a program which expects 3 arguments after the name of the binary, in the form :

<name of the binary> <operation> <val1> <val2>

<operation> : could be any one of 'add', 'subtract', 'multiply', 'divide'
<val1> : First numeric value
<val2> : Second numeric value

First, the logic makes sure that there are enough arguments; otherwise it returns an error.

Next, the logic fetches the second command line argument (the first being the name of the binary) to learn which operation the user wants.

Then it fetches the two value arguments.

Finally, based on the requested operation, the logic carries out the operation and prints the result.

I tried all the operations and here is the output (the values are handled as integers, which is why dividing 2 by 3 prints 0) :

Code:

$ ./cmdline add 2 3
The request operation was to add and the result is [5]
$ ./cmdline divide 2 3
The request operation was to divide and the result is [0]
$ ./cmdline multiply 2 3
The request operation was to multiply and the result is [6]
$ ./cmdline subtract 2 3
The request operation was to subtract and the result is [-1]

A practical problem

Parsing command line arguments looks fine so far. But what if the user gave the command line arguments in the following way :

Code:

./cmdline 20 10 add

Let's run the command and see. Here is the output I got :

Code:

$ ./cmdline 20 10 add
Wrong option

So the program reported 'Wrong option'. In other words, the logic did not find a valid operation in the argument at index '1', because the argument at index '1' here is 20, which is val1.

People may argue that the program behaved correctly, since the operation name should be given as the first argument after the binary name. But what if I want to give my users the flexibility to write the three arguments (after the binary name) in any order they want?

Any ideas about the question that I brought up?

Solution 1 - My Logic

Well, I know that with the current logic this becomes difficult.

One idea that comes to mind is to make the binary run in the following way :

<binary-name> -o <operation> -v1 <value1> -v2 <value2>

This looks good, and it is the way we use standard Linux commands. Let's tweak our code to make it compatible with this kind of command line arguments. Here is the logic :

I tweaked the code so that it now accepts arguments in the standard way.

With this logic, users now have the flexibility to specify the arguments in any order.

Next, I ran the code with the arguments supplied in different orders.

Here is the output :

Code:

$ ./cmdline
Usage : <binary-name> -o <operation> -v1 <val1> -v2 <val2>
$ ./cmdline -v1 20 -o add -v2 10
The request operation was to add and the result is [30]
$ ./cmdline -v1 20 -o multiply -v2 10
The request operation was to multiply and the result is [200]
$ ./cmdline -v1 20 -v2 10 -o multiply
The request operation was to multiply and the result is [200]
$ ./cmdline -v1 20 -v2 10 -o divide
The request operation was to divide and the result is [2]
$ ./cmdline -v2 10 -o subtract -v1 20
The request operation was to subtract and the result is [10]

Solution 2 - Using getopt() function

To achieve what solution 1 did, the 'C' library on Linux already provides a ready-made function, getopt().

This function is declared in unistd.h and its signature is as follows :

Code:

int getopt( int argc, char *const argv[], const char *optstring );

The first two arguments are the same as the ones the main() function receives, while the third argument is a string written in a special format that tells getopt() which option characters are valid (like the 'o' in '-o') and which of them expect an associated value (some options look like '-o <value>', others are just '-a').

An option which expects a value is followed by a ':' in the optstring.
So, an optstring "a:b:c" signifies that -a expects a value, -b expects a value, while -c does not.

Each time this function is called, it returns the next option character (or -1 when there are no more options) and sets some global variables.

* optarg -- A pointer to the current option's argument, if there is one.
* optind -- The index of the next argv element to process when getopt() is called again.
* optopt -- The option character that caused an error (an unrecognized option, or one that is missing its required value).

In our case, since all three options expect values, we have kept the optstring as : "o:x:y:"

Each time we call this function, it returns the option character (like 'x') and sets 'optarg' to point to the value given for that option.

So, this function makes our work easy.

Let's look at the output :

Code:

$ ./cmdline -o add -x 20 -y 10
The request operation was to add and the result is [30]
$ ./cmdline -o divide -x 20 -y 10
The request operation was to divide and the result is [2]
$ ./cmdline -x 20 -y 10 -o add
The request operation was to add and the result is [30]

Conclusion

To conclude, this article explained how command line arguments are accessed in code and how the getopt() function helps us parse them easily. getopt() is used in many of the command line utilities that we use in our day-to-day Linux work on the terminal.