This post covers everything you need to know about how to create and interpret command lines for Windows programs.

I’ll go over how to quote and escape arguments including those that contain spaces, embedded quotes and special characters. I’ll explain how the command line is received and interpreted. I’ll point out specific situations where problems are likely to occur and give you some tools and techniques to ensure your programs always receive exactly the arguments you intended.

Introduction

Understanding the command line is important, especially when you don’t have control over your program’s input. Arguments could come from a database or a directory listing, or be entered by a user of your software. The information here will help you correctly handle situations you didn’t anticipate and avoid having a program or script intermittently fail. I will point out the most common mistakes and explain what to do in those rare situations where there isn’t a clean solution.

Some of the information here comes from other sources as well as my own testing. I present a new way to think about the command line that I hope will eliminate some confusion about how double quote characters are interpreted by the command line parser.

That’s a lot of ground to cover, so I plan to present it over time in multiple posts. By the time I’m done you will thoroughly understand:

What I mean by the terms quoting and escaping, and the different contexts where arguments need to be quoted or escaped

How an executable program receives its command line and splits it into individual arguments

How the command line is interpreted and modified by cmd.exe before passing it to an executable program

Batch file considerations

Passing received arguments on to other programs and scripts

The various problems and security risks that can occur and how to recognize and prevent them

I’ll also provide some free tools to help you do things the right way.

Technical Notes

The following will help you understand this series better.

I use the term “literal” to describe any text that is intended to be passed unchanged to an application. This is in contrast to special characters such as whitespace, used to separate arguments, or double quote ["] and escape [\, ^] meta characters that control parsing but are not passed to the application.

When giving examples I needed a way to clearly show the content of a piece of text, especially if it has leading or trailing spaces. Enclosing these in quotes is ambiguous when the text itself contains the quote character. To avoid confusion I enclose literal strings inside square brackets:

[3 backslashes, followed by double quote: \\\"]

None of my sample text contains literal square brackets, so they always indicate the content of a literal string or a single character. Any quote character (single or double) or backslash that appears inside square brackets is always part of the sample text.
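
To make the bracket convention concrete, here is a minimal sketch of a script that prints each argument it receives inside square brackets. It is written in Python only for brevity; the function names bracket and describe_args are hypothetical and not part of any tool from this series.

```python
import sys


def bracket(text):
    """Wrap text in square brackets so spaces and quotes stay visible."""
    return "[" + text + "]"


def describe_args(argv):
    """Return one line per argument, numbered from 0, in bracket notation."""
    return ["{}: {}".format(index, bracket(arg)) for index, arg in enumerate(argv)]


if __name__ == "__main__":
    # Print every argument exactly as this process received it.
    for line in describe_args(sys.argv):
        print(line)
```

Running a script like this with a few test arguments shows what the program actually received, which is not always what was typed on the command line.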

I include images to help visualize how various example command lines are parsed. Hopefully they’re intuitive, but I have also included a description of the images. You can click on any image to get to the description page.

This series of articles was written from the perspective of a C/C++ Windows programmer, so there may be a noticeable bias in the presentation. The information applies mainly to programs developed with Microsoft development tools (Visual Studio 2010) that use the Microsoft standard runtime library.

2 Comments

admin says:

I wasn’t sure if anyone was being helped by this. It was fun, but a lot of work to produce the content. If anyone else finds this useful, please comment and maybe I will pick this up again if I see any interest.

It’s too bad that you didn’t continue posting on this blog – this collection of posts on parsing Windows command lines is masterful! Your description of the special parsing of the first argument is something I haven’t seen anywhere else, yet I needed it today. Thanks! — David Bakin