What sucks, who sucks and you suck

Olympic Curling

2010-02-22

Debugging web apps with Curl

It’s the Swiss Army knife of web app debugging. It saves you wondering how much of what you’re seeing is current and how much has been cached by the browser. It shows you all the gory details of HTTP requests and responses. It’s curl, and it’s solved more web application issues for me than whole departments full of developers, analysts and managers.

Where can I get it?

It’s in most recent UNIX distributions, including virtually all Linux releases, and you can download native Windows versions from the web site. Note that it’s a command line tool.

Why should I use it?

Because it allows you to craft an HTTP request and its contents by hand. Because it shows you the exact, entire, untranslated response from the remote server, without any client caching effects (the Live HTTP Headers add-on for Firefox gives you some of this, and is a worthwhile adjunct but not a replacement). Because it can handle almost any URL format and authentication mechanism you can throw at it. Because you can script it.

How do I use it?

Simplest use case:

$ curl -v http://servername/whatever

(-v shows the request and response headers as well as the content.) curl --help will show all the options, or consult the documentation.

To get past a login form, POST the form fields with -d and save the cookie the site returns with -D cookies.txt. (Note that a password given this way will be in cleartext on the command line and in the process list; to prevent this, put the form fields and values in a file and use -d @file to read them instead.) The -D option writes the response headers, including the login cookie returned by the site (for subsequent authorized requests), to the cookies.txt file, where it can be reused by adding -b cookies.txt to subsequent invocations of curl. Hence, you can pass the login page and send further requests to navigate around the “secure” area of the site. It’s probably easiest to include -D and -b options on every invocation, with a single “cookie jar” filename; curl’s -c option is a dedicated cookie jar that can be paired with -b in the same way.
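The login flow described above might be scripted along these lines. This is a sketch only: the URLs, the login.txt file and the field names inside it are placeholders for whatever your site uses, and it swaps in curl's -c cookie-jar option in place of -D because -c and -b are commonly pointed at the same file.

```shell
# Hypothetical names throughout: the URLs passed in, the form-data file and
# the cookie file are placeholders, not anything a particular site requires.
COOKIES=cookies.txt

# POST the form fields stored in file $2 (for example a single line such as
# "username=alice&password=secret") to login URL $1, so the password stays
# off the command line. -c saves any cookies the site sets to $COOKIES.
login() {
    curl -s -d "@$2" -c "$COOKIES" "$1"
}

# Fetch URL $1 inside the logged-in area: -b sends the saved cookies and
# -c writes back any that the server refreshes.
fetch() {
    curl -s -b "$COOKIES" -c "$COOKIES" "$1"
}
```

Typical use would be login http://servername/login login.txt followed by fetch http://servername/account, with login.txt readable only by you.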

What about more complicated login forms?

Some forms ask for certain characters from a known password instead. It is usually possible to extract the requested positions from the login page content; this takes some manual analysis and scripting using (on a UNIX or Cygwin host) grep, AWK, cut and friends, but it is feasible.
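As a rough illustration of that kind of scripting, the sketch below assumes the login page contains a phrase such as "Enter characters 2, 5 and 7 of your password". That wording is invented for the example; a real site will need its own pattern, worked out by looking at the page source. The function names are also made up here.

```shell
# Read login page HTML on stdin and print the requested 1-based character
# positions, one per line. The phrase matched here is a hypothetical example.
requested_positions() {
    grep -o 'characters [0-9, and]* of your password' | grep -o '[0-9][0-9]*'
}

# Read positions on stdin (one per line) and print the characters of the
# password given as $1 at those positions, joined on a single line.
pick_characters() {
    while read -r pos; do
        printf '%s' "$1" | cut -c "$pos"
    done | tr -d '\n'
    echo
}
```

Something like curl -s http://servername/login | requested_positions | pick_characters "$PASSWORD" would then yield the characters to submit; how they are posted back (field names, one field or several) depends entirely on the form.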

What about SSL?

If built correctly, curl supports HTTPS URLs transparently. Where invalid or self-signed certificates have been used, you will need to use the -k option to bypass the normal CA certificate bundle checks (as would be applied by browsers).
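For instance, the two approaches could be wrapped up as below. The function names and the CA file are hypothetical; --cacert is shown only as one way to keep verification switched on when you have the issuing CA certificate to hand.

```shell
# Fetch HTTPS URL $1 verbosely while skipping certificate verification (-k).
# Only sensible against hosts you already trust, such as a test server with
# a self-signed certificate.
fetch_insecure() {
    curl -k -v "$1"
}

# Alternative: leave verification on, but check against the CA certificate
# file given as $2 instead of the default bundle.
fetch_with_ca() {
    curl --cacert "$2" -v "$1"
}
```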

How do I handle name-based virtual hosts?

If you’re dealing with a name-based virtual host but have to use a particular IP address or other alias (and hence the server name in the request doesn’t match the required vhost name), add a Host: header with the correct vhost name using the -H option:

$ curl -H 'Host: somevhost' http://10.0.0.2/thing

(The other option is to add an alias to your local hosts file for that IP, but that can catch you out later when it’s no longer required.) You can also use -H to add other arbitrary HTTP headers, such as Cache-Control.

Can I derive stats from the responses?

Look at the -w option; it can output various timing and other variables, such as response time and size, return code and download speed. For example, a -w format string built from the time_connect, time_total, speed_download, size_download and http_code variables, prefixed with a timestamp, gives some useful stats in a format suitable for graphing by RRDtool.

This will generate a tab-delimited line containing the epoch time (obtained using Perl, for instance), connect time, total transfer time, download speed, download size and response code. Log these to a file and it can be fed into an RRD graphing tool like Orca. (The actual response content is thrown away, but can be kept for analysis if desired.)
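A stats line of that shape could be produced roughly as follows. This is a sketch, not a definitive recipe: the function name is made up, and it takes the epoch time from date +%s rather than Perl, which assumes a date command that supports %s (GNU and BSD both do).

```shell
# Print one tab-delimited stats line for URL $1:
# epoch time, connect time, total time, download speed, download size, HTTP code.
# -s silences the progress meter and -o /dev/null discards the response body.
stats_line() {
    printf '%s\t' "$(date +%s)"
    curl -s -o /dev/null \
         -w '%{time_connect}\t%{time_total}\t%{speed_download}\t%{size_download}\t%{http_code}\n' \
         "$1"
}
```

Run from cron or a loop as stats_line http://servername/whatever >> stats.log to build up a log file; replace -o /dev/null with a real filename if you want to keep the content for analysis.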

Yes, but how do I actually debug an app?

Use curl to feed in test values or pull particular URLs or URL sequences. Script a “typical” (short) user session and run it in various combinations either singly or in parallel to see how the site behaves. Check what’s coming back, both in the response headers (e.g. bad cache settings, cookies, timestamps or encodings) and in the content source (e.g. invalid hidden field values, incorrect CSS file paths, etc.). Any time you suspect that what’s being returned isn’t what you think (or are told) should be returned, use curl to verify it.
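A scripted session of the kind described above might look something like this. Everything site-specific is an assumption: the base URL, the /login, /home and /account paths, and the login.txt form-data file are placeholders to be replaced with a real sequence from your application.

```shell
# Hypothetical base URL; override by setting BASE before running.
BASE=${BASE:-http://servername}

# One short "typical" session, identified by number $1: log in, then pull a
# couple of pages. Each session uses its own cookie file so that parallel
# runs do not overwrite each other's cookies.
session() {
    jar="cookies.$1.txt"
    curl -s -o /dev/null -c "$jar" -d "@login.txt" "$BASE/login"
    for path in /home /account; do
        # Report the status code and total time for each page fetched.
        curl -s -o /dev/null -b "$jar" -c "$jar" \
             -w "session $1 $path %{http_code} %{time_total}\n" "$BASE$path"
    done
    rm -f "$jar"
}

# Start $1 sessions in the background and wait for all of them to finish.
run_sessions() {
    i=1
    while [ "$i" -le "$1" ]; do
        session "$i" &
        i=$((i + 1))
    done
    wait
}
```

run_sessions 1 gives a single baseline run; run_sessions 10 shows how the same pages respond under a little concurrency. Drop the -o /dev/null to inspect the content itself.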

What else?

Avoid proxies between your curl client and the server; either run curl directly on the server if possible or use a directly-connected host. Sometimes proxies are transparent (outgoing port 80 traffic is silently redirected via the proxy), and you can’t guarantee that your ISP isn’t doing this. It might not be a broken proxy and it might not matter, but the only way to be sure is to eliminate it if at all feasible. In particular, avoid any local ISA or similar filtering proxies. If you can’t avoid them, bear them in mind and treat the responses with appropriate caution.
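One way to bear proxies in mind is to look for signs of them in the response headers. The sketch below is a heuristic under stated assumptions, not a reliable detector: Via and Age are standard HTTP headers that intermediaries may add, X-Cache is only a convention used by some caches, and a transparent proxy may add none of them. The function names are invented for the example.

```shell
# Print only the response headers for URL $1, ignoring any proxy configured
# through environment variables such as http_proxy (--noproxy '*').
# This cannot bypass a transparent proxy sitting on the network path.
direct_headers() {
    curl -s --noproxy '*' -D - -o /dev/null "$1"
}

# Read response headers on stdin and print any that commonly betray an
# intermediary. Printing nothing does not prove that no proxy is involved.
proxy_hints() {
    tr -d '\r' | grep -i -E '^(Via|Age|X-Cache):'
}
```

For example, direct_headers http://servername/whatever | proxy_hints prints any matching headers, which is a cue to treat the response with the caution mentioned above.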