Tag: urlfetch

Here’s a code snippet demonstrating how to send a POST request using the low-level urlfetch API.

The variables post_url and post_data represent the request URL and the content of the POST as strings. The response from the urlfetch is stored in response_content. If the request was unsuccessful (the target server returned a non-200 HTTP status code), a RuntimeException is thrown.
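A minimal sketch along these lines is shown here. Note that it is written against the standard java.net.HttpURLConnection class rather than the low-level urlfetch classes, so that it runs on a plain JVM without the App Engine SDK. The class name PostSketch and the choice of UTF-8 for both request and response are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PostSketch {
    // Sends post_data to post_url and returns the response body as a String.
    // Throws RuntimeException for any status code other than 200.
    public static String post(String post_url, String post_data) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(post_url).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true); // must be enabled before writing a request body

        // Assumption: the payload is sent as UTF-8 text.
        try (OutputStream out = connection.getOutputStream()) {
            out.write(post_data.getBytes(StandardCharsets.UTF_8));
        }

        int status = connection.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            throw new RuntimeException("POST to " + post_url + " returned HTTP " + status);
        }

        // Read the whole response body into memory.
        try (InputStream in = connection.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            // Assumption: the response is UTF-8 text as well.
            String response_content = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            return response_content;
        }
    }
}
```

Calling PostSketch.post(post_url, post_data) either returns the body or throws, which matches the behavior described above.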

Google caches the results of URL Fetch requests so subsequent requests can be supplied from the cache, thereby speeding up the request and your application in general.

This can be troublesome though, especially if an application is accessing a web page that changes quickly; URL Fetch may return stale results without the application knowing it. Fortunately, there’s a way to detect whether or not the page was retrieved from a Google cache server.

For all URL Fetch requests from the production App Engine servers, the X-Google-Cache-Control header is added to URL Fetch responses. If the header has a value of remote-fetch, then the fetch retrieved a fresh copy of the page. If the value is remote-cache-hit, then the page was retrieved from Google’s cache and may have stale data.
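The check described above can be sketched as a small helper. The class and method names are hypothetical, and the case-insensitive, whitespace-trimmed comparison is an assumption rather than documented behavior; the header name and the two values come from the description above.

```java
public class CacheHeaderSketch {
    // Header added to URL Fetch responses on the production servers.
    public static final String CACHE_HEADER = "X-Google-Cache-Control";

    // Returns true when the header value indicates the response came from
    // Google's cache and may therefore be stale. A missing header (for
    // example on the development server) is treated as "not cached".
    public static boolean servedFromCache(String headerValue) {
        return headerValue != null
                && headerValue.trim().equalsIgnoreCase("remote-cache-hit");
    }
}
```

With an HttpURLConnection named connection, the value to pass in would be connection.getHeaderField(CacheHeaderSketch.CACHE_HEADER).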

As I commented in a previous post, the App Engine production environment automatically sets a User-Agent (listed below) on all URL Fetch requests. If you set a custom user agent, App Engine will append the text below to your custom header.

However, the development server doesn’t add this header automatically. If you set a custom User-Agent header, that’s all that will be sent – no other identifying information. If you don’t set a user agent, URL fetches from the development server will not have any user agent information.

This can be an issue while developing applications in the dev server; some APIs require this header, and will refuse to respond or will heavily rate limit requests if it is missing. For instance, the NewsBlur API requires a user agent header for all requests. If the request doesn’t contain a user agent header, the API will refuse the request even if it’s authenticated.

Always set a custom user agent header that accurately describes your application on all URL fetch requests. If your application does a lot of URL fetches to the same API/server, it may be a good idea to include your email address or a web page with more information about your application.
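As a rough illustration, a custom user agent can be set on a java.net.HttpURLConnection before the request is sent. The class name, method name, and the example user agent string in the usage note are all hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class UserAgentSketch {
    // Opens (but does not yet send) a connection that carries a custom
    // User-Agent header. No network traffic happens until the caller
    // reads the response or calls connect().
    public static HttpURLConnection openWithUserAgent(String url, String userAgent)
            throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestProperty("User-Agent", userAgent);
        return connection;
    }
}
```

A call such as UserAgentSketch.openWithUserAgent("http://example.com/", "Example Crawler (contact: admin@example.com)") would work identically on the development server and in production, which avoids the missing-header problem described above.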

Even if the application sets a custom user agent header, App Engine will append the above text to the header.

This can be annoying because there are some servers and services that rate limit based on the user agent. If there is a human reviewing the request logs, it can be confusing to see a stream of largely-identical user agent strings.

It’s good practice to set a descriptive user agent for all URL fetches. It’s even better if you can write your user agent with App Engine’s required text in mind. For instance, consider writing user agent headers like this one: App Engine Example Crawler hosted by. When App Engine appends its required text to the end of this, the receiving server will see a user agent of:

Here’s a short code example showing how to do an HTTP GET using the low-level Java API.

The variable url_string_here is the URL being retrieved, as a String. The code returns a byte[] containing the content of the response. If the response code is not 200 (i.e. anything other than HTTP OK), it throws a RuntimeException.
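A minimal sketch with the same contract is shown here. It uses the standard java.net.HttpURLConnection class instead of the low-level API so that it runs without the App Engine SDK; the class and method names are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class GetSketch {
    // Fetches url_string_here with HTTP GET and returns the raw response body.
    // Throws RuntimeException for any status code other than 200.
    public static byte[] fetch(String url_string_here) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(url_string_here).openConnection();
        connection.setRequestMethod("GET");

        int status = connection.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            throw new RuntimeException(
                    "GET " + url_string_here + " returned HTTP " + status);
        }

        // Copy the response stream into a byte array.
        try (InputStream in = connection.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return buffer.toByteArray();
        }
    }
}
```

Returning bytes rather than a String leaves the choice of character encoding to the caller, which matters when the response is binary or its charset is not known in advance.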

Occasionally applications – even the best behaved applications – will get the error “Deadline exceeded while waiting for HTTP response from URL.”

Generally, this means that the web service you’re trying to connect to is down or slow. If the service is down, then you can continuously retry your URL fetches by queuing them up within a task.

If the web service is slow, then you have an alternative: lengthening the read and connect timeouts. By default, App Engine expects that a URL fetch will take – at most – 5 seconds. That’s 5 seconds to connect to the web service (resolve DNS and so forth), send the request data, allow the web service to process the request, and finally retrieve any response sent back. For the vast majority of applications, that’s more than enough. The popular web APIs such as Twitter, Facebook, Google, etc. all process and return requests in much less than 5 seconds.

However, a slow or malfunctioning web service may take longer than 5 seconds to respond to a query. If your app is downloading a large amount of data (more than a few MB) you may also go past this limit. To tell App Engine to wait for a longer period of time, use this code (url_connection represents an HttpURLConnection object):
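A minimal sketch of that adjustment, using the standard setConnectTimeout and setReadTimeout methods of HttpURLConnection, follows. The helper class and method names are hypothetical, and the timeout value is left to the caller because the maximum App Engine will honor is not stated here.

```java
import java.net.HttpURLConnection;

public class TimeoutSketch {
    // Raises both timeouts on an existing connection. Values are in
    // milliseconds and must be set before the request is sent.
    public static void extendTimeouts(HttpURLConnection url_connection, int timeoutMillis) {
        url_connection.setConnectTimeout(timeoutMillis); // time allowed to establish the connection
        url_connection.setReadTimeout(timeoutMillis);    // time allowed waiting for response data
    }
}
```

For example, TimeoutSketch.extendTimeouts(url_connection, 10000) asks for 10 seconds on each phase instead of the default.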