The Python Requests Module

Introduction

Dealing with HTTP requests is not an easy task in any programming language. Python comes with two built-in modules, urllib and urllib2, to handle HTTP-related operations. Both modules offer a different set of functionalities, and many times they need to be used together. The main drawback of using urllib is that it is confusing (some methods are available in both urllib and urllib2), the documentation is not clear, and we need to write a lot of code to make even a simple HTTP request.

To make these things simpler, an easy-to-use third-party library known as Requests is available, and most developers prefer to use it instead of urllib/urllib2. It is an Apache2-licensed HTTP library powered by urllib3.

Installing the Requests Module

Installing this package, like most other Python packages, is pretty straightforward. You can either download the Requests source code from GitHub and install it, or use pip:
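For example, installing with pip (the package name on PyPI is requests):

```shell
pip install requests
```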

If you don't receive any errors when importing the module, then the installation was successful.
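A quick sanity check is to open a Python shell and import the module:

```python
import requests

# If this import raises no errors, the installation worked.
print(requests.__version__)
```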

Making a GET Request

GET is by far the most used HTTP method. We can use a GET request to retrieve data from any destination. Let me start with a simple example first. Suppose we want to fetch the content of the home page of our website and print out the resulting HTML data. Using the Requests module, we can do it like below:
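A minimal sketch, using http://example.com as a stand-in for your own site's home page:

```python
import requests

# Fetch the home page; r.content holds the raw (encoded) response body.
r = requests.get('http://example.com')
print(r.content)
```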

It will print the response in an encoded form. If you want to see the actual text of the HTML page, you can read the .text property of this object. Similarly, the status_code property prints the status code returned for the request:
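For example (again using example.com as a placeholder URL):

```python
import requests

r = requests.get('http://example.com')
print(r.text)         # decoded HTML text
print(r.status_code)  # e.g. 200 for a successful request
```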

Requests will decode the raw content and show you the result. If you want to check what type of encoding Requests used, you can print the value of the .encoding property. The encoding can even be changed by assigning a new value to that property. Now isn't that simple?
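A short sketch, again with example.com as a placeholder:

```python
import requests

r = requests.get('http://example.com')
print(r.encoding)  # the encoding guessed from the response headers

# Override the encoding; .text will be re-decoded with the new value.
r.encoding = 'utf-8'
print(r.text)
```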

Reading the Response

The response to an HTTP request can contain many headers that hold different pieces of information.

httpbin is a popular website for testing different HTTP operations. In this article, we will use httpbin/get to analyse the response to a GET request. First of all, we need to find out the response header and how it looks. You can use any modern web browser to find it, but for this example, we will use Google's Chrome browser.

In Chrome, open the URL http://httpbin.org/get, right-click anywhere on the page, and select the "Inspect" option.

This will open a new window within your browser. Refresh the page and click on the "Network" tab.

This "Network" tab will show you all different types of network requests made by the browser. Click on the "get" request in the "Name" column and select the "Headers" tab on right.

The content of the "Response Headers" is our required element. You can see the key-value pairs holding various information about the resource and request. Let's try to parse these values using the requests library:
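A sketch of reading those headers with the requests library:

```python
import requests

r = requests.get('http://httpbin.org/get')

# r.headers behaves like a dictionary of the response headers.
print(r.headers)
print(r.headers['Content-Type'])
print(r.headers['content-type'])  # header keys are case-insensitive
```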

We retrieved the header information using r.headers and we can access each header value using specific keys. Note that the key is not case-sensitive.

Similarly, let's try to access the response value. The above header shows that the response is in JSON format: (Content-type: application/json). The Requests library comes with one built-in JSON parser and we can use requests.get('url').json() to parse it as a JSON object. Then the value for each key of the response results can be parsed easily like below:
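A sketch of parsing that JSON response (the headers and url keys shown are part of httpbin's standard /get response):

```python
import requests

r = requests.get('http://httpbin.org/get')
print(r.json())  # the whole response body parsed as a Python dict

response = r.json()
print(response['headers']['Host'])
print(response['url'])
```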

The call to r.json() parses the JSON value of the response. We stored the parsed value in the variable response and then printed out the value for each key. Note that, unlike the previous example, these keys are case-sensitive.

Similar to JSON and text content, we can use Requests to read the response content in bytes for non-text responses, using the .content property. This will automatically decode gzip- and deflate-encoded content.
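For example, downloading a small image as raw bytes, assuming httpbin's sample-image endpoint /image/png as the target:

```python
import requests

r = requests.get('http://httpbin.org/image/png')

# r.content is the response body as bytes -- suitable for binary files.
with open('sample.png', 'wb') as f:
    f.write(r.content)
```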

Passing Parameters in GET

In some cases, you'll need to pass parameters along with your GET requests, which take the form of query strings. To do this, we need to pass these values in the params parameter, as shown below:
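A sketch with a hypothetical pair of parameters:

```python
import requests

# Hypothetical query parameters for illustration only.
payload = {'user_name': 'admin', 'password': 'password'}
r = requests.get('http://httpbin.org/get', params=payload)

print(r.url)   # the parameters are appended to the URL as a query string
print(r.text)
```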

As you can see, the Requests library automatically turned our dictionary of parameters into a query string and attached it to the URL.

Note that you need to be careful what kind of data you pass via GET requests since the payload is visible in the URL, as you can see in the output above.

Making POST Requests

HTTP POST requests are the opposite of GET requests, as they are meant for sending data to a server rather than retrieving it. That said, POST requests can also receive data within the response, just like GET requests.

Instead of using the get() method, we need to use the post() method. For passing an argument, we can pass it inside the data parameter:
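A sketch posting a hypothetical form payload to httpbin's /post endpoint, which echoes the submitted form data back:

```python
import requests

# Hypothetical form fields for illustration only.
payload = {'user_name': 'admin', 'password': 'password'}
r = requests.post('http://httpbin.org/post', data=payload)

# httpbin echoes the form data back under the 'form' key.
print(r.json()['form'])
```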

The data will be "form-encoded" by default. You can also pass more complicated requests, such as a list of tuples if multiple values share the same key, a string instead of a dictionary, or a multipart-encoded file.

Sending Files with POST

Sometimes we need to send one or more files to the server simultaneously. For example, if a user is submitting a form and the form includes different form fields for uploading files, like a user profile picture, a user resume, etc. Requests can handle multiple files in a single request. This can be achieved by putting the files in a list of tuples, like below:
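A sketch using in-memory byte buffers as stand-ins for real files (swap in open('profile.jpg', 'rb') etc. for actual uploads); httpbin's /post endpoint echoes the uploaded files back:

```python
import io
import requests

# Each tuple is (field_name, (filename, file_object, content_type)).
files = [
    ('images', ('profile.jpg', io.BytesIO(b'fake image bytes'), 'image/jpeg')),
    ('documents', ('resume.pdf', io.BytesIO(b'fake pdf bytes'), 'application/pdf')),
]
r = requests.post('http://httpbin.org/post', files=files)

# httpbin echoes uploaded files back under the 'files' key.
print(r.json()['files'])
```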

Handling Redirections

The redirection process is handled automatically by Requests, so you don't need to deal with it yourself. When a redirect occurs, the history property contains the list of all response objects created to complete the redirection. HTTP 301 and 302 responses are used for permanent and temporary redirection, respectively.
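For example, http://github.com currently redirects to its HTTPS URL, which we can observe through the history property:

```python
import requests

r = requests.get('http://github.com')

print(r.url)      # the final URL after redirection
print(r.history)  # the intermediate Response objects
for resp in r.history:
    print(resp.status_code, resp.url)
```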

If you don't want the Requests library to automatically follow redirects, then you can disable it by passing the allow_redirects=False parameter along with the request.
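For example, again with http://github.com as the redirecting URL:

```python
import requests

r = requests.get('http://github.com', allow_redirects=False)

print(r.status_code)  # the redirect status code is returned as-is
print(r.history)      # empty -- no redirects were followed
```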

Handling Timeouts

Another important configuration is telling our library how to handle timeouts, or requests that take too long to return. We can configure Requests to stop waiting for a network request using the timeout parameter. By default, Requests will not time out. So, if we don't configure this property, our program may hang indefinitely, which is not the functionality you'd want in a process that keeps a user waiting.

```python
import requests

requests.get('http://www.google.com', timeout=1)
```

Here, an exception will be thrown if the server does not respond within 1 second (which is still aggressive for a real-world application). To get this to fail more often (for the sake of an example), you can set the timeout limit to a much smaller value, like 0.001.

The timeout can be configured for both the "connect" and "read" operations of the request using a tuple, which allows you to specify both values separately:
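A sketch, using httpbin's /get endpoint as the target:

```python
import requests

# 5-second "connect" timeout, 14-second "read" timeout.
r = requests.get('http://httpbin.org/get', timeout=(5, 14))
print(r.status_code)
```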

Here, the "connect" timeout is 5 seconds and the "read" timeout is 14 seconds. This will allow your request to fail much more quickly if it can't connect to the resource, and if it does connect, it will give it more time to download the data.

Cookies and Custom Headers

We have seen previously how to access headers using the headers property. Similarly, we can access cookies from a response using the cookies property.

For example, the code below shows how to access a cookie with the name cookie_name:
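A sketch using httpbin's /cookies/set/{name}/{value} path to have the server set the cookie for us (redirects are disabled so the Set-Cookie response itself is inspected):

```python
import requests

# Ask httpbin to set a cookie named cookie_name in its response.
r = requests.get('http://httpbin.org/cookies/set/cookie_name/cookie_value',
                 allow_redirects=False)

# r.cookies holds the cookies the server sent back.
print(r.cookies['cookie_name'])
```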

The Session Object

The session object is mainly used to persist certain parameters, like cookies, across different HTTP requests. A session object may use a single TCP connection for handling multiple network requests and responses, which results in a performance improvement.
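A sketch with two independent sessions and hypothetical cookie names (cookieone/cookietwo), using httpbin's /cookies endpoint to echo back whatever cookies each session sends:

```python
import requests

first_session = requests.Session()
second_session = requests.Session()

# Each session stores the cookie that the server sets for it.
first_session.get('http://httpbin.org/cookies/set/cookieone/111')
r = first_session.get('http://httpbin.org/cookies')
print(r.text)

second_session.get('http://httpbin.org/cookies/set/cookietwo/222')
r = second_session.get('http://httpbin.org/cookies')
print(r.text)

# The first session still sends only its own cookie.
r = first_session.get('http://httpbin.org/cookies')
print(r.text)
```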

The httpbin path /cookies/set/{name}/{value} will set a cookie with name and value. Here, we set different cookie values for both first_session and second_session objects. You can see that the same cookie is returned in all future network requests for a specific session.

Similarly, we can use the session object to persist certain parameters for all requests.
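A sketch, seeding the session's cookie jar with a default_cookie and then adding a per-request cookie on top:

```python
import requests

session = requests.Session()

# default_cookie will be sent with every request made through this session.
session.cookies.update({'default_cookie': 'default'})

r = session.get('http://httpbin.org/cookies')
print(r.text)

# Per-request cookies are merged with the session's default cookies.
r = session.get('http://httpbin.org/cookies', cookies={'first-cookie': '111'})
print(r.text)
```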

As you can see, the default_cookie is sent with each request of the session. If we add any extra parameter to the cookie object, it is appended to the default_cookie. Here, "first-cookie": "111" is appended to the default cookie "default_cookie": "default".

Using Proxies

The proxies argument is used to configure a proxy server to use in your requests.
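A sketch with placeholder proxy addresses (10.10.1.10 is hypothetical; substitute your own proxy server), wrapped in a try/except since the placeholder won't actually resolve:

```python
import requests

# Hypothetical proxy addresses -- replace them with your own proxy server.
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

try:
    r = requests.get('http://httpbin.org/get', proxies=proxies, timeout=5)
    print(r.status_code)
except requests.exceptions.RequestException as exc:
    print('Request through proxy failed:', exc)
```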

For downloading or streaming large content, iter_content() is the preferred way, as it lets you read the response body in chunks instead of loading it all into memory at once.
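A minimal sketch, streaming httpbin's sample PNG (an assumed endpoint, /image/png) to disk in 1 KB chunks:

```python
import requests

# stream=True defers downloading the body until we iterate over it.
r = requests.get('http://httpbin.org/image/png', stream=True)

with open('streamed.png', 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        f.write(chunk)
```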

Errors and Exceptions

Requests throws different types of exceptions and errors if there is ever a network problem. All exceptions are inherited from the requests.exceptions.RequestException class.

Here is a short description of the common errors you may run into:

The ConnectionError exception is thrown in case of a DNS failure, a refused connection, or any other connection-related issue.

Timeout is raised if a request times out.

TooManyRedirects is raised if a request exceeds the maximum number of predefined redirections.

The HTTPError exception is raised for invalid HTTP responses, such as when raise_for_status() is called on a response with a 4xx or 5xx status code.

For a more complete list and description of the exceptions you may run into, check out the documentation.

Conclusion

In this tutorial I explained many of the features of the Requests library and the various ways to use it. You can use the Requests library not only for interacting with REST APIs, but equally well for scraping data from a website or downloading files from the web.

Modify and try the above examples, and drop a comment below if you have any questions regarding Requests.