6.1 Session and cookies

Sessions and cookies are two very common web concepts, and are also very easy to misunderstand. However, they are extremely important for the authorization of pages, as well as for gathering page statistics. Let's take a look at these two use cases.

Suppose you want to crawl a page that restricts public access, like a twitter user's homepage for instance. Of course you can open your browser and type in your username and password to login and access that information, but so-called "web crawling" means that we use a program to automate this process without any human intervention. Therefore, we have to find out what is really going on behind the scenes when we use a browser to login.

When we first receive a login page and type in a username and password, after we press the "login" button, the browser sends a POST request to the remote server. The Browser redirects to the user homepage after the server verifies the login information and returns an HTTP response. The question here is, how does the server know that we have access privileges for the desired webpage? Because HTTP is stateless, the server has no way of knowing whether or not we passed the verification in last step. The easiest and perhaps the most naive solution is to append the username and password to the URL. This works, but puts too much pressure on the server (the server must validate every request against the database), and can be detrimental to the user experience. An alternative way of achieving this goal is to save the user's identity either on the server side or client side using cookies and sessions.

Cookies, in short, store historical information (including user login information) on the client's computer. The client's browser sends these cookies everytime the user visits the same website, automatically completing the login step for the user.

Figure 6.1 cookie principle.

Sessions, on the other hand, store historical information on the server side. The server uses a session id to identify different sessions, and the session id that is generated by the server should always be random and unique. You can use cookies or URL arguments to get the client's identity.

Figure 6.2 session principle.

Cookies

Cookies are maintained by browsers. They can be modified during communication between webservers and browsers. Web applications can access cookie information when users visit the corresponding websites. Within most browser settings, there is one setting pertaining to cookie privacy. You should be able to see something similar to the following when you open it.

Figure 6.3 cookie in browsers.

Cookies have an expiry time, and there are two types of cookies distinguished by their life cyles: session cookies and persistent cookies.

If your application doesn't set a cookie expiry time, the browser will not save it into the local file system after the browser is closed. These cookies are called session cookies, and this type of cookie is usually saved in memory instead of to the local file system.

If your application does set an expiry time (for example, setMaxAge(606024)), the browser will save this cookie to the local file system, and it will not be deleted until reaching the allotted expiry time. Cookies that are saved to the local file system can be shared by different browser processes -for example, by two IE windows; different browsers use different processes for dealing with cookies that are saved in memory.

Set cookies in Go

Go uses the SetCookie function in the net/http package to set cookies:

http.SetCookie(w ResponseWriter, cookie *Cookie)

w is the response of the request and cookie is a struct. Let's see what it looks like:

Fetch cookies in Go

The above example shows how to set a cookie. Now let's see how to get a cookie that has been set:

cookie, _ := r.Cookie("username")
fmt.Fprint(w, cookie)

Here is another way to get a cookie:

for _, cookie := range r.Cookies() {
fmt.Fprint(w, cookie.Name)
}

As you can see, it's very convenient to get cookies from requests.

Sessions

A session is a series of actions or messages. For example, you can think of the actions you between picking up your telephone to hanging up to be a type of session. When it comes to network protocols, sessions have more to do with connections between browsers and servers.

Sessions help to store the connection status between server and client, and this can sometimes be in the form of a data storage struct.

When an application needs to assign a new session to a client, the server should check if there are any existing sessions for the same client with a unique session id. If the session id already exists, the server will just return the same session to the client. On the other hand, if a session id doesn't exist for the client, the server creates a brand new session (this usually happens when the server has deleted the corresponding session id, but the user has appended the old session manually).

The session itself is not complex but its implementation and deployment are, so you cannot use "one way to rule them all".

Summary

In conclusion, the purpose of sessions and cookies are the same. They are both for overcoming the statelessness of HTTP, but they use different methods. Sessions use cookies to save session ids on the client side, and save all other information on the server side. Cookies save all client information on the client side. You may have noticed that cookies have some security problems. For example, usernames and passwords can potentially be cracked and collected by malicious third party websites.

Here are two common exploits:

appA setting an unexpected cookie for appB.

XSS attack: appA uses the JavaScript document.cookie to access the cookies of appB.

After finishing this section, you should know some of the basic concepts of cookies and sessions. You should be able to understand the differences between them so that you won't kill yourself when bugs inevitably emerge. We'll discuss sessions in more detail in the following sections.