How to Use The Urllib Python Library to Fetch URL Data and More

I will say it outright: if you’re not using Python, you are missing out on a lot. At this point, I understand that some of you will argue, telling me how nice Perl is or how great things work in Ruby on Rails. Well, get to know all the scripting worlds if you like. Then come back to Python programming.

I’m not trying to preach to you about how good Python is (hope you know that already :P), and being knowledgeable in other scripting languages is fair enough. Bear with me, though: it would be a really, really good idea to take up Python if you’ve never done so before. I can assure you it will be well worth it. Alas, if you wanted more preaching you could have read the 5 reasons why you should learn Python, you say angrily. You’re right, sorry.

To the point now. One of the very good things about Python is that you can find lots of premade libraries for all sorts of things you may want to do. I must admit that the documentation is often not as well organized as Perl’s CPAN, but it still contains some great reusable code that makes your job much easier.

In your scripting endeavours, there will be many times when you need to fetch some data from a website. In my case, one of the best uses for that was creating a script that automated the login and the whole playing procedure for a well-known browser game. YES, I am a cheater. But come on, I did it for practice (not really, but I have to defend myself somehow).

On that occasion, I needed to do two things. First, I needed to log in to the game. Then, after doing that, I needed to grab certain webpages and send back certain requests that would automate procedures like building an army, starting new constructions and such. Generally speaking, getting a premium service at my own expense. The tricky part about it is cookies. A cookie, in this sense, is how the server recognizes your session: your browser sends it back with every request. If the server notices inactivity for a certain period of time (usually around 15 minutes), your session expires and you get a message like “please log in”.

Therefore, you need to use urllib in such a way that it creates one or more cookies and handles them for you. But before messing with cookies, let’s first check a much simpler example of using urllib to fetch data from a certain URL, without creating any cookies:
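A minimal sketch of this first example, assuming http://www.example.com/ as a stand-in URL; the try/except import lets it run on Python 3 as well, where urllib2’s pieces live in urllib.request:

```python
try:
    from urllib2 import Request, urlopen          # Python 2, as in the article
except ImportError:
    from urllib.request import Request, urlopen   # Python 3 equivalent

url = "http://www.example.com/"
req = Request(url)        # build a request object for the URL
response = urlopen(req)   # returns a file-like object for the response
html = response.read()    # read the whole page contents
print(html[:300])
```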

This is a very simple but effective example. The first thing we need to do is call the urllib2 function named Request(). We invoke it with the URL as a parameter and get a request object back. That is the function used in its simplest form; you could also specify HTTP headers and data, but we will keep it simple for this first example. Then you call the urlopen() function, which returns a file-like object for the actual response. After that, you simply use the standard Python read() function to read the whole contents and print them out.

Messing With Form Parameters, Cookies and HTTP Headers

While this is a good example of the core use of urllib2, it doesn’t use the library at its full potential. Let’s see a more complicated example:
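Here is a sketch of the login flow the next paragraphs walk through. The game URL and the form-field names (name, password, server) are placeholders; the original used the third-party ClientCookie package, so the fallback branch substitutes the plain standard-library urlopen when ClientCookie isn’t available:

```python
try:
    from urllib import urlencode                  # Python 2, as in the article
    from urllib2 import Request
    from ClientCookie import urlopen              # cookie-aware urlopen (3rd party)
except ImportError:
    from urllib.parse import urlencode            # Python 3 stand-ins for the sketch
    from urllib.request import Request, urlopen   # (no transparent cookie handling)

def login(name, password, server):
    # hypothetical form-field names and URL -- inspect the real form for yours
    opts = {"name": name, "password": password, "server": server}
    data = urlencode(opts).encode("ascii")   # URL-encode the POST body

    # disguise ourselves as Firefox so the request looks like a normal browser
    headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) "
                             "Gecko/20100101 Firefox/115.0"}

    req = Request("http://example.com/login.php", data, headers)
    # with ClientCookie, urlopen() stores the session cookie the server sets
    return urlopen(req).read()
```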

This is the actual login function (a bit edited) that I was using for the script I mentioned earlier. It’s a bit more complicated, but it resembles a real-world script. Let’s unscramble it together. Notice that this time we pass two more parameters to urllib2’s Request() function. The first one is the URL data. Since we want to log in to the game, we have to provide our credentials: our name, our password and the server on which we play. You can find the names of the actual form elements easily using Firefox’s great Web Developer plugin.

As you see, opts is a dictionary with the name and value of each parameter we need to pass to the server. This dictionary is then passed to urlencode(), which does what it says: it URL-encodes the data so it can be properly passed on to the server. Then, to make this a bit more professional, we disguise ourselves as Mozilla Firefox. Nobody would suspect that we are a bot, right? Firefox headers look much more casual, don’t you think?
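You can see urlencode’s effect in isolation (the values here are made up; on Python 3 the import path is urllib.parse):

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

opts = {"name": "myuser", "password": "s3cret!", "server": "3"}
print(urlencode(opts))   # name=myuser&password=s3cret%21&server=3
```

Note that characters with special meaning in URLs, like the “!”, get percent-encoded automatically.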

After that, the procedure is almost the same. We just use ClientCookie as the means to get the webpage and invoke urlopen(). This allows Python to store a session cookie for us. That’s it: we are now effectively logged in to the actual game and can do pretty much anything.

Multiple Cookies Handling

There was a situation where I needed multiple-cookie handling and ClientCookie was just not doing it. I thought it would be good to let you know about that case as well. So, here is the final example of this tutorial, which shows how you can go about handling multiple cookies:
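A sketch of what the following paragraph describes, fetching http://www.example.com/ as a stand-in URL; the fallback import covers Python 3, where cookielib was renamed http.cookiejar:

```python
try:
    from cookielib import CookieJar                        # Python 2 name
    from urllib2 import build_opener, HTTPCookieProcessor
except ImportError:
    from http.cookiejar import CookieJar                   # Python 3 name
    from urllib.request import build_opener, HTTPCookieProcessor

# a "cookie jar" that can store multiple cookies for us
jar = CookieJar()

# combine the normal HTTP handler with a cookie handler
opener = build_opener(HTTPCookieProcessor(jar))

# a simple open() on the opener returns a file object with the server's response;
# any cookies the server sets end up in the jar automatically
response = opener.open("http://www.example.com/")
page = response.read()
print(len(page), "bytes received,", len(jar), "cookies stored")
```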

As you see, this is quite different from the previous examples. We use cookielib to create what is called a cookie jar, which will store multiple cookies for us. Then build_opener() is used to combine the HTTP handler with the cookie handler, and after that things are back to normal. Using the opener object, we just call open() on it to get a file object with the server’s response.

Make sure that you use this in a good manner and don’t go about creating bots ! Or maybe you think it’s a good way to practice, huh ? I second that.


What if the information is in javascript? Like, there is a table of statistics of, for example, basketball player results, and this table is built with javascript. How could I fetch the info in that table and save the values I want to the variables I created?

If you wanted to parse some data based on what the server sends back to you, you would have to use a packet sniffer like Wireshark. You would then inspect the packets to understand what data is returned. After that, you would be able to parse that data the way you want.

@borja : Not at all. What you need to do is fire up wireshark and start executing what you want to inspect, in your browser. The requests that you send to the server and the responses that the server sends to you are what you will be getting from wireshark.

Then, you check the packets that you receive (the data) and find out what requests were made and what responses were returned to you. You’re effectively analyzing the protocol. If the server returns stuff to be displayed via javascript, you will see everything there.

If, on the other hand, the javascript is predefined and there is no messing with the server, please take a look at :

excuse my kindergartenesque computer knowledge but I have a podcast and I am trying to understand my statistics. In trying to understand the statistic on USER AGENTS (?) it has Python-urllib with 10660 Downloads and iTunes with 1809 Downloads. I don’t know what the heck Python-urllib is, means, etc. Is there a way to explain it so a 46 yr.old NOT computer literate person (me) can understand?
Thanks

nooj, Python’s urllib is just a Python library used to fetch URL data. So, if you visit a website using Python, your program can get everything that a person sees in a browser and do anything you want with that data.


Hi,
I want to create a script which opens browsers like Firefox, IE, Opera, Safari and Chrome at the same time, or in a loop, then opens a local html file, reads the content in a frame and compares the browsers’ results to each other.
Could you please direct me if possible in brief.

I am pretty new to Python, so I will be very grateful if you could help me with this.

Hey dude, I have a question. Is it possible to write a program which logs into my Facebook profile, then searches randomly for people, or say girls, visits their profiles and downloads their profile pictures?



About CoderCaste

Welcome to CoderCaste, a blog devoted to helping novice and seasoned programmers with their daily programming needs. We hope you have a good time while you’re here, and we will keep on creating fresh and interesting programming content for thirsty developers. Enjoy!