Category Archives: Web

Web scraping is when a programmer writes an application that downloads web pages and parses specific information out of them. Usually, when you are scraping data, you will need to make your application navigate the website programmatically. In this chapter, we will learn how to download files from the internet and parse them if need be. We will also learn how to create a simple spider that we can use to crawl a website.
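As a rough sketch of the download-and-parse step, here is a version that uses only Python 3's standard library (`urllib.request` and `html.parser`). The chapter itself may use different tools. It fetches a page and collects the absolute URLs of its links, which is the raw material a simple spider would follow. The `LinkParser`, `extract_links`, and `fetch` names and the `example-spider/0.1` User-Agent string are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Parse an HTML string and return the absolute URLs it links to."""
    parser = LinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download a page and decode it, falling back to UTF-8 if no charset is sent."""
    # The User-Agent value is a hypothetical placeholder; identify your own bot honestly.
    request = Request(url, headers={"User-Agent": "example-spider/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, `extract_links(fetch("https://example.com/"), "https://example.com/")` returns that page's links; a crawler would queue those URLs and repeat the process.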

Tips for Scraping

There are a few tips that we need to go over before we start scraping.

Always check the website’s terms and conditions before you scrape it. They usually have terms that limit how often you can scrape or what you can scrape.

Because your script will run much faster than a human can browse, make sure you don’t hammer their website with lots of requests. This may even be covered in the terms and conditions of the website.

You can get into legal trouble if you overload a website with your requests or you attempt to use it in a way that violates the terms and conditions you agreed to.

Websites change all the time, so your scraper will break some day. Know this: You will have to maintain your scraper if you want it to keep working.

Unfortunately, the data you get from websites can be a mess. As with any data parsing activity, you will need to clean it up to make it useful.
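One way to act on the tips about request volume is to honor the site’s robots.txt and space out your requests. Below is a minimal sketch using the standard library’s `urllib.robotparser`. The `PoliteFetcher` class, the `example-spider` user agent, and the 2-second default delay are assumptions for illustration, not values taken from any site’s terms, and robots.txt is no substitute for actually reading the terms and conditions.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteFetcher:
    """Check robots.txt rules and keep requests at least min_delay seconds apart."""

    def __init__(self, robots_lines, user_agent="example-spider",
                 min_delay=2.0, clock=time.monotonic, sleep=time.sleep):
        # min_delay=2.0 is an arbitrary, hypothetical default; use whatever
        # rate the site's terms ask for.
        self.user_agent = user_agent
        self.min_delay = min_delay
        self.clock = clock
        self.sleep = sleep
        self._last_request = None
        self.robots = RobotFileParser()
        self.robots.parse(robots_lines)  # the lines of a robots.txt file

    def allowed(self, url):
        """Return True if robots.txt permits our user agent to fetch url."""
        return self.robots.can_fetch(self.user_agent, url)

    def wait_turn(self):
        """Sleep just long enough to keep requests min_delay seconds apart."""
        now = self.clock()
        if self._last_request is not None:
            remaining = self.min_delay - (now - self._last_request)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last_request = now
```

In real use you would load the rules with `RobotFileParser.set_url()` and `read()` rather than passing lines in, then call `allowed()` and `wait_turn()` before every download. The clock and sleep functions are parameters only so the timing can be tested without actually waiting.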

I recently took on a project where I needed to graph some data on a webpage using data I had queried from a database. Since I love Python, I decided to use it to accomplish this task. I went with Flask for serving the webpage and pygal for creating the graphs. In this tutorial, I will show you how to do that too, but without the database logic. Instead, we’ll get weather data from the Weather Underground and graph that. Let’s get started!

Packt Publishing recently sent me a copy of the eBook version of Flask Framework Cookbook by Shalabh Aggarwal. I didn’t read it in its entirety, as cookbooks don’t usually make for a very interesting linear read. I just went through it and cherry-picked various recipes. But before I get into too much detail, let’s do the quick review!

Quick Review

Why I picked it up: I was asked by the publisher to read the book.

Why I finished it: As already mentioned, I actually just skimmed the book and read random recipes.

I’d give it to: Someone who is new to Flask or possibly an intermediate Flask developer.

I don’t do a lot of plotting in my job, but I recently heard about a website called Plotly that provides a plotting service for anyone’s data. They even have a plotly package for Python (among others)! So in this article we will be learning how to plot with their package. Let’s have some fun making graphs!

One of my readers suggested that I should try logging my data to a web service called Loggly. As I understand it, Loggly is a way to share log data with everyone in a business so that you no longer need to log in to individual machines. They also provide graphs, filters and searches of the logs. They don’t have a Python API, but it’s still pretty easy to send data to Loggly via Python’s urllib2 module and simplejson. Also note that you can use Loggly for a 30-day trial period.
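The post predates Python 3, where `urllib2` became `urllib.request` and the built-in `json` module covers what `simplejson` was used for. A minimal sketch of posting one JSON log event over HTTP might look like this. The endpoint URL is an assumption you would copy from your Loggly account (it embeds your customer token), and `build_log_request` and `send_log` are hypothetical helper names.

```python
import json
from urllib.request import Request, urlopen


def build_log_request(endpoint_url, event):
    """Serialize one log event (a dict) as JSON and wrap it in a POST request."""
    body = json.dumps(event).encode("utf-8")
    return Request(
        endpoint_url,  # hypothetical: the HTTP input URL from your Loggly account
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_log(endpoint_url, event, timeout=10):
    """Send the event and return the HTTP status code."""
    with urlopen(build_log_request(endpoint_url, event), timeout=timeout) as response:
        return response.status
```

Building the request separately from sending it makes the payload easy to inspect or test without touching the network.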

I’ve been hearing some buzz about a newish web service called Twilio, which allows you to send SMS and MMS messages, among other things. There’s a handy Python wrapper for their REST API as well. If you sign up with Twilio, they will give you a trial account without even requiring you to provide a credit card, which I appreciated. You will receive a Twilio number that you can use for sending out your messages. Since you are using a trial account, you do have to authorize any phone numbers you want to send messages to before you can actually send a message. Let’s spend some time learning how this works!
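The post uses Twilio’s Python wrapper. To show roughly what such a wrapper does under the hood, here is a standard-library sketch of the underlying REST call: an HTTP POST to the account’s Messages resource with `From`, `To`, and `Body` form fields, authenticated with HTTP Basic auth using the account SID and auth token. The endpoint and field names follow Twilio’s 2010-04-01 REST API as I understand it, so verify them against Twilio’s current documentation. The helper names and phone numbers are hypothetical, and on a trial account the `To` number must be one you have already verified.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.twilio.com/2010-04-01"  # assumed base URL; check Twilio's docs


def build_sms_request(account_sid, auth_token, from_number, to_number, body):
    """Build (but do not send) a POST to the account's Messages resource."""
    url = f"{API_BASE}/Accounts/{account_sid}/Messages.json"
    form = urlencode({"From": from_number, "To": to_number, "Body": body}).encode("ascii")
    # HTTP Basic auth: base64("account_sid:auth_token")
    credentials = base64.b64encode(f"{account_sid}:{auth_token}".encode("utf-8")).decode("ascii")
    return Request(
        url,
        data=form,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def send_sms(account_sid, auth_token, from_number, to_number, body, timeout=10):
    """Send the message and return the JSON response body as a dict."""
    request = build_sms_request(account_sid, auth_token, from_number, to_number, body)
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

In practice the official wrapper handles this for you; the sketch is only meant to demystify what it sends.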

Last night I received an email about a new Python-related Kickstarter. The Real Python crew added a new author to write a book entirely about Django 1.6. This is a subject that I keep meaning to get into but haven’t had the opportunity to. Hopefully by backing this project, I’ll finally learn Django.

I have been impressed with the quality of their previous projects, so I feel that I can safely endorse these authors. I’m sure the project will be of high quality and well worth your time and money. Plus it’s fun to support these guys who want to share their knowledge. If you’re interested in supporting the project you can go to the following address:

Today we’ll be looking at how to acquire data from the popular movie site, Rotten Tomatoes. To follow along, you’ll want to sign up for an API key here. When you get your key, make a note of your usage limit, if there is one. You don’t want to make too many calls to their API or you may get your key revoked. Finally, it’s always a very good idea to read the documentation of the API you will be using. Here are a couple of links:

Python has lots of web frameworks. Bottle is one of them and is considered a WSGI framework. It’s also sometimes called a “micro-framework”, probably because Bottle consists of just one Python file and has no dependencies besides Python itself. I’ve been trying to learn it by working through the official Todo-list tutorial on its website. In this article, we’re going to go over this application and improve the UI a little bit. Then, in a separate follow-up article, we’ll change the application to use SQLAlchemy instead of straight sqlite. You will probably want to install Bottle if you’d like to follow along.
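The Todo app stores its tasks with straight sqlite, and that storage layer is the part the follow-up swaps for SQLAlchemy. For reference, here is a small sketch of such a layer using the standard `sqlite3` module. The table layout (an `id`, a `task` text, and a `status` flag where 1 means open) is modeled loosely on Bottle’s tutorial and should be treated as an assumption, and the function names are hypothetical.

```python
import sqlite3


def init_db(conn):
    """Create the (assumed) todo table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS todo ("
        " id INTEGER PRIMARY KEY,"
        " task TEXT NOT NULL,"
        " status INTEGER NOT NULL DEFAULT 1)"  # 1 = open, 0 = done
    )
    conn.commit()


def add_task(conn, task):
    """Insert an open task and return its new id."""
    cur = conn.execute("INSERT INTO todo (task, status) VALUES (?, 1)", (task,))
    conn.commit()
    return cur.lastrowid


def open_tasks(conn):
    """Return (id, task) tuples for every task that is still open."""
    return conn.execute("SELECT id, task FROM todo WHERE status = 1 ORDER BY id").fetchall()


def close_task(conn, task_id):
    """Mark a task as done."""
    conn.execute("UPDATE todo SET status = 0 WHERE id = ?", (task_id,))
    conn.commit()
```

In a Bottle app, a route handler would open a connection, call `open_tasks()`, and hand the rows to a template. Keeping the SQL in a few small functions like these means a later SQLAlchemy version would only need to replace them, not the routes.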