Category Archives: python

So recently one of my projects gave me some experience with gevent-socketio. While the code is open source, it can be complicated. So the example I show here, while not serious, will hopefully make things easier to see.

Before we can create a view for socket.io, you will need to define the namespace to be used.

Ping Pong Namespace

```python
from socketio.namespace import BaseNamespace

class PingPong(BaseNamespace):
    def on_ping(self, attack):
        if attack['type'] == 'fireball':
            for i in range(10):
                self.emit('pong', {'sound': 'bang!'})
        else:
            self.emit('pong', {'sound': 'pong'})
```

The gevent-socketio library uses namespaces as the view logic: all the logic lives in a namespace. On top of namespaces there are also mixins, which provide extra functionality; gevent-socketio ships with BroadcastMixin and RoomsMixin, which are useful.

line 3: We create our namespace by subclassing BaseNamespace.

line 4: This method is defined by us. We prefix on_ to the name of an event we want to handle; in this case it is the ping event, which will be sent from the socket.io client. The attack parameter is essentially the payload we receive, and we don't need to do any conversion from JSON to a dict, as the library handles it for us.

lines 7 and 9: self.emit emits an event to the socket.io client. Again, the dict is the payload that will be sent; no conversion to JSON is necessary.

Now we can finally define a view.

socket-io view

```python
from flask import Flask
from flask import request
from socketio import socketio_manage

app = Flask(__name__)

@app.route('/socket.io/<path:remaining>')
def pingpong(remaining):
    socketio_manage(request.environ, {'/pingpong': PingPong}, request)
    return 'done'
```

This is the view for socket.io; app is just your standard Flask app.

line 7: This is the route the socket.io client needs in order to connect.

line 9: This line is the same across socket.io apps; the differences are the namespace name '/pingpong' and the namespace class PingPong, which you define yourself.

Now to serve this guy

```python
from socketio.server import SocketIOServer

# app will be somewhere around here

def main():
    SocketIOServer(('', 5000), app, resource="socket.io").serve_forever()

if __name__ == "__main__":
    main()
```

line 6: This is the line that serves the app. It serves all the views, including the non-socket.io ones.

I will skip the view that renders the main page and the HTML, and focus on the JavaScript alone.

the javascript code

```javascript
$(function(){
    var socket = io.connect('/pingpong');
    socket.on('pong', function(data){
        console.log('pong');
        console.log(data);
        $('#result').append(data.sound + '<br/>');
    });

    // send a ping when a .ping element is clicked
    $('.ping').click(function(event){
        event.preventDefault();
        var attack = $(this).data('attack');
        console.log(attack);
        socket.emit('ping', {'type': attack});
    });
});
```

line 2: This is how you connect to a namespace: io.connect('/namespace').

line 10: We bind a JavaScript click handler; to send an event we use socket.emit(event, data). data is a JavaScript object and event is a string; notice that it matches the on_ping method on the namespace we defined above.

line 3: When the socket.io server emits a pong event, this event handler receives it and does something with it in the callback. Again, don't worry about conversion; the data arrives as a JavaScript object.

Hopefully this makes things slightly clearer. By the way, the example is on GitHub.

After yesterday's Python Malaysia meetup, here are a few things I want to try, or keep doing, for future events:

Malaysians tend to be late, so always set the time a bit earlier, at least 30 minutes ahead; that is the usual margin by which Malaysians run late.

Half of the attendees do not show up, even though they registered on Eventbrite. What I want to try next time is charging money for the event; the money will go to pizza.

The event is a bit bland; some suggested a full day just for the Python User Group meetup. I try to run it like most other Python user groups, one topic per meetup, and a bigger event is also harder to run. What I might test out is having a lightning talk at each meetup.

We had our networking session in a mamak and ended up using two rows of tables. That is not a bad thing, though tables kind of limit movement; maybe next time we order pizza (see point 2). It is nicer for everyone to have a chance to talk to each other.

Python Malaysia needs a proper website; not everyone uses Facebook, even though most go from the Facebook event page to the Eventbrite page. It is still a nice thing to have.

Location matters! ITrain is just the right place to have an event: in the middle of the city and accessible via LRT, though parking can be a problem. The location helps bring more people in.

Here are some photos of the Python Malaysia meetup that happened today. This meetup was about Urwid, a console UI library, from the creator himself, Ian Ward. A few new faces were at the meetup as well. More details coming soon. After seeing the demo, I can see myself using Urwid.

Recently I got involved in the Open Data movement in Malaysia, and one of my recent projects is called Bill Watcher. It is a webapp that broadcasts, via Twitter and RSS, the bills that are being debated and those that have recently been passed.

Main Page

Bills Detail

Basically this app scrapes the Malaysian parliament website and loads the data into an SQLite database for now. Which database it is I don't really care about, because I access it via SQLAlchemy, which makes it easy to move to another database. Bottle.py just reads from the database via SQLAlchemy and renders it; 960.gs makes the rendered page look nice.

The feature set of this app is pretty small: the PDF is shown in an iframe, and there is no login. The only fancy sharing features are the Twitter and Facebook buttons and RSS. Commenting will be provided by Disqus, once I figure out where to put it. JavaScript is used only for the Twitter and Facebook buttons.

I consider this an MVP: a small set of basic features to be extended. Features will be added as requested, but not all will be added. Also, not a lot of information is available on the parliament bills page, so features will depend on the effort needed to extract data from other sources, which is actually not easy. But otherwise, we will try our best to get requested features in.

What next? We are going to host it live soon. Then we will add Disqus and finalize the Twitter notifications. To get your hands dirty now, go to the GitHub link.

I will transfer it to the Sinar repo soon; it needs a bit of updating across repos.

p.s

Recently a bill was debated intensely, which shows how much we don't know about the decision process in this country, even though it is all there on the parliament site, a site that is not easy to use or navigate.

Not too long ago, I covered virtualenv. One of its behaviors is the --no-site-packages option, which totally isolates the Python environment. Starting from version 1.7, --no-site-packages is the default, so you don't need the flag anymore. If you invoke virtualenv with this flag, it will display an error.

So I have been scraping data online for some time. While ScraperWiki has an API that allows third-party apps to get data in JSON/XML form, I think I can make it easier, because a ScraperWiki query involves running an SQL query on the SQLite datastore. So I took the opportunity to learn a new Python web framework.

The framework only needs to handle requests and spit out data in JSON (maybe XML later). It does not need templates, since the output is JSON. It does not need an ORM, since the data is mostly scraped from somewhere else. It does not need sessions, since it is meant to be used by a library, and the data is open anyway.

The first thing I noticed is how little setup I have to do, coming from a Django background, which is well known for its big settings.py file. Just install it with 'pip install bottle'.

By default bottle already has a default application, so you don't strictly need to create one; I just put it there to show that it exists.

Another thing I noticed is that there is no URL routing in a separate file. A route decorator is added to the function I want to serve in the web app. The route is part of the application (the Bottle() object), and I can limit the type of request allowed on it, like POST/GET. I found this approach pretty clean; it reduces the boilerplate compared to Django views.

Another thing to notice: I do not specify a response method/object (as in Django). That is another nice thing about bottle: if a function returns a dict, the response is JSON; if it returns a string, the mimetype is text, and so on. There is no need to build a response object yourself.

Finally, to run the app, just run python server.py (or any Python file that calls bottle's run function), and you have a webapp.

For this project I didn't try the templating, but the docs show it is used via a view decorator, which I think is nice; I just don't need it now. From the docs, it looks pretty clean.

Because bottle is a micro framework, there is no manage.py script like Django's, and no ORM; I use SQLAlchemy here. There is no session support either. Interestingly, I don't feel I missed anything; in fact, it is pretty pleasant to use. Sessions will definitely bite me if I ever have to implement login, but a solution is in the documentation.

Overall, it is a fun framework to use, even on a small project like this. The documentation is pretty good. I might use it for a future project.

This is more of an experience report. Not too long ago, I scraped profiles of Members of Parliament from the parliament website; you can find the result here.

The thing is, as I used the data from the SQLite database I downloaded from the site, I realized that the title is part of the MP's name, so one would get "XXX , Y.B Tuan", where Y.B Tuan is the title.

That makes a query like 'select Parti from swdata where Nama=name' hard, and this is precisely what I need for another project.

On the other hand, the sqlite3 module, which has been part of the Python standard library since 2.5, actually has a method called Connection.create_function.

So I wrote a little function called get_name, and the example shows how it works.
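The original get_name is not reproduced here, but a minimal sketch of the create_function approach might look like this. The table and column names follow the query above; the title-stripping rule (keep only the part before the comma) and the sample row are my assumptions:

```python
import sqlite3

def get_name(nama):
    # The title comes after a comma, e.g. "XXX , Y.B Tuan",
    # so keep only the part before the comma.
    return nama.split(',')[0].strip()

conn = sqlite3.connect(':memory:')
# register get_name as a one-argument SQL function
conn.create_function('get_name', 1, get_name)

conn.execute("create table swdata (Nama text, Parti text)")
conn.execute("insert into swdata values ('Ahmad , Y.B. Tuan', 'Parti X')")

row = conn.execute(
    "select Parti from swdata where get_name(Nama) = ?", ('Ahmad',)
).fetchone()
print(row[0])  # Parti X
```

Once registered, get_name can be used inside SQL just like a built-in function, so the messy names never have to leave the database.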

I have been writing scrapers for some time, as you can see in some of my old posts here.

So recently, thanks to Kaeru, I was introduced to ScraperWiki. This is basically a service that runs your scraper in the cloud, with additional benefits:

It runs in the cloud.

It provides infrastructure to store the data, in the form of an SQLite database, which you can download.

It provides an easy way to dump data as Excel.

It provides infrastructure to expose the data as an API.

Somebody can fork the scraper and enhance it.

A web-based IDE, so you just write your scraper in the browser.

Everybody can see the code of public scrapers.

Scheduled tasks.

One very cool thing about ScraperWiki is that it supports a large set of third-party libraries, and it supports Ruby and PHP as well as Python. The ScraperWiki API is pretty extensive; it covers the scraper datastore, geocoding functions, views for the data hosted on ScraperWiki, and more.

My only concern is that if I ever want to move my scraper out of the service, I will need to rewrite the saving function. But the data can be downloaded anyway, and I use Python, so it is not that big a deal.

Below is a scraper I have written on ScraperWiki. While it is mostly a work in progress, it shows how a scraper would look.

Not too long ago, I covered one use of python-dateutil on this blog.

The library is pretty nifty in other cases as well, in this case date differences. While the standard library's datetime module provides datetime.timedelta for finding the difference between dates, it only counts up to days. In my case, I want to count in years.

That is where dateutil comes in. It has a module called relativedelta, which does count in years. Using it is just a matter of importing it and calling it.
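As a sketch (the dates here are made up; this assumes python-dateutil is installed):

```python
from datetime import date
from dateutil.relativedelta import relativedelta

# timedelta would only give us the gap in days;
# relativedelta breaks it down into years, months and days
born = date(1980, 6, 15)
today = date(2012, 5, 1)

diff = relativedelta(today, born)
print(diff.years)  # 31, since June 15 has not been reached yet in 2012
```

The same object also carries diff.months and diff.days, so you get the full breakdown in one call.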

There are times when the information on a government website is very useful, but unfortunately the data is in the form of a web page; it could be worse, as it could be in PDF. So it can be a pain to use the information for programming, because there is no API.

On the other hand, Python is a pretty powerful language. It comes with many libraries, including ones for making HTTP requests. Introducing urllib2, part of the standard library. Using it to download data from a website takes only a couple of lines of code:

```python
import urllib2

page = urllib2.urlopen("url")
```

The problem then is that you get a whole blob of HTML, which is a bit hard to process. Python has a few third-party libraries for this; the one I use is Beautiful Soup. Beautiful Soup is nice in that it is very forgiving of bad HTML markup, so you don't need to worry about bad formatting and can focus on getting things done. The library can also parse XML, among other things.

To use it, just download the data using urllib2 and pass it to Beautiful Soup. Using it is pretty easy, to me anyway. Note, though, that urllib2 is being reorganized in Python 3, so the code will need some modification.
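Beautiful Soup's API is friendlier, but the download-then-extract idea can be sketched with nothing but the standard library's html.parser (the HTML snippet and the link-collecting goal here are made up for illustration; in practice the page string would come from urlopen):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, the kind of
    thing a scraper pulls out of a downloaded page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

page = '<html><body><a href="/bill/1">Bill 1</a> <a href="/bill/2">Bill 2</a></body></html>'
parser = LinkCollector()
parser.feed(page)
print(parser.links)  # ['/bill/1', '/bill/2']
```

Beautiful Soup does the same job with less code and far more tolerance for broken markup, which is why I reach for it on real government pages.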

To see how such a scraper fares, here is a real-world example on GitHub, part of a bigger project. But hey, it is open source; just fork and use it, at this link.

So enjoy: go forth and extract some data, and promise to be nice; don't hammer their servers.