Tatsuhiko Miyagawa's blog to discuss mostly tech and nerdy stuff.

2009.12.27

Whenever I come back to Japan I buy lots of books. Literally lots of them.

It'd be really nice if Japanese books were available for Kindle, Sony Reader or whatever eBook device. The Japanese book industry is crap (like their movie/music industries are) and it will probably take another 5 years or so for them to finally adapt to the technology.

They also have their ridiculous no-repricing regulation that doesn't allow selling books at a lower price, which I suspect violates antitrust law.

Watched one of my favorite movies of all time last night, though it was 2 days late for Christmas: Love Actually.

It was my third time watching it, but the last time was so long ago that, while I remembered most of the plot, I had forgotten that most characters have relationships with each other in their real lives, like one character being a friend or a sister of another, etc.

This is truly an epic movie and the soundtrack is pretty good, although it lacks many important tracks played during the movie, like Bay City Rollers' Bye Bye Baby.

2009.12.21

2009.12.20

So, Plack::Request is probably the most useful class/utility in the Plack/PSGI ecosystem, but at the same time it misleads people into thinking that Plack is more of a framework or a library for end users rather than for application framework developers.

xSGI/Rack Request classes

Rack has Rack::Request which is a base class for Sinatra::Request and also is used in many Rack middleware components. WSGI doesn't have one unified implementation, so most middleware components are implemented in each framework, but there is also WebOb.py that is a parser for WSGI requests and provides Request objects.

Plack::Request is supposed to fit this layer, as is clearly documented in its POD. But it lacks some functionality: a) you can't write new attributes through an accessor, and b) there's no way to rewind psgi.input for a later read. These are bugs and should be fixed, but it's important to note that Request objects on this layer should be generic enough to work with any upstream framework, and it'd be nice if they could be subclassed in a new framework that doesn't have its own Request objects, such as Tatsumaki.

Framework Request classes

Most frameworks have their own Request classes: Django, Rails, Merb, Sinatra, HTTP::Engine, and Catalyst. They can define their own methods and have nothing to do with xSGI Request classes, but when you develop a new Perl web framework with PSGI/Plack, it's nice to have Plack::Request as a base class to add methods to.
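As a rough sketch of that subclassing idea: the package name MyFramework::Request and the is_ajax method below are made up for illustration, and the environment hash is hand-built rather than coming from a real server. It only relies on Plack::Request's new, env, method and path_info.

```perl
use strict;
use warnings;

# Hypothetical framework request class; the package name and the
# is_ajax method are invented for this example.
package MyFramework::Request {
    use parent 'Plack::Request';

    # A framework-specific convenience method added on top of the base class.
    sub is_ajax {
        my $self = shift;
        my $with = $self->env->{HTTP_X_REQUESTED_WITH} // '';
        return lc($with) eq 'xmlhttprequest' ? 1 : 0;
    }
}

# A hand-built, minimal PSGI environment standing in for what a server passes.
my $env = {
    REQUEST_METHOD        => 'GET',
    PATH_INFO             => '/hello',
    HTTP_X_REQUESTED_WITH => 'XMLHttpRequest',
};

my $req = MyFramework::Request->new($env);
print $req->method, ' ', $req->path_info, "\n";
print $req->is_ajax ? "ajax\n" : "not ajax\n";
```

Nothing here touches a Plack server: the class only reads the PSGI environment hash, so it should behave the same under any PSGI server implementation.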

So what?

Currently, the name Plack::Request, and the fact that you can use it directly to write PSGI apps "suggests" that it is the way to write Plack apps. Actually, I see some examples and blog posts explaining "How to convert your Catalyst applications to Plack" using Plack::Request, which totally doesn't make sense.

Also, if a new framework uses Plack::Request and inherits from it, end users, and sometimes authors as well, think that the framework depends on Plack (as a server) and can't run on other PSGI server implementations such as mod_psgi, which is totally not true.

Plack and PSGI avoided having one implementation and interface (like Rack) and separated the interface and implementations like Python WSGI does. Considering that, would it make more sense to have different names for a) Request libraries for writing Plack middleware and b) Request objects you can extend in frameworks?

If so, I'd merge Plack::Request back to Plack core dist, strip some misused features such as param (!), and provide another Request class (that probably uses Plack::Request inside) and is fully extensible. We'll see...

2009.12.17

So I blogged about why params() sucks, but there are already applications and libraries that do this: Catalyst and CGI::Deurl for instance. Changing the behavior of these libraries or core frameworks would break existing code, and even worse, mostly silently (because ref $params->{foo} eq 'ARRAY' would silently return false).

Hash::MultiValue 0.03 now ships with a from_mixed method, so you can easily create the MultiValue hash out of those objects:
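A minimal sketch of that conversion, assuming a hand-written hash reference in place of what a framework would actually return (the keys query and name are just examples):

```perl
use strict;
use warnings;
use Hash::MultiValue;

# The kind of "mixed" hash that parameters()-style APIs produce:
# a plain scalar for one value, an array ref for several.
my $mixed = {
    query => 'perl',
    name  => [ 'a', 'b' ],
};

my $params = Hash::MultiValue->from_mixed($mixed);

my $query = $params->{query};           # always a plain scalar
my $last  = $params->{name};            # the last value for the key
my @names = $params->get_all('name');   # every value, in order

print "$query $last @names\n";
```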

2009.12.15

In a typical web application the most frequently occurring task is getting parameters from a request. The Perl community and popular frameworks have had two interfaces to this: param() and parameters(). And there are a few issues.

param()

Good old CGI.pm has a convenient param() method, which behaves differently depending on context:
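To keep the illustration self-contained, the sketch below uses a stand-in param() sub built on wantarray instead of loading CGI.pm itself; the %form data is made up and represents a query string like ?name=Alice&name=Bob&age=30.

```perl
use strict;
use warnings;

# Made-up form data standing in for a parsed query string.
my %form = ( name => [ 'Alice', 'Bob' ], age => [ 30 ] );

# Hypothetical stand-in that mimics CGI.pm's context-sensitive param().
sub param {
    my ($key) = @_;
    my @values = @{ $form{$key} || [] };
    return wantarray ? @values : $values[0];
}

my $first = param('name');   # scalar context: the first value
my @all   = param('name');   # list context: every value

print "$first\n";
print "@all\n";
```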

This is quite nice, since your code says how you want values by explicitly stating the context (whether a scalar context or a list context). The only place it bites is that there are cases where you accidentally force a list context, such as when assigning it to a hash or passing it to a method call:
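A sketch of the gotcha, again with a stand-in param() rather than CGI.pm itself. The form data is made up and represents a hostile query string like ?name=Alice&name=is_admin&name=1&age=30.

```perl
use strict;
use warnings;

# Made-up form data: three values for name, one for age.
my %form = (
    name => [ 'Alice', 'is_admin', 1 ],
    age  => [ 30 ],
);

# Hypothetical stand-in that mimics CGI.pm's context-sensitive param().
sub param {
    my ($key) = @_;
    my @values = @{ $form{$key} || [] };
    return wantarray ? @values : $values[0];
}

# The hash constructor puts both param() calls in list context, so the
# extra name values spill over and become a new key/value pair.
my $vars = { name => param('name'), age => param('age') };
print join(', ', map { "$_=$vars->{$_}" } sort keys %$vars), "\n";

# Forcing scalar context keeps each key paired with exactly one value.
my $safe = { name => scalar param('name'), age => scalar param('age') };
print join(', ', map { "$_=$safe->{$_}" } sort keys %$safe), "\n";
```

With three name values the list happens to stay even, so Perl doesn't even warn about it.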

This code doesn't quite work if there are multiple (and an even number of) name parameters, or even worse, it injects some unintentional parameters into $vars, which could be seriously dangerous if you pass that on to internal utilities or databases.

So, param() is quite nice but only if you are really careful about this list context gotcha.

parameters()

Catalyst has added parameters() to its Catalyst::Request object and it allows you to get values in an array ref if there are multiple.

This might look intuitive but wait a minute. The data structure changes depending on user input rather than on how you code it, and that sucks. This means you have to always check if the value is an array ref or not, since:
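A sketch of the problem with plain hash references standing in for what a parameters()-style method returns. The data and the values_of helper are made up for illustration.

```perl
use strict;
use warnings;

# Stand-ins for the returned structure; its shape depends on the
# request, not on the code.
my $one_each  = { query => 'perl', name => 'Alice' };
my $multiples = { query => [ 'perl', 'psgi' ], name => [ 'Alice', 'Bob' ] };

# Multiple query= values: $query is an array ref, so it stringifies
# to something like ARRAY(0x...).
my $query = $multiples->{query};
print "searching for $query\n";

# A single name= value: dereferencing a plain string dies under strict refs.
my @names = eval { @{ $one_each->{name} } };
my $error = $@;
print "error: $error" if $error;

# The defensive check every caller ends up writing (hypothetical helper).
sub values_of {
    my ($params, $key) = @_;
    my $v = $params->{$key};
    return ref($v) eq 'ARRAY' ? @$v : defined $v ? ($v) : ();
}

print join(',', values_of($one_each,  'name')), "\n";
print join(',', values_of($multiples, 'name')), "\n";
```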

$query might become ARRAY(0xabcdef) if there are multiple query= parameters in the query. @names line might cause Can't use string as an ARRAY ref error if there's only one (or zero) name parameter. This causes horrible issues when using standard HTML elements like option or checkbox forms, or tools like jQuery's serialize().

Rack::Request

Let's see how other languages try to solve this problem. First, Rack::Request.

Rack::Request has a params method which always returns a Hash object. They have their own rule for multiple values. If there are multiple values for the same key (like foo), the value is always the last value. By naming the key in a special way, like foo[], you can state that "This key might have multiple values", and req.params['foo'] would return an Array instead of a String value.

Although it kind of hurts that you have to force this behavior in a low level library like Rack, I think this is a good middle ground, since you can name your parameters in your templates and the request handler code to specify whether you want an Array or a String. This technique has actually been ported to Perl as modules like Catalyst::Plugin::Params::Nested.
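To make the naming rule concrete, here is a small Perl sketch of it. The rack_style_params helper is hypothetical and not part of any module, it takes already-decoded key/value pairs, and it ignores nested keys like foo[bar] and the case where foo and foo[] are mixed.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the foo vs. foo[] naming rule to a flat
# list of key/value pairs.
sub rack_style_params {
    my @pairs = @_;
    my %params;
    while ( my ($key, $value) = splice(@pairs, 0, 2) ) {
        if ( $key =~ /\A(.+)\[\]\z/ ) {
            push @{ $params{$1} }, $value;   # foo[] always collects into an array
        }
        else {
            $params{$key} = $value;          # plain foo: the last value wins
        }
    }
    return \%params;
}

my $params = rack_style_params(
    'foo'   => 'a', 'foo'   => 'b',
    'bar[]' => 'x', 'bar[]' => 'y',
);

print "$params->{foo}\n";
print "@{ $params->{bar} }\n";
```

The point is that the shape of each value is decided by the key name you chose, not by how many values the user happened to send.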

WebOb.py

WebOb is a Python paste library to handle WSGI request parameters and such, and is used in Python frameworks such as Pylons. The WebOb documentation talks about this may-or-may-not-be-multiple params problem very clearly:

Several parts of WebOb use a “multidict”; this is a dictionary where a key can have multiple values. The quintessential example is a query string like ?pref=red&pref=blue; the pref variable has two values: red and blue.

In a multidict, when you do request.GET['pref'] you’ll get back only 'blue' (the last value of pref). Sometimes returning a string, and sometimes returning a list, is the cause of frequent exceptions. If you want all the values back, use request.GET.getall('pref'). If you want to be sure there is one and only one value, use request.GET.getone('pref'), which will raise an exception if there is zero or more than one value for pref.

and I like it. It does the right thing if you handle it as a normal hash, but provides a method like getall to explicitly demand a list instead of a string.

Hash::MultiValue

So, I was thinking of stealing this idea for our Plack::Request which currently inherits this sucky parameters() from HTTP::Engine and then Catalyst::Request, which most of the Plack gang agree is a bad idea.

Last night I was sketching an initial port of WebOb's MultiDict to Perl: Hash::MultiValue. It uses tie to behave like a normal hash with a single entry, but with an API to get multiple values if you want:

You can use the object just like a normal hash reference, and the value always returns the last element (if there are multiple). And you can also use the OO API call on the object to get multiple values, just like WebOb's MultiDict:
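A short sketch of both styles, using Hash::MultiValue's new, get, get_all and get_one; the keys and values are made up.

```perl
use strict;
use warnings;
use Hash::MultiValue;

my $hash = Hash::MultiValue->new(
    foo => 'a',
    foo => 'b',
    bar => 'baz',
);

# Plain hash reference access: always a single value, the last one.
my $foo = $hash->{foo};
my $bar = $hash->get('bar');        # method form of the same lookup

# Explicitly asking for every value of a key.
my @foos = $hash->get_all('foo');

# get_one dies unless the key has exactly one value.
my $one = eval { $hash->get_one('foo') };
my $error = $@;

print "$foo $bar @foos\n";
print "get_one failed: $error" if $error;
```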

You should always use this get_all if you want multiple values. Being explicit is a good thing, right? There is also no list context gotcha like you see with CGI.pm style param().

Performance concern

There is a benchmark script attached because it used to do some tie/overload stuff which should definitely affect the performance.

UPDATE: this module does not use tie nor overload anymore, but uses an inside-out object approach, thanks to Michael Peters and Aristotle for the suggestion! The post content is updated appropriately.

In my quick test of the inside-out object based approach, in a typical web request where there are only a few (~10) keys, the performance is about 21,000 QPS (Hash::MultiValue) vs 32,000 QPS (normal hash). So it runs at roughly two-thirds the speed of a normal hash.

Whether this would become a critical overhead depends on how fast your web application is: the Plack standalone server runs at about 1500 QPS and most frameworks add overhead that brings it down to 500 QPS or less, so I think the overhead would eventually be < 1% of your web application, so maybe it doesn't really matter.
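The back-of-the-envelope arithmetic can be sketched like this, assuming one such hash operation per request and taking the QPS figures quoted above at face value:

```perl
use strict;
use warnings;

my $multivalue_qps = 21_000;   # Hash::MultiValue, from the quick test
my $plain_qps      = 32_000;   # normal hash, from the quick test
my $app_qps        = 500;      # a typical framework-based application

# Extra time spent per operation, in seconds.
my $extra_per_op = 1 / $multivalue_qps - 1 / $plain_qps;

# Time one whole request takes at 500 QPS, in seconds.
my $request_time = 1 / $app_qps;

# Share of the request spent on the extra cost, in percent.
my $share = 100 * $extra_per_op / $request_time;

printf "extra cost per request: %.1f microseconds\n", $extra_per_op * 1e6;
printf "share of a 500 QPS request: %.2f%%\n", $share;
```

That comes out at about 16 microseconds per request, or roughly 0.8% of a 2 millisecond request.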

I'll probably spend some time soon on Plack-Request repository by creating a branch for this type of thing. Any input would be highly welcome ;)

2009.12.11

Some people still don't seem to "get" Plack/PSGI, so here's the overview.

The important bit is that Plack is an implementation but is also a namespace for utilities, and things like Plack::Middleware and Plack::Request should be thought of more like a library. Plack::Server, ::Middleware and ::Request can all be used independently. And no, Plack is not a framework, and as you see, the closest thing to a framework is Plack::Request, which can be used as a request/response library to build a new framework.

Also, the picture might scare you like "holy cow that's a lot of layers!" but actually, no: the PSGI interface is a Perl code reference that's executed inline, and a framework adapter is just a few lines of changes from their native CGI/mod_perl adapters. So it's usually one or two extra method calls on the stack, and that could never be a significant overhead.
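To show how thin these layers are, here is a minimal sketch with no Plack modules involved: an application is a code reference, and a middleware is a code reference wrapping another one. The X-Wrapped header and the hand-built environment are made up for illustration.

```perl
use strict;
use warnings;

# A PSGI application: a code reference that takes the environment hash
# and returns [ status, headers, body ].
my $app = sub {
    my $env = shift;
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ "Hello $env->{PATH_INFO}" ] ];
};

# A middleware wraps one code reference in another.
my $wrap = sub {
    my $inner = shift;
    return sub {
        my $env = shift;
        my $res = $inner->($env);                # one extra call on the stack
        push @{ $res->[1] }, 'X-Wrapped' => 1;   # made-up header
        return $res;
    };
};

# Call it inline, the way a server would, with a hand-built environment.
my $wrapped = $wrap->($app);
my $res = $wrapped->({ REQUEST_METHOD => 'GET', PATH_INFO => '/hello' });

print "$res->[0] @{ $res->[2] }\n";
```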

2009.12.08

During it, he commented "while I was on the plane from San Francisco to Japan I hacked on ...", and then "while I was on the plane here I hacked on ...", and I wondered what is the carbon footprint of miyagawa's modules?

Nicholas asked me this at the pub after LPW 2009, and it indeed was an interesting question!

I guess the fact that lots of my code is written on the plane reflects that much of my software is written (and improved) using the CDD methodology - Conference Driven Development in the first place.

We submit a talk proposal to a conference, assuming we can get the software actually implemented and usable between now and the time the conference happens, and it definitely forces us to actually write the software. Speaking about vaporware at a conference is embarrassing!

Also, by speaking about the same software multiple times in multiple conferences, we're forced to update the talk so as not to get bored by ourselves, and that helps us to improve the software, again.

2009.12.07

2009.12.06

LPW 2009 was fantastic. Meeting great people, mostly from #london.pm, was so refreshing and there were lots and lots of quality talks, lovely lightning talks (especially by mst and pdcawley) and a too-much-beer-involved social at a pub after the event, which was awesome.

I did a replay of my now favorite PSGI/Plack intro talk with some updates that I made recently. The talk went really well, and got great feedback again.

And the biggest surprise was that Plack got "the module of 2009" prize at the wrap-up. Last year it went to Tim Bunce's awesome NYTProf, and the competitors this year were rafl's MooseX::Declare and ashb's TryCatch. Awesome company!

Thanks to everyone who has contributed to PSGI and Plack in any way. It's a great honor to receive the prize, and the actual gift was a "Map of London" book, so another reason to come back soon :)