Making Websites Behave using Perl - The yjobs-proxy Story

Here is another cool use for Perl: pre-processing the HTML/JS/CSS markup of
poorly-written sites before it reaches the web browser. In this post, I'll
tell the story of how I ended up writing the yjobs-proxy markup-transforming
proxy using CPAN's HTTP-Proxy to make www.yjobs.co.il work with Firefox on my
Linux system.

It all started when I was job-hunting, and was dismayed to discover
that there were far fewer Info-Tech job ads in the newspaper's "Wanted Ads"
section than there used to be. The section proudly announced that it now had
an Internet counterpart - www.yjobs.co.il. But much to my
disappointment, it didn't work in my Linux-based, open-source browsers.

I almost immediately thought of writing a Greasemonkey script to whip the
JavaScript code there into a shape that would work with Firefox. Eventually,
I started writing it, and looked for a way to inject new declarations of
JavaScript functions into the page, to replace the existing, broken ones. I
found a way to do that, but it turned out to have some limitations due
to the architecture of Greasemonkey and the way it interacts with the page.

After thinking about it for a moment, I realised I could achieve the same
thing by transforming the code that Firefox receives from the site into
a more agreeable version. So I thought of a transforming proxy. Someone here
on use.perl.org mentioned HTTP::Proxy in one of his posts, so I went to check
it out and see whether it could solve my problem.

Meanwhile, I was distracted and delayed a bit by investigating this
X Server bug. But then I resumed work on the proxy. HTTP-Proxy turned
out to be a great way to implement what I had in mind, but I still ran into
a few problems (which weren't HTTP-Proxy's fault).

The first one, as far as I recall, was that it refused to filter JavaScript
code. As it turned out, yjobs sent the "Content-Type:" of the JavaScript code
either as "application/x-javascript" or not at all, while my filter only
matched "text/javascript". I ended up filtering the scripts by the .js
extension in the path, and by specifying a mime filter of "undef".

Then I ran into a problem where a variable called "Data" was assigned
to, but apparently not used anywhere else. As it turned out, my logging proxy,
which I used to dump all the traffic, did not log the particular script that
made use of it. Maybe Firefox cached it. After that, I found out where it was
used, and used the Venkman JavaScript debugger to debug the problem I had
getting it displayed on the page. It was fixed using a JavaScript
transformation specific to that particular script.

Another problem I encountered was that an original function was called despite
the fact that I had overridden it at the bottom. As it turned out, this
happened because the function was invoked before the JS interpreter reached
the overriding definition at the end.

This was resolved by transforming the JS code in the original function.
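
The ordering problem can be reproduced in a few lines of JavaScript. This is
only a minimal sketch: the name "showAd" and the "calls" array are
hypothetical and are not taken from the yjobs code, and it assumes the
override is installed by an assignment (or a later script block) placed after
the call.

```javascript
// Hypothetical names; nothing here comes from the actual yjobs scripts.
const calls = [];

// The site's original (broken) function.
var showAd = function () {
  calls.push("original");
};

// The page invokes the function while it is still being evaluated...
showAd();

// ...and only afterwards does the override appended at the end take effect.
showAd = function () {
  calls.push("override");
};

// Later calls reach the override, but the first call already ran the original.
showAd();

console.log(calls); // [ 'original', 'override' ]
```

Note that two plain function declarations with the same name inside one
classic script would behave differently, since declarations are hoisted and
the later one wins. An override appended at the end is therefore only too late
when it is an assignment or lives in a separate script block, which is why
rewriting the body of the original function sidesteps the issue.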

Eventually, I got it working well enough. Then I cleaned up the proxy code, and
released it for the world's consumption.

My future plan for this proxy is to investigate a way to implement it as
a Firefox extension that will transform the markup from within Firefox.

A fellow Perl programmer I talked with on AIM, whom I pointed to the download
page, said "that's nucking futs, man" and then "oh, it's cool. I just
mean, that's pretty crazy. A proxy to make a site work... crazy. and
awesome." :-)
