Month: January 2009

How to download Youtube videos has been done by a number of bloggers in a number of languages. I just wanted to try it out myself in Erlang, but I had no idea where to start.

Now that I’ve decided to start, let me grab an Eddie Vedder song from Youtube:

http://www.youtube.com/watch?v=gct6BB6ijcw

And lets sniff the HTTP traffic using Wireshark to see what actually happens when we watch a Youtube video. The browser made a conection to Youtube and before it started streaming the video, the input URL got redirected to a different Youtube URL (Depending on whether Youtube is caching the video or not, it could in turn probably be redirected to a different Cache server or a different IP):

The param “video_id” is the same as the param “v” in the original URL. So we only need to find the value for the param “t”.
Lets look at the HTML source for

http://www.youtube.com/watch?v=gct6BB6ijcw

and see if it contains the value for the parameter “t”. Luckily, grepping for “&t=”, I found this in the source:

&t=OEgsToPDskKrsk1Xwku653CJbAXXrdJb

The value of “t” in the HTML source and the value of “t” in the redirect URL is different, but I found that both values of “t” seemed to work when appended to the URL. Oh well, I tried it again just to make sure and realized that the value of the parameter “t” changes for every request, but all values seem to work(Probably it is timestamp dependent!).

So we have a plan now:
1. Get a Youtube video URL and make a HTTP request to it.
2. Get the body of the reponse and find the value of “t” using regex pattern matching.
3. Generate the proper redirect URL using the two parameters “video_id” and “t”.
4. Make a http request to the new URL and stream the bytes and write to a file with “.flv” extension.

Go to your current directory and use any Flv Viewer to see if the downloaded file is a working video or convert it to format of your choice and watch it offline.

I would love to give an Erlang twist to it by spawning a few concurrent processes to download videos, but this is not quite a good example to do it from by localbox – too much of Disk IO, Network IO and slow Internet connection.