Tuesday, 13 May 2014

We still have to use the software. It's now installed on every computer in the company, and it still fails intermittently for weeks at a time. The script has changed a bit, so it's time for a repost.

Because the install base is bigger, it's easier to run it over the whole affected IP range than to work from a computer list. This also prevents the occasional hiccup where a computer got missed because the DNS cache was out of date.

The script is also a lot more verbose. It's now quick to tell at a glance where it is up to, which is handy on the odd occasion it gets stuck.

After each loop it gives you some pretty yet fairly useless stats, because I like pretty things.

What would be required to make this a real fix:

Get rid of the main loop, and get it to run once through only. Trigger that every 10 minutes or so from Task Scheduler.

Set it to watch the affected program directory for new folders. If a new folder gets pulled down, it'll assume it's a fixed one.

Have it identify the directory it's copying dynamically, so there's now zero reason to actually edit the script. Just copy the working files to the right place, wait ten minutes, and it'll start rolling out.

Thursday, 30 January 2014

Vidd.me launched recently, as a simple and effective way to share videos online. You upload your video file or GIF, it converts it into MP4, and you get a link to the nice HTML5 page it's displayed on. It seems to work rather nicely, and the team are adding new features all the time without adding new limits.

Since it loads the raw mp4 file into the browser, getting the file is easy. Open up the source to any page and you have something like this:

And with our handy friend wget, you can download the full video from the commandline without any special tools at all.
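As a minimal sketch of that wget step, assuming the direct file URL follows the cloudfront.net/videos/ pattern described further down. The function name fetch_video and the number 1234 are made up for illustration:

```bash
#!/bin/bash
# Hypothetical helper: fetch one full-length video by its number.
# Assumes the <number>.mp4 naming under /videos/ holds for the video you want.
fetch_video() {
    wget "https://d1wst0behutosd.cloudfront.net/videos/${1}.mp4"
}

# Example call (1234 is a placeholder number, not a real reference):
#   fetch_video 1234
```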

So far it's business as usual. You could have used anything to download it (including your browser). Things get a little more interesting when you look at the source of their 'new' or 'top videos' page:

There are three things to note here:

Each preview on the page has '-clip' added to the filename, to distinguish it from the main mp4 file

Every file seems to be stored on the same server using the same directory structure, d1wst0behutosd.cloudfront.net/videos/

The files appear to be named sequentially, with gaps for deleted or private files.

This presents a very simple way to find content. At the time I'm writing this they've got around 4200 videos, so let's pull down all the previews on their site in just a few lines:

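#!/bin/bash
# Fetch the preview clip for every video number from 1 to 4200.
# Numbers belonging to deleted or private videos just fail and get skipped.
for i in {1..4200}
do
    wget "https://d1wst0behutosd.cloudfront.net/videos/${i}-clip.mp4"
done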

A little while later.

If you feel like downloading everything on the site - full length - you just have to modify the script above ever-so-slightly, like so:

#!/bin/bash
# Same loop as before, minus the '-clip' suffix, so it fetches the full-length files.
for i in {1..4200}
do
    wget "https://d1wst0behutosd.cloudfront.net/videos/${i}.mp4"
done

To take it further, it'd be pretty simple to set up a script that updates your mirror of the site: a cron job that grabs the 'latest videos' page, parses it for the highest video number, and then goes from the last downloaded video until that point (or just counts up from the last known video until it hits too many failures in a row). As a side-effect of the one-way sync, you'd have a copy of any videos that were subsequently removed.
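The simpler "count up until too many failures in a row" variant could look roughly like the sketch below. Everything in it is an assumption for illustration: the state file name, the miss limit, the delay between requests, and the function name update_mirror are all hypothetical, and it only defines the function rather than running anything.

```bash
#!/bin/bash
# Sketch of a one-way incremental mirror. All names and defaults are hypothetical.
BASE_URL="https://d1wst0behutosd.cloudfront.net/videos"
STATE_FILE="${STATE_FILE:-last_video.txt}"  # remembers the last number fetched
MAX_MISSES="${MAX_MISSES:-20}"              # failures in a row before giving up
DELAY="${DELAY:-2}"                         # seconds to wait between requests

update_mirror() {
    local last=0 misses=0 i
    # Resume after the last successful download, if one was recorded.
    if [ -f "$STATE_FILE" ]; then
        last=$(cat "$STATE_FILE")
    fi
    i=$((last + 1))
    while [ "$misses" -lt "$MAX_MISSES" ]; do
        if wget -q "${BASE_URL}/${i}.mp4"; then
            misses=0
            echo "$i" > "$STATE_FILE"
        else
            # Deleted, private, or simply not uploaded yet.
            misses=$((misses + 1))
        fi
        i=$((i + 1))
        sleep "$DELAY"
    done
}
```

Because only successes are written to the state file, the next run retries the numbers that failed at the tail end of this one, which picks up anything uploaded in between. A cron entry calling update_mirror would complete the picture.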

I have to point out that mirroring the site like this could violate the Terms of Service, which has a few conditions against scraping and DDoSing that you might fall foul of. They don't seem to yet, but it's also likely they'll throttle heavy users.

Instead of that, I'm more interested in picking and choosing based on what looks interesting from those previews I grabbed. Let's create a quick script that lets me specify an arbitrary number of videos (based on their number) and download them:

It takes the video numbers as command-line arguments, so now I can do this:
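A script along these lines might look like the following. This is a sketch rather than the exact script: the function name download_videos and the input check are assumptions, and the URL pattern is the one noted earlier.

```bash
#!/bin/bash
# Hypothetical helper: download full-length videos by number.
BASE_URL="https://d1wst0behutosd.cloudfront.net/videos"

download_videos() {
    local n
    for n in "$@"; do
        # Skip any argument that is empty or contains a non-digit.
        case "$n" in
            ''|*[!0-9]*)
                echo "skipping '$n': not a video number" >&2
                continue
                ;;
        esac
        wget "${BASE_URL}/${n}.mp4" || echo "could not fetch video $n" >&2
    done
}

# Only do anything when video numbers were passed on the command line.
if [ "$#" -gt 0 ]; then
    download_videos "$@"
fi
```

Saved under a name of your choosing (say getvid.sh, which is just a placeholder), it would be called as ./getvid.sh 12 857 3021, where the numbers are made-up examples.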

No need for browser extensions or custom software: you can now grab any video you like with what you already have installed.

As long as the privacy controls in place are solid and they put some throttling in place to stop people ripping the whole site every hour, there's nothing wrong with how they've set things up. The site is designed to be as open and accessible as possible, and right now they're doing that from the ground up.