SkyDrive: Why art thou not nicer to me? or “How to crawl pages and download all images using C#”

Ok, SkyDrive is still a baby, and personally I’ve used other services providing file space in the cloud and liked them a little better. But this post isn’t about whether SkyDrive is good or bad. It’s about one missing feature that is very painful. Someone wanted to share some photos, uploaded them to SkyDrive, and all I wanted was to download them all to my PC. Tough luck: you have to click on each and every image to get to its preview page, then click on the preview picture to finally get at the actual picture. Multiply that by about 100. I have better things to do than waste my time on that.

So a dev does what he does best: he fires up Visual Studio 2008 and hacks away. (Did I just say I had something better to do? Well, I partially lied. Before I go off and do that, there is always time for some good ol’ C#.)

I’ve posted it here not as a finished utility (there are no binaries) but as a small sample. Using WebClient, regular expressions and a few other bits, it downloads the list page of the SkyDrive folder, fetches each image’s preview page, and then downloads the actual image to a folder on the hard disk. Not exactly rocket science, and of course there are a few quirks (no real error handling, for example), but it’s just a sample. Feel free to extend it as you wish, but don’t blame me if it starts downloading gigabytes of files overnight because you accidentally crawled a honeypot. (And yes, it only downloads JPGs at the moment. I didn’t need any other types.)
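Each step of the crawl comes down to the same operation: pull the href values out of a page with a regular expression, decode HTML entities such as &amp;, and resolve each link against the page it came from. Here is a minimal sketch of that step that also lifts the JPG-only restriction by taking a list of extensions (with leading dots, e.g. ".jpg"). The class LinkExtractor, its href pattern and the extension list are illustrative assumptions, not the patterns the program below uses, and a regex like this only copes with fairly regular markup, not arbitrary HTML.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts absolute image URLs from a page's HTML.
public static class LinkExtractor
{
    // Assumed pattern: captures any double-quoted href value into a group named "url".
    private static readonly Regex HrefRegex =
        new Regex("href\\s*=\\s*\"(?<url>[^\"]+)\"", RegexOptions.IgnoreCase);

    public static List<Uri> ExtractImageLinks(string html, Uri baseUri, params string[] extensions)
    {
        // Case-insensitive so ".JPG" and ".jpg" both match
        var wanted = new HashSet<string>(extensions, StringComparer.OrdinalIgnoreCase);
        var result = new List<Uri>();

        foreach (Match m in HrefRegex.Matches(html))
        {
            // Attribute values are entity-encoded (&amp; etc.), so decode before parsing
            string raw = WebUtility.HtmlDecode(m.Groups["url"].Value);

            // Resolve relative links against the page they came from; skip unparsable ones
            Uri uri;
            if (!Uri.TryCreate(baseUri, raw, out uri))
                continue;

            // Look at the extension of the path only, ignoring any query string
            string ext = Path.GetExtension(uri.AbsolutePath);
            if (wanted.Contains(ext))
                result.Add(uri);
        }
        return result;
    }
}
```

For example, ExtractImageLinks(html, pageUri, ".jpg", ".png", ".gif") would pick up the other common image types as well.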

May those SkyDrive bytes be with you…

/**********************************************************************************
*
* Example Application for crawling web pages and downloading images.
*
* This code works if you pass in a SkyDrive folder URL (http://…/browse.aspx/…)
* and will download any jpg images it finds in there.
*
* Permission to use, copy, modify, distribute and sell this software and its
* documentation for any purpose is hereby granted without fee.
* I make no representations about the suitability of this software for any purpose.
* It is provided “as is” without express or implied warranty.
*
* Alex Duggleby – 24.05.08 – V0.9 – http://alexduggleby.com
*
**********************************************************************************/
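using System;
using System.ComponentModel;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;
using System.Threading;
using System.Web;

namespace SkyDriveImageCrawler
{
    class Program
    {
        // The post does not include the two regular expressions. These patterns are
        // hypothetical placeholders: _regexUrl should match links to an image's preview
        // page on the folder page, _regexUrlOpen the link with title="Open" on a preview
        // page. Both must capture the link into a group named "url".
        private static readonly Regex _regexUrl =
            new Regex("href=\"(?<url>[^\"]+\\.jpg)\"", RegexOptions.IgnoreCase);
        private static readonly Regex _regexUrlOpen =
            new Regex("href=\"(?<url>[^\"]+)\"[^>]*title=\"Open\"", RegexOptions.IgnoreCase);

        private static int _wcInnerCount;      // downloads started
        private static int _wcInnerCompleted;  // downloads finished

        // Starts at 1 (held by Main), so it cannot reach zero while links are still
        // being queued. CountdownEvent needs .NET 4.0 or later.
        private static readonly CountdownEvent _pending = new CountdownEvent(1);

        static void Main(string[] args)
        {
            if (args.Length < 2)
            {
                Console.WriteLine("Usage: SkyDriveImageCrawler <folder url> <target directory>");
                return;
            }
            Uri _uriStart = new Uri(args[0]);
            DirectoryInfo _diDownloadTo = Directory.CreateDirectory(args[1]);

            using (WebClient _wc = new WebClient())
            {
                // This is the index with all the images
                string _pageContents = _wc.DownloadString(_uriStart);

                // Each image has a preview page, so we get the url to that, before we get the url to the actual image
                foreach (Match _matchUrlToImagePage in _regexUrl.Matches(_pageContents))
                {
                    Uri _uriToImagePage =
                        new Uri(_uriStart, HttpUtility.HtmlDecode(_matchUrlToImagePage.Groups["url"].Value));

                    // Download the preview page and find the image we want to download.
                    // There should be only one link with title="Open" in it.
                    string _imagePageContents = _wc.DownloadString(_uriToImagePage);
                    foreach (Match _matchImage in _regexUrlOpen.Matches(_imagePageContents))
                    {
                        Uri _uriToImage =
                            new Uri(_uriToImagePage, HttpUtility.HtmlDecode(_matchImage.Groups["url"].Value));

                        // Name the local file after the last segment of the image URL
                        string _localFilename = Path.GetFileName(_uriToImage.LocalPath);

                        // Create a separate web client for each image (downloads are async, and
                        // one client can't run two downloads at the same time). Of course we
                        // should be using some kind of pooling here, but this is the quickest way.
                        // No "using" block: the download is still running when the loop moves on,
                        // so the client is disposed in the completion handler instead.
                        WebClient _wcInner = new WebClient();
                        Interlocked.Increment(ref _wcInnerCount);
                        _pending.AddCount();

                        // Subscribe before starting, so a fast download can't complete unnoticed
                        _wcInner.DownloadFileCompleted += _wcInner_DownloadFileCompleted;
                        _wcInner.DownloadFileAsync(_uriToImage, Path.Combine(_diDownloadTo.FullName, _localFilename));
                    }
                }
            }

            // Everything is queued: release Main's own count, then wait for the downloads
            _pending.Signal();
            _pending.Wait();
            Console.WriteLine("{0}{1}{2}", Environment.NewLine, "Finished all files!", Environment.NewLine);
            Console.ReadLine();
        }

        // Fired when a download completes. We output status and tell Main one more file is done.
        private static void _wcInner_DownloadFileCompleted(object sender, AsyncCompletedEventArgs e)
        {
            // Increase the completed counter. In a console app these handlers run on
            // thread-pool threads, hence Interlocked instead of ++.
            int _completed = Interlocked.Increment(ref _wcInnerCompleted);

            if (e.Error != null)
                Console.WriteLine("File {0} failed: {1}", _completed, e.Error.Message);
            else
                Console.WriteLine("File {0} of {1} completed!", _completed, _wcInnerCount);

            ((WebClient)sender).Dispose();
            _pending.Signal();
        }
    }
}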
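The comment in the inner loop concedes that one WebClient per image is a shortcut and that some kind of pooling would be better, and counting completions by hand is easy to get wrong. Below is a sketch of both ideas using newer APIs than this 2008 code (HttpClient with async/await, .NET 4.5 or later): a SemaphoreSlim caps how many downloads run at once, and Task.WhenAll replaces the hand-rolled counters as the "are we finished?" check. BoundedDownloader, RunAllAsync, DownloadToFolderAsync and maxConcurrency are hypothetical names. The per-item work is passed in as a delegate so the throttling can be exercised without a network.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedDownloader
{
    // Runs work(item) for every item with at most maxConcurrency running at once.
    // The returned task completes when all items are done, or faults if any failed.
    public static async Task RunAllAsync<T>(IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency, maxConcurrency))
        {
            // ToList() starts one task per item; each waits for a free slot first
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync().ConfigureAwait(false);
                try
                {
                    await work(item).ConfigureAwait(false);
                }
                finally
                {
                    gate.Release(); // free the slot even if this item failed
                }
            }).ToList();

            // WhenAll waits for every task, so the semaphore is idle before disposal
            await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }

    // Hypothetical work item: download one image with a shared HttpClient,
    // naming the file after the last segment of the URL.
    public static async Task DownloadToFolderAsync(HttpClient client, Uri imageUri, string folder)
    {
        string target = Path.Combine(folder, Path.GetFileName(imageUri.LocalPath));
        using (var response = await client.GetAsync(imageUri).ConfigureAwait(false))
        {
            response.EnsureSuccessStatusCode();
            using (var file = File.Create(target))
            {
                await response.Content.CopyToAsync(file).ConfigureAwait(false);
            }
        }
    }
}
```

The download phase in Main would then shrink to something like RunAllAsync(imageUris, 4, uri => BoundedDownloader.DownloadToFolderAsync(client, uri, folder)).Wait(), with one HttpClient shared by all downloads. Here imageUris, client and folder are placeholders, and 4 is an arbitrary limit.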
