Imgur API part 2: Downloading a Gallery

This is the second installment of my Imgur API series: ‘How to download entire Imgur Galleries’. Check out part 1 here in case you missed how to log into the API and upload an image!

I’m actually sort of cheating here, because we don’t need to use the API at all for this, if we don’t want to. That’s because we’re only going to deal with the galleries built from images submitted to reddit. This means that after you’re done with this tutorial, you’ll be able to point the script at a given subreddit’s name and grab all the images that have been submitted to /r/aww or /r/wallpapers.

I will be writing another tutorial soon for galleries and albums unrelated to reddit.com as well as grabbing gallery information, such as the title and other descriptive things like that, but that’s less related to the actual downloading of the gallery, which is what we’re interested in today!

Please note that I’m working with Python 2.7 on Windows 7 64-bit, so you might have to modify the code slightly to accommodate for your platform, or OS.

Anyways, hit the jump to get started!

Edit: May 29th 2012: reddit user easttntoppedtree caught that it maxes out at 56 images, so you’ll have to add /page/PAGENUMBERHERE.json to the end of the URL to get the next 56 images, like so: http://imgur.com/r/scarlettjohansson/top/page/1.json, while keeping in mind that 0 (zero) is a valid page number.

First things first: the imports. We are going to import the usual networking suspects, requests and json, as well as the ever-useful pprint module. New to us this time is the datetime module, which is a handy way to handle time-related things in Python; we’ll just be using it to grab today’s date and time. Finally, there’s the os module, which Python uses to interact with the operating system. You can do helpful things like detect whether a folder exists and create one, or check which path Python is currently running in. It’s a very powerful module; again, we’ll only be briefly using it, to create a folder to hold our images.
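Based on that description, the top of the file would look something like this. The SUBREDDIT and DL_LIMIT values are placeholders of my choosing; set them to whatever you like:

```python
#networking and JSON handling
import requests  # third-party: pip install requests
import json
#pretty-printer, for inspecting responses while debugging
from pprint import pprint
#date and time handling, used to timestamp the download folder
import datetime
#OS interaction, used to create the download folder
import os

#the subreddit gallery to grab, and a cap on how many images to download
SUBREDDIT = 'aww'
DL_LIMIT = 25
```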

Here, we use the requests and json modules to make a GET request to the URL, customized with the SUBREDDIT value we set above. Once we get the response from the site, we transform the raw JSON into a Python dictionary, which is something we can manipulate a lot more effectively within Python. Make sure that you are using the text attribute of the response, instead of the content or raw response data.


##Download and load the JSON information for the Gallery
#get json object from imgur gallery. can be appended with /month or /week for
# more recent entries
r = requests.get(r'http://imgur.com/r/{sr}/top.json'.format(sr=SUBREDDIT))
#creates a python dict from the JSON object
j = json.loads(r.text)

This is just for your own use, to see exactly what the response was. You can use it to determine whether imgur is over capacity or whether the URL was set incorrectly. For now, I’ve commented it out, since everything should be working fine.

#prints the dict, if necessary. Used for debug mainly
#pprint(j)

Now, we extract the list of images from the JSON dict we just created. You can check out the layout of the dictionary by uncommenting the `pprint` line above.

#get the list of images from j['gallery']
image_list = j['gallery']

Some more flavour text, so we can confirm the number of images in the gallery. It counts the objects in the list using the len builtin function.

#print the number of images found
print len(image_list), 'images found in the gallery'

More debugging options, here for you to examine the content of the first image in the list we just created, found at index 0 because, as you know, lists begin at index 0 instead of 1.

#debugging, examine the first image in the gallery, confirm no errors
pprint(image_list[0])

Now, we want to create a folder in which we can fit all the images we are going to be downloading in a minute. I like putting them in timestamped folders, but you can easily change it to be called the name of the subreddit, or anything else.

Here, we use the `datetime` module to fetch the current time, in a format specific to `datetime`.

#get the time object for today
folder = datetime.datetime.today()

That means we need to turn it into a printable string we can use to name our folder, so we run the str builtin function on it, which does exactly what we want.

#turn it into a printable string
string_folder = str(folder)

Then, since some characters cannot be used in a folder name, we need to remove them. We use the string’s replace method to swap the colon character for a folder-friendly one, the period.
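In code, that’s a one-liner (shown here as a self-contained sketch, repeating the timestamp steps from above):

```python
import datetime

#timestamp string such as '2012-05-29 13:45:10.123456'
string_folder = str(datetime.datetime.today())
#colons aren't legal in Windows folder names; swap them for periods
legal_folder = string_folder.replace(':', '.')
```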

Now, we use the mkdir function from the `os` module to create a folder using the legal string we just created. Remember that unless you specify otherwise, the folder will be created in the same location the script is running.

#create the folder using the name legal_folder
os.mkdir(legal_folder)

Next, we need to extract each image’s name and file type. So we create an empty list, into which we’ll put 2-item tuples containing the name and extension of each file, which we’ll use for downloading and saving the images.

#list of pairs containing the image name and file extension
image_pairs = []

At each index of the list of images, we’ll find a dict filled with miscellaneous information about the image, such as its size and how many times it was downloaded. All we’re interested in, though, are the hash and ext keys and their values. So for every image dictionary in the list, we take the values associated with hash and ext and append them as a pair to the new list for later.
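The loop that fills image_pairs looks something like this, shown with a tiny hypothetical image_list so the snippet stands on its own (the real dicts carry many more keys):

```python
# hypothetical sample of the dicts found in image_list
image_list = [
    {'hash': 'abc123', 'ext': '.jpg'},
    {'hash': 'def456', 'ext': '.png'},
]

#list of pairs containing the image name and file extension
image_pairs = []
for image in image_list:
    #keep only the hash and ext values, as a (name, ext) tuple
    image_pairs.append((image['hash'], image['ext']))
```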

Next, we need to download the images from the website. We do that by substituting each image’s name and ext into the URL template below. But we don’t want to surpass our pre-set download limit; there may be bandwidth caps, and you don’t want to hammer Imgur’s servers, so we need to keep track of the number of images we grab.

So first, we set a temporary variable to keep track of the number of images we’ve grabbed.

#current image number, for looping limits
current = 0

Then we start a loop that stops when current is equal to or greater than the DL_LIMIT we set at the beginning of the file.

#run download loop, until DL_LIMIT is reached
for name, ext in image_pairs:
    #so long as we haven't hit the download limit:
    if current < DL_LIMIT:

Then, we fill the URL template with the name and extension of the image on the site.

        #this is the image URL location
        url = r'http://imgur.com/{name}{ext}'.format(name=name, ext=ext)
        #print the image we are currently downloading
        print 'Current image being downloaded:', url

Next, we have to download the actual image, instead of the JSON that is referencing it. We do that by once again using the requests module to create a GET request to the URL we’ve filled in and then saving the response.

Then we create a file object at the path location, set to ‘write binary’ (‘wb’) mode instead of the default ‘read’. We need to make sure we are writing, for one thing, but also writing in binary mode, because we are writing binary data to the file rather than strings (think 0s and 1s instead of ‘abc’). This is the same reason we use the response.content attribute instead of response.text.
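Sketching those two steps together; the name, extension, and bytes here are placeholders standing in for one (name, ext) pair and for response.content, so the snippet runs on its own without hitting the network:

```python
import os
import tempfile

# hypothetical stand-ins: one (name, ext) pair, and the folder created earlier
name, ext = 'abc123', '.jpg'
legal_folder = tempfile.mkdtemp()

# in the real script this is image.content from requests.get(url)
image_bytes = b'fake image data'

# open the file in 'write binary' mode and write the raw bytes
path = os.path.join(legal_folder, name + ext)
with open(path, 'wb') as f:
    f.write(image_bytes)
# (in the full script, you'd also do current += 1 here,
#  so the loop respects DL_LIMIT)
```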

I get an error at:
image_list = j[‘gallery’]
‘gallery’ apparently doesn’t form part of the j data structure, which is made up of: status, data, success.
Is there an alternative way to get the number of images in the gallery?

It looks like Imgur might’ve changed their JSON layout with their API3 changes, so I’m not exactly sure right off the bat. It looks like you’d have to register for an API key and make a call to the ‘http://api.imgur.com/models/album’ URL and parse the `images_count` response, but that’s specifically for Albums, not Subreddit Galleries. Actually no: here, http://api.imgur.com/endpoints/gallery, it looks like if you made the API request to the subreddit gallery, it’d act the same way as it would for Albums.

So make a GET request to https://api.imgur.com/3/gallery/r/scarlettjohannson and then look for the ‘image_count’ attribute. I’m not sure if that’s the total images or just the images on the page though. Please let me know if you work it out! Sorry I couldn’t help more.
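If it helps, the request would be shaped roughly like this. The Client-ID header is an assumption on my part (API3 wants one, which you get by registering an application with Imgur), and I haven’t verified the exact response layout, so the parsing line is commented out as a guess:

```python
# hypothetical client ID, obtained by registering an application with Imgur
CLIENT_ID = 'YOUR_CLIENT_ID_HERE'

url = 'https://api.imgur.com/3/gallery/r/scarlettjohannson'
# API3 expects the client ID in an Authorization header
headers = {'Authorization': 'Client-ID {cid}'.format(cid=CLIENT_ID)}

# then, something like:
# r = requests.get(url, headers=headers)
# count = json.loads(r.text)['data']['images_count']
```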

Hello! I am trying to grab images from a subreddit using your script. I am not experienced in Python; when I run the script it opens a command window, but nothing happens, and after a second or two it closes. I installed pip and requests. Any advice? Thanks