I have been working on ways to automate some of my social media and blogging activity over the last little while and one of the super useful tools I’ve been using is IFTTT (If This Then That).

However, browsing the recipes on IFTTT I have realized how redonkulously easy it is to fully automate the theft of photography from social media.

It’s so easy it’s crazy – which leads me to my second thought:

How can it possibly be this easy to automate the theft of copyrighted creative content from social media without the permission or knowledge of the content’s owners!?

Or, in other words – why don’t social media networks give a flying fudge-nugget about copyright???

How To Scrape Content From Social Media

Whenever someone posts how to do something bad in an article about how bad it is to do it, the Internet immediately jumps all over the author for posting details on how it’s done…which is why we have to post something like this:

If you really want to steal content from social media, you’ll figure out how to do it – it’s not exactly rocket science, so what I’m posting here isn’t any big secret that the average teenage wannabe Instastar couldn’t figure out for themselves. In other words, to explain the extent of the problem, I have to explain a little about how it’s done.

Right – so if you wanted to steal photos from social media – here are a few IFTTT recipes you might be interested in…

Instagram

This little beauty allows you to automatically download, and save every photo you like on Instagram – from anyone’s account.

The account owners are not notified or asked permission, but their photos are downloaded by you, an unrelated third party, as tidy little 640×640 jpgs. All metadata, including Creator and Copyright is completely stripped out.

I love 500px. The photography community loves 500px. Somehow I thought that 500px would have more control over their users’ content – especially since they launched the 500px Marketplace which actually charges money for downloading user content.

(The 500px Marketplace charges $35 for a 1500×1000 px web resolution download and up to $300 for a full resolution copy – very reasonable rates that put a legitimate value on their users’ content and boost the value of all photography – kudos to 500px for that.)

But no – this recipe downloads 500px photos directly to Dropbox without the account owner’s knowledge or consent.

What’s even worse (ya it gets worse), is that you can specify a category and a popularity level and then download every single new photo that hits that criteria – no manual likes required at all!

The 500px photos are resized to 900 x 600 px and downloaded as jpegs (small, but large enough to use for blogs, and re-post). 500px adds a 500px watermark on each image, along with a watermark of the 500px photo link (easily cropped out with a batch Photoshop action).

500px does retain the original metadata including Creator, Copyright and Capture Date and the default title nomenclature includes the original image name and the user’s account name (although again this can be changed in the recipe).

I’ve had this bad boy running for a few hours and I’ve scraped about fifteen brand new Popular Landscapes without lifting a finger.

And while we’re at it, why not just cut out the middleman altogether?

This recipe will scan Instagram for a certain tag (or tags) that you specify and then automatically post the matching photos right to your Facebook page!

I’ve tried this one out, and it doesn’t post a link or a URL (which would be totally legit) – it downloads the actual 640 x 640 Instagram jpg (with the metadata completely stripped out of course) and uploads that image to the Facebook page and saves the actual image file to your Facebook photo album.

Now, obviously if you actually owned the images with the tags you’re filtering – this is really useful.

But, if you want to build a massive Facebook album featuring #dailycatphotos from an automated collection of photos you don’t own, without the actual owner’s knowledge or permission, this seems like a pretty fast way to do it.

So, speaking of Facebook – this is where things get a little more interesting.

As far as I can tell, you can only grab photos that you are personally tagged in, or that you have personally posted from your own Facebook account.

It is important to note that this is different from all of the other social networks we have looked at so far. All of the others allow you to download content from other users’ accounts – content that you clearly don’t own and are not associated with in any way. At least with Facebook – you can only download content that you have posted to your own personal account, or that you are personally tagged in.

This Is Just The Tip of the Theft Iceberg!

By now it should be pretty obvious just how easy it is to automate the theft of copyrighted creative content from social media networks.

These are just a few examples of what’s possible, and I am only looking at a legitimate tool – IFTTT. I am sure that there are millions of illegitimate ways to do everything the above IFTTT recipes do faster, with a broader scale and with far less user input.

I have also only concentrated on recipes that allow you to download images and other creative content. It is just as easy to automate the opposite – posting content gathered from other users directly to your own various social media networks.

Content Theft Is Baked Right Into The API

If you’ve been following along, you’re probably wondering what this has to do with the social media networks themselves.

It would be easy to blame IFTTT for publishing these tools in the first place.

But that would just be shooting the messenger. And, if you actually owned the content you were downloading, you can probably imagine how useful some of these tools might be.

You could also blame unscrupulous users for taking advantage of tools like this to create their own social media empires. But since social media is still the wild west and the threat of repercussion is minimal (especially for accounts held in parts of the world where copyright law is just a nice idea) – it’s pretty hard to expect that nobody is going to apply these tools for their own personal gain.

I put the blame squarely on the social media networks themselves – because it seems that the tools that IFTTT uses to identify and download your content are baked right into the various social media networks’ APIs (I would love to be proven wrong on this – if you’re more of a tech expert than me, please let me know if the social media networks are innocent).

To understand, you need to know a little about how IFTTT works.

To build an IFTTT recipe, you have to set a trigger (if this happens) and then an action (then do that).

Let’s use our whipping boy 500px and Dropbox as an example.

500px and Dropbox are both IFTTT “Channels” – meaning that IFTTT is supported by their individual APIs.

So, to build a recipe, we select 500px as our trigger Channel.

Next, we have to select the actual trigger from a list of supported triggers. Triggers on this list are directly supported by the social network’s API. So the functionality to trigger an action for every “New Popular Photo” or every “New Editor’s Choice Photo” or every “New Photo from Search” is enabled by 500px.

Then we select an Action Channel. In this case Dropbox.

Next, we select an Action. The actions on this list are directly supported by the Action Channel’s API – so in this case it is Dropbox that is allowing us to download a file from a URL.

Then we select the “Ingredients” that will be performed by the “Action”. It is very important to note here that the “Ingredients” that are available depend on the specific social network’s API – so it is 500px’s API that is allowing us access to the source URL, which enables the file to be downloaded.

And there you go – done – you can now collect every single 500px Landscape photo that makes the Popular status from this day forward.
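To make the moving parts concrete, here is a minimal Python sketch of the recipe structure described above. It is only a model of the idea, not IFTTT’s or 500px’s actual code: the `Recipe` class, the ingredient names (`category`, `status`, `source_url` and so on) and the example.com URLs are all my own assumptions, and nothing is actually downloaded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model of a recipe. None of these names come from IFTTT itself.
Ingredients = Dict[str, str]


@dataclass
class Recipe:
    trigger: Callable[[Ingredients], bool]  # "if this happens"
    action: Callable[[Ingredients], str]    # "then do that"

    def run(self, events: List[Ingredients]) -> List[str]:
        # Fire the action once for every event that matches the trigger.
        return [self.action(event) for event in events if self.trigger(event)]


def new_popular_landscape(event: Ingredients) -> bool:
    # Stand-in for a trigger such as "New Popular Photo" limited to a category.
    return event["category"] == "Landscapes" and event["status"] == "popular"


def save_file_from_url(event: Ingredients) -> str:
    # Stand-in for the "download a file from a URL" action. It only
    # describes what would be saved; nothing is fetched here.
    return f"save {event['source_url']} as {event['title']}-{event['user']}.jpg"


recipe = Recipe(trigger=new_popular_landscape, action=save_file_from_url)

# Made-up events carrying the "Ingredients" the trigger channel exposes.
events = [
    {"category": "Landscapes", "status": "popular", "title": "dawn",
     "user": "alice", "source_url": "https://example.com/1.jpg"},
    {"category": "Portraits", "status": "popular", "title": "face",
     "user": "bob", "source_url": "https://example.com/2.jpg"},
]
results = recipe.run(events)
```

The thing to notice is that the action can only use what the trigger hands over. If the source URL were not among the ingredients, the download step would have nothing to work with.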

From this example, I hope you can see how it is actually the social media networks that are enabling automated downloading and uploading – making it so easy to automatically scrape content from anyone’s network and then re-upload it to your own networks.

What I find really interesting here is the format in which the images are downloaded from 500px to Dropbox.

Again, it is the 500px API that is controlling this process, because 500px is not allowing a full size download (remember the images are resized to 900×600 px by 500px’s API – although I am suspicious that there is probably a pretty simple hack available to grab the full resolution file) and they are adding a 500px watermark.

500px could have limited the file size to 100x100px with a massive watermark – but they didn’t – they made a conscious decision to allow anyone to automate the collection of perfectly usable images.

Which of course leads to the conclusion that 500px and the other social media networks actually want nefarious users to be able to collect and share content that they don’t own.

I don’t know, maybe I’m picking on 500px a little too much here (I still love you 500px). At least 500px adds a watermark, downsizes the photos and retains the metadata – so maybe from their perspective they are still protecting your copyright even if their API does enable automated content collection.

I personally cannot think of a single legitimate reason why someone else might want to automate the collection of images they don’t own – without the knowledge or consent of the actual owners – but maybe I am missing something.

If someone from 500px, Instagram, Twitter, Facebook et al. is reading this – maybe there is a legitimate explanation as to why they would intentionally bake this functionality into their APIs (we would love to publish a response piece).

How Social Media Networks Could Protect Your Content (If They Wanted To)

It seems pretty obvious doesn’t it?

All they need to do is restrict actions to accounts administrated by the user (with an account login or API key controls).

With this simple control, you could automate – download, upload, share etc. to and from your own various social networks as much as you want – but you wouldn’t be able to re-appropriate other users’ content.
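As a rough illustration of what that control might look like on the network’s side, here is a small Python sketch. I am not claiming any network implements it this way: `Photo`, `NotOwnerError` and `source_url_for` are hypothetical names, and the check assumes the API already knows which account is logged in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    owner_id: str
    source_url: str


class NotOwnerError(PermissionError):
    """Raised when an account asks for a file it does not administer."""


def source_url_for(photo: Photo, authenticated_user_id: str) -> str:
    # Hypothetical rule: only hand out the downloadable source URL when the
    # logged-in account is the one that posted the photo.
    if photo.owner_id != authenticated_user_id:
        raise NotOwnerError("source URL is only available to the photo's owner")
    return photo.source_url


photo = Photo(owner_id="alice", source_url="https://example.com/alice/1.jpg")
own_url = source_url_for(photo, authenticated_user_id="alice")
```

Calling `source_url_for(photo, "bob")` raises `NotOwnerError`, so an automated recipe running under someone else’s login never receives the file location in the first place.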

Facebook already does this.

The Instagram API is notoriously restrictive for controlling uploads – you can only upload to Instagram manually from a mobile device – a pain in the ass for businesses who want to automate their networks, but it keeps the bots out. However, the Instagram API is one of the worst offenders for allowing automated third party downloads.

The point is – your content could be much more secure – with minimal effort – in a way that would have zero net effect on the legitimate use of these tools.

What Do You Think?

Are social media networks inherently responsible for the rampant theft and use of copyrighted creative content?

Do you think they actually encourage this behavior by intentionally allowing their APIs to be so easily misused?

Are low resolution files OK to take and re-post – or is this still theft? Would it be OK if you had to at least ask permission?

Is it up to the end users to police their own behavior?

Is resistance futile?

Why do you think social media networks would make it so easy to siphon off creative content without the knowledge or consent of their users?

I’d have to look a bit more into this, but I think the really basic explanation is something along these lines: it’d be hard and counterproductive for social networks to try to block access. Why? Because gathering photos from sources requires extremely basic API functions. Something that will give you a web address to a photo, given a set of parameters.

Problem is, this very basic function is required for all sorts of things. A smartphone app? Needs it. A service to backup your photos? Needs it. An alternative interface (web or app)? Needs it. The social network’s own apps, like statistics analyzer, automated content verifier, among several other maintenance and development tools? Needs it. A button to share it to another social network? Needs it. Almost everything you can imagine an API does, it’ll need it.

I think it’s a bit hard to picture and explain, but the bottom line is that if you don’t allow that sort of access, then you just don’t have an API useful for almost anything. It’s just better not to have it.

Everything that anyone can already publicly access needs to also be accessible via the API, and one of the functions of having an API is to facilitate that access. So if photos are there on a publicly shared space, the API needs to be able to access it. From there to compiling a list of photos given set parameters and downloading them, it’s a few lines of code.

Now, as for why I said it’s counterproductive: believe it or not, but it’s extremely easy to rip off entire categories, photos, content, albums, tags, etc etc etc – as long as those are publicly accessible. You don’t need illegal shady tools or whatnot, there are plenty of very legal plugins, software and whatever that do it. This isn’t really a case of security by obscurity: if someone really wants to rip off everything from a publicly available category or person, they can.

And as awful as this may sound, it’s also not illegal to make a tool that does it. You don’t need an API for that.

I’ll speak from experience. Over 10 years ago, when Chrome didn’t even exist and Facebook was still a website to check on chicks and guys inside Harvard, there was already at least one plugin that I know about on Firefox that could take a page with thumbnails – which linked to the full size image – and retrieve everything there.
This might sound a bit scandalous or absurd for some, but it really isn’t. It’s really basic code that everyone should be able to do after taking any basic web language course.
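
To give an idea of how little code that is, here’s a rough Python sketch using only the standard library. It isn’t the plugin I’m talking about, just my own guess at the general approach: `ThumbnailLinkParser`, `find_full_size_links` and the sample page are made up for the example, and it only reads a string of HTML rather than fetching anything.

```python
from html.parser import HTMLParser
from typing import List, Optional


class ThumbnailLinkParser(HTMLParser):
    """Collects the href of every link that wraps an <img> thumbnail."""

    def __init__(self) -> None:
        super().__init__()
        self.full_size_links: List[str] = []
        self._current_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a":
            # Remember where the enclosing link points.
            self._current_href = attributes.get("href")
        elif tag == "img" and self._current_href:
            # An image inside a link: assume the link target is the full size file.
            self.full_size_links.append(self._current_href)

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


def find_full_size_links(page_html: str) -> List[str]:
    parser = ThumbnailLinkParser()
    parser.feed(page_html)
    parser.close()
    return parser.full_size_links


# Made-up gallery page: two thumbnails and one ordinary text link.
SAMPLE_PAGE = """
<a href="/full/1.jpg"><img src="/thumb/1.jpg"></a>
<a href="/about.html">About</a>
<a href="/full/2.jpg"><img src="/thumb/2.jpg"></a>
"""
links = find_full_size_links(SAMPLE_PAGE)
```

The text link is skipped because it doesn’t wrap an image, and the two thumbnail links come back in page order.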

The very basic way web languages work is really not built with privacy in mind. The language isn’t encrypted, and you can see the code of the pages you are accessing right from the browser – with good reasons, nothing nefarious at all. If it was any more complex the Internet would be something else entirely.

But this required simplicity is exactly what makes an exploitation like that possible. The benefits far outweigh the problems though.

I’m not sure if my explanation will make sense.

About the comparisons, I think you may be looking from the wrong perspective JP. The only reason why the Facebook API doesn’t allow you to rip off an entire photo collection from strangers is because Facebook isn’t a photo based social network. Albums are way more limited, you cannot search using tags, you don’t have specific public curated photo album collections identified by their content, it’s not the main focus, so it simply lacks the markers to do it. The markers that it does have, though, IFTTT can use.

You have to see it from an application perspective. APIs won’t give anyone access to photos or content marked as private without a login and password (good APIs at least). But for content that is already public? It needs to. The entire purpose of it is that.

Doug Sundseth

In order to display a photo on my computer, my computer must download a copy of that photo. (This is the way computers work.) The only issue then is how persistent that copy is. And no social network has much control over that.

Let’s accept for a moment that there is no way for social media networks to protect creative content from automated scraping. (Even if they can’t block it outright, a number of relatively simple controls would make it a lot harder to automate – ebooks come to mind as an example of electronic creative content that is well controlled.) There is still no excuse for not at least adding tracking information – an embedded micro watermark, metadata, source tags etc. – so that after your content is stolen it would be relatively easy to track it down and invoice for use. Instead we have to rely on Google Image search.

I don’t buy the argument that content theft is an inherent part of the internet or sharing creative work online – it is a simple lack of will to address the problem at the source.

hansmast

I don’t want social media networks modifying my stuff without my consent before publishing it! If you want, you can put copyright info in the EXIF before uploading.

The 500px RSS feed posts the images without the watermark. You can actually make a script that parses the feeds (which you can personalize in any way you want) and downloads the images directly to your computer, or to some storage service.

Rex Deaver

As other folks have described in detail, this is baked into the nature of the Internet. It was designed, remember, by DARPA to be a communication system that is virtually impossible to block, regulate, or control in any way.

hansmast

Welcome to the internet. It seems you just discovered how it works. DRM can never beat those looking to circumvent it. It’s a structural thing that has to do with needing to give the keys to the viewer. Thus this silly idea of trying to “lock things down” was discarded by observant folks a long time ago. It’s like gun control: if you lock stuff down, you only block the good guys from sharing your stuff, the bad guys can still easily, easily get around it. (Amazon figured this out with its MP3s, for one prominent example.) You either need to embrace the internet full-on and allow everything to be as easily sharable as possible and harness the virality of the internet to get your work out there, realizing that there will be a bit of collateral damage (which is unlikely to actually even be substantively damaging with your actual clients), or you can stick with password-protected portfolios for your little commercial clients and never show your work to anyone else except in offline formats. Good luck with that approach.