Blog posts tagged "api"

We’ve been doing a ton of hacking recently on our Taste Test experiment, which in turn reminds me a lot of an ill-fated Flickr project, “Top Explorers”. (To anyone who still has Flickr SVN access: the code should still be there.) Computers have made dumb consensus and simple popularity so trivial to implement that projects to explicitly highlight individual voice really intrigue me.

Anyway I got inspired last weekend to spend a little while coding up a primitive Top Explorers implementation that would run over the API and on top of Pig. And as part of that I resorted to one of my favorite hacks for visualizing Flickr API results, the “standard photo response as slideshow” hack.

A friend (from Google) recently trolled me, asking, “What’s up with the data lock-in at Flickr?”.

Got me thinking about standards. I wrote back a rant to a mailing list of fellow senior hacker and coder types. Below I’ve included that rant, largely verbatim. I’d been meaning to turn it into a more reasoned blog post, maybe something suitable for posting on a more official outlet, but life is short, and Rod’s post about Quora reminded me to get on it.

As software engineers, as social software engineers, it’s important to have standards. You can debate how much of what we do can be called engineering, even charitably, but the code we write determines the rules that govern the spaces more and more people spend time in, and while “First, do no harm” might be reaching, a few standards that you should be embarrassed to not meet seem appropriate.

One of those is around data access, data ownership, and sharecropping. This is something Flickr takes very seriously.

The Minimum

With Flickr you can get out, via the API, every single piece of information you put into the system.

Every photo, in every size, plus the completely untouched original (which we store for you indefinitely, whether or not you pay us). Every tag, every comment, every note, every people tag, every fave. Also your stats, view counts, and referers.

Not the most recent N, not a subset of the data. All of it.

It’s your data, and you’ve granted us a limited license to use it.

Additionally we provide a moderately competently built API that allows you to access your data at rates roughly 500x faster than the rate that will get you banned from Twitter.

Asking people to accept anything else is sharecropping. It’s a bad deal. Flickr helped pioneer “Web 2.0”, and personal data ownership is a key piece of that vision. Just because the wider public hasn’t caught on yet to all the nuances around data access, data privacy, data ownership, and data fidelity doesn’t mean you shouldn’t be embarrassed to be failing to deliver a quality product.

The ability to get out the data you put in is the bare minimum. All of it, at high fidelity, in a reasonable amount of time.

The bare minimum that you should be building, bare minimum that you should be using, and absolutely the bare minimum you should be looking for in tools you allow and encourage people who aren’t builders to use.

A Reasonable Exchange of Value

Flickr actually goes a bit farther: not only can you get your data out, but it gets enriched as it passes through the system.

If you use the geotagging feature, you don’t just get out the lat/long you put in; your photo comes back with a whole hierarchy of geographic descriptors that are pointers into a publicly available gazetteer (Y! GeoPlanet). It would be good if there were pointers into other publicly available gazetteers (if, for example, Google ever released one), but there isn’t a good concordance service yet (though it’s being worked on).

You get structured access to all the metadata that people have added to your photos, with proper attribution available. (of course there is a working privacy model, so your “friends” aren’t getting data they aren’t supposed to, like your friend requests, and chat logs)

This isn’t the exhaustive list, just a few of the things Flickr does to respect, and collaborate with the people who share their time and data with us.

I’d certainly love to get a fraction of this data back from other services I use. Imagine getting access to all the data Google has about you, and everything they’ve learned partially based on observing you. I’ve gotten used to being disappointed by most of my fellow practitioners, but I still dream about using good tools that treat me with respect and want to collaborate.

(And I’ll state the obvious: this is my personal blog, and nothing I post here should be taken as official Flickr or Yahoo communication or policy unless otherwise noted. That isn’t what they pay me to do.)

As a bonus, you get a valid Flickr auth_token for every signed in user. This makes writing Flickr API apps about the simplest thing ever.

Case in point: I wrote the largely mis-named photosthatmatter app last night in slightly less than 20 minutes, while waiting for dinner to simmer. It shows the most interesting photo from each of your contacts in a given time period. Great for catching up on things you missed as they flowed by the first time.

Get info about an article/Search by URL

Positioned as a search API, it also doubles as a “getInfo”-style API, as article URL is one of the searchable fields.

?query=url:$article_url

Just make sure to remove the various query string bits that the Times appends, as these aren’t indexed. Should make a “find the history of this topic being discussed” Greasemonkey script a snap.
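A minimal sketch of that cleanup step, assuming Python's standard library. It drops the query string and fragment from an article URL and builds the `query=url:...` parameter shown above. The function names and the sample URL (including its tracking parameters) are hypothetical, and the API's base endpoint and key are deliberately left out.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode


def strip_query(article_url):
    """Return the article URL without its query string or fragment."""
    parts = urlsplit(article_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def url_query(article_url):
    """Build the percent-encoded 'query=url:<clean url>' parameter."""
    return urlencode({"query": "url:" + strip_query(article_url)})


if __name__ == "__main__":
    # Hypothetical article URL with appended tracking parameters.
    sample = "http://www.nytimes.com/2009/01/01/a.html?ref=technology&_r=1"
    print(strip_query(sample))
    print(url_query(sample))
```

Append the resulting parameter to whatever search endpoint and API key you have been issued.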

Expert’s attention information

One of my less comprehensible requests to the NYTimes developer team at OSCON last year was to make sure their APIs exposed the “attention information of [their] editors.” The age of the amateur, citizen journalism, and radical decentralization are all awesome, but the NYTimes editors’ job is to think about what is important and interesting full time, and that’s information worth mining.

And they did!

The page_facet and nytd_section_facet both allow you to gauge some degree of relative weight given to a story. (section_page_facet seems like it ought to do the same thing, but I couldn’t get it to work.)

?query=flickr nytd_section_facet:[Front Page]

Gives you articles mentioning “flickr” featured on the NYTimes front page. (of which it only finds 3, alas)

API Design

Good stuff:

Clean hackable URLs, you can play with it in your browser and see what you’re going to get.

The getList + extras (called fields in the NYTimes API) is the house wisdom at Flickr, and I’m glad to see it elsewhere

The parsed tokens block is neat, and I can see it being incredibly useful for working with such a large, varied corpus

The sheer amount of searchable/indexable metadata, and its granularity, is really unprecedented. Great to see them go out with such a rich, “here’s the data, do something great” approach.

I’m a big believer in Norvig’s “Code is liability” maxim. Which is how I justify my ugly but functional Flickr API implementation in 40 lines of PHP (not the most expressive of languages), which I wrote in about 15 minutes one evening and now use for all of my Flickr side projects. And all apropos of digging through other folks’ Flickr API impls, trying to get them working on GAE. Thankfully blech is already there.
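To give a sense of how little code a bare-bones Flickr client needs, here is a rough Python sketch. It is not the 40-line PHP implementation mentioned above, which isn't shown here. It assumes the REST endpoint at `api.flickr.com/services/rest/` and, for signed calls, the legacy (pre-OAuth) scheme as I understand it from the Flickr auth docs: an MD5 of the shared secret followed by the name/value pairs sorted by name. The function names are made up for illustration.

```python
import hashlib
from urllib.parse import urlencode

REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def sign(params, secret):
    """Legacy-style signature: md5(secret + sorted name/value pairs)."""
    raw = secret + "".join(k + str(params[k]) for k in sorted(params))
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def call_url(method, api_key, secret=None, **args):
    """Build the URL for one API call; sign it only if a secret is given."""
    params = {
        "method": method,
        "api_key": api_key,
        "format": "json",
        "nojsoncallback": "1",  # ask for plain JSON, no JSONP wrapper
    }
    params.update(args)
    if secret is not None:
        params["api_sig"] = sign(params, secret)
    return REST_ENDPOINT + "?" + urlencode(sorted(params.items()))


if __name__ == "__main__":
    # "KEY" is a placeholder; fetch the printed URL with any HTTP client.
    print(call_url("flickr.test.echo", "KEY", name="value"))
```

Everything beyond URL building (fetching, JSON decoding, error handling) is left to whatever HTTP client is at hand, which is most of why a client like this can stay so short.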

Just putting a note here for the next time I’m working with the Yahoo! GeoPlanet APIs.

The conundrum: an HTTP GET on a given resource (http://where.yahooapis.com/v1/place/23511846?appid=$appid) works in the browser, and works with wget from the command line, but fails from within PHP with a 406 Not Acceptable.

The solution: append format=XML to the resource URL, because the service is blowing out its brains on a missing Accept header.
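A small Python sketch of both options, using only the standard library: adding `format=XML` to the resource URL (the workaround above), and attaching an explicit `Accept` header to the request. Whether the header alone would have satisfied the service is an assumption based on the diagnosis above, not something verified here. The function names are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://where.yahooapis.com/v1/place/"


def place_url(woeid, appid, fmt="XML"):
    """Resource URL with the format parameter appended explicitly."""
    return BASE + str(woeid) + "?" + urlencode({"appid": appid, "format": fmt})


def place_request(woeid, appid):
    """Same resource, but also stating the wanted media type in a header."""
    return Request(place_url(woeid, appid), headers={"Accept": "application/xml"})


if __name__ == "__main__":
    req = place_request(23511846, "YOUR_APPID")  # placeholder app id
    print(req.full_url)
    print(req.get_header("Accept"))
```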

WeeWar broke in a wave across the office this afternoon. (Thankfully late afternoon, or I might have gotten nothing done today.) It’s a web-based, turn-based strategy game that’s very well done. Sort of a “Flickr for Risk”, with a nice value-add pro account ($24.95/year), APIs, social networking features, and a chatty tone.

XMPP

But I’ve never run into an application that needed an XMPP interface more.

The most fundamental missing functionality is a convenient, lightweight way of getting notified that your turn has rolled around again. WeeWar will send you email, but now your inbox is even more cluttered, and you’re having to check it constantly (something I try to keep to 1-2 times an hour).

Push

A Jabber interface you could trust to push state-change news to you would remove the nagging “Is it my turn?” and the variable positive reinforcement relationship it sets up with your inbox.

Additionally, it’s a classic app where, if you’re polling, you want to keep the polling interval very short, but the actual incidence of change is fairly sparse, which means WeeWar is going to start resenting their polling-based APIs at some point.

Payload

Ideally messages would also include an XML payload describing either the changes since your last turn, or the current state of the map, allowing for rich consuming clients to build alternate interfaces to the world.
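As a purely hypothetical illustration of the “changes since your last turn” variant, here is a Python sketch that builds such a payload with the standard library's ElementTree. Every element and attribute name is invented for the example; none of it reflects an actual WeeWar format.

```python
import xml.etree.ElementTree as ET


def turn_payload(game_id, player, changes):
    """Serialize a list of unit moves as a small XML document (made-up schema)."""
    root = ET.Element("turn", {"game": str(game_id), "player": player})
    for change in changes:
        ET.SubElement(
            root,
            "change",
            {"unit": change["unit"], "from": change["from"], "to": change["to"]},
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    moves = [
        {"unit": "tank", "from": "3,4", "to": "3,5"},
        {"unit": "trooper", "from": "1,1", "to": "2,1"},
    ]
    print(turn_payload(42, "alice", moves))
```

A client receiving this over Jabber could apply the changes to its own copy of the map without ever polling.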

New Games

Orthogonally, notifications of new games, and of new games from your “preferred players”, would also be excellent to get pushed out over Jabber.

Me, I just wish they’d bring back a delegated auth endpoint, whether their proto-OAuth or a real OAuth endpoint. Meanwhile, my only issue with m.twitter.com is that I want the option to see only the subset of folks I have device notification turned on for.

Other folks are talking and writing about the long-germinating, launched-in-beta location broker from Yahoo’s Brickhouse, Fire Eagle.

I wanted to call out just a couple of the cool and non-intuitive decisions they made.

Is NOT a consumer brand

Fire Eagle is a service for building and sharing location data. It’s the applications built on top of it that you’ll interact with, unless you’re building stuff.

Fire Eagle does NOT manage the social graph

It’s a service for sharing your data with friends (or services, or your toaster), but it doesn’t know who your friends are. The social graph has been outsourced. Best example of a small piece loosely joined I’ve seen in a long time.

Cares about privacy and ease of use

Ninja privacy is built in. But you don’t have to care. The TOS requires developers to discuss how the data is used. And privacy levels are front and center. And from day one data is delete-able, and in fact data is flushed on a regular basis.

But if you read it closer you’ll notice the operations map to what can be done in memcache (down to transactions being handled via atomic auto-increments) with a bit of cleverness, and some persistence. (Pun intended.) Still a nice step towards making developing f.bk apps a bit less eye-pokey-outty.

Was explaining this to folks yesterday who were worrying over the bandwidth consumption of their API. ETags can help with that, but if you aren’t computation/database bound, consider that perhaps you haven’t built a successful enough service.
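For the curious, a minimal sketch of the ETag mechanism in Python, not tied to any particular web framework. It assumes a strong ETag derived from a hash of the response body and a single value in `If-None-Match`; real requests may carry a list of tags or weak validators, which this ignores. The function names are made up for the example.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Strong ETag: a quoted hash of the exact response bytes."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, payload) for a conditional GET."""
    etag = make_etag(body)
    if if_none_match is not None and if_none_match == etag:
        # Client copy is current: send headers only, no body on the wire.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


if __name__ == "__main__":
    body = b'{"photos": []}'
    status, headers, _ = respond(body, None)
    print(status, headers["ETag"])
    print(respond(body, headers["ETag"])[0])
```

Note that the body still has to be generated before it can be hashed, so this saves bandwidth and nothing else. The computation and database cost stays exactly where it was.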