“I’ve never been one to keep a journal (much to my chagrin as an inveterate collector of fancy stationery), but, silly as it may sound, Twitter made me feel like a field correspondent—our man on the Rio Grande—and thus encouraged me to think like a journalist.” – Buzz.

Last weekend everywhere I went people wanted to talk about the tragedy in Norway. I didn’t know much about it, just the bones of the story, but I found, as I always do, our fascination with it perverse and a bit grotesque.

I’d like to think that in the best possible world, broadcasting blow-by-blow coverage of distant tragedy connects us all with our shared humanity. But mostly it just seems ghoulish, and to borrow an old slogan, it’s like voting — it just encourages the bastards.

What I really want is someone doing in-depth, well-researched, well-written coverage of news events 1-4 weeks after the event, when all the details are known, sifted, and analyzed.

Not all stories lend themselves to this. That our government is derelict in its duty and will be defaulting this week isn’t a story that can wait weeks. But even a few days to pull together a decent body of reporting/facts/graphs/analysis rather than rehashed he-said-she-said-chest-beating-editorial would be nice.

I’m thinking a Kickstarter-esque funding model would work really well — pledge your interest in an ongoing story in real time, giving the news organization a heads up that they should be paying attention and starting to research; if enough folks show interest, the story gets written. It won’t cover all types of reporting, but it certainly would be a hell of a lot better than 90% of what we’ve got. (And give me a page where I can advertise my interests as a dodge to having to talk about the ridiculous pop news hype cycle: “Yeah, I’ve pledged to read about that in another 2 weeks, check out my Slow News page, let’s discuss it in depth then, shall we?”)

Get an Auth Token

You can do this step any way you want, but I’d grab flickr.simple, paste your key and secret into scripts/auth.php, and run it from the command line, at which point it will walk you through an interactive prompt to get a token.

And just to state the obvious, you need a token because these feeds are personalized: photos or favorites from your contacts, photos of people you know, your photos, and your faves.

The verify_token is an arbitrary string you pass in that Flickr will pass back to you. Among other things it makes securing your subscription handler against XSS a bit more straightforward. (just don’t use “MY VERIFY TOKEN” ok?)

The challenge is a string that Flickr will generate and send to you to make sure you’re actually interested in having a bunch of photos pushed to you. Your only responsibility in subscribing is to echo it back. This will happen when you first subscribe, and then again every time your subscription lease is up, which is by default once a day.
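In case it helps, here’s a minimal sketch of that verification step (in Python rather than PHP for brevity; the hub.challenge and hub.verify_token parameter names follow the PubSubHubbub convention, so double-check them against Flickr’s docs):

```python
# Sketch of a PuSH subscription handler's verification step.
# Parameter names (hub.challenge, hub.verify_token) are the PubSubHubbub
# convention and are assumptions here; consult Flickr's documentation.

MY_SECRET_VERIFY_TOKEN = "something-you-generated"  # not "MY VERIFY TOKEN", ok?

def handle_verification(params, expected_token=MY_SECRET_VERIFY_TOKEN):
    """Return an (http_status, body) pair for a verification request."""
    if params.get("hub.verify_token") != expected_token:
        # Not Flickr (or a subscription you didn't create): refuse it.
        return (404, "")
    # Echo the challenge back to confirm we really want the subscription.
    return (200, params.get("hub.challenge", ""))
```

Wire something shaped like that up to whatever handles GET requests in your callback.php and the handshake is done.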

Here we are subscribing to the favorites from your contacts stream. You can get a list of available streams with a call to flickr.push.getTopics, which you could write a script to call, but I’d probably just call it in the API Explorer and get the list:

If you see subscriptions that are still pending with non-zero verify attempts you did something wrong. Go have your morning coffee and try again. (<== this worked for me, your mileage may vary)

Processing

So assuming that all worked, Flickr is now bombarding your callback.php periodically with Atom blobs containing the information you requested. In practice I’m sticking the blobs into a Redis list and treating it as a queue to split processing from callback.php (this all runs on a tiny Linode VM and tying up Apache processes is a bad idea), but for the sake of demonstration we can assume that if we weren’t passed a challenge then this is a payload, and we’ll add the processing right to callback.php.
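For the curious, the queue split looks roughly like this (a Python sketch; the client is anything with lpush/brpop, e.g. redis.Redis() in production, and the key name is made up):

```python
QUEUE_KEY = "flickr:push:queue"  # hypothetical key name

def enqueue_payload(client, atom_blob):
    """Called from the callback: stash the blob and return immediately,
    so the web process isn't tied up doing real work."""
    client.lpush(QUEUE_KEY, atom_blob)

def work_loop(client, process, forever=True):
    """Run in a separate worker process: pop blobs off the list
    (oldest first, since lpush + brpop gives FIFO) and process them."""
    while True:
        item = client.brpop(QUEUE_KEY, timeout=5)
        if item is not None:
            _key, blob = item
            process(blob)
        if not forever:
            break
```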

There are of course many ways to parse Atom (much like there are many ways to call the Flickr API), but unsurprisingly I’d use Magpie, and in particular I’d grab rss_parse.inc, which, while it hasn’t been touched in 7 years, still works.
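If you’re not in PHP land, the same extraction is a few lines over stdlib XML parsing. A rough Python stand-in for what Magpie gives you (pulling just titles and links; real payloads carry more):

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def parse_atom(blob):
    """Pull (title, link href) pairs out of an Atom payload."""
    root = ET.fromstring(blob)
    entries = []
    for entry in root.iter(ATOM_NS + "entry"):
        title = entry.findtext(ATOM_NS + "title", default="")
        link = ""
        link_el = entry.find(ATOM_NS + "link")
        if link_el is not None:
            link = link_el.get("href", "")
        entries.append((title, link))
    return entries
```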

The interesting bit

Of course the interesting bits all come after “do logic on feed here!”. This is where you build your feel newsfeed for Flickr, your replacement for photos from your contacts, your anteater. That bit is up to you. (though given an infinite supply of lazy Sunday mornings, maybe I’ll post a follow up)

I see Twitter getting beaten up a lot for not deleting the spammers faster. Etsy gets beaten up for not deleting the “resellers” faster. Flickr used to get yelled at for not catching the photo stealers or porn spammers faster.

“It’s so fucking easy, they’re right over there, here, let me show them to you, what’s your problem?”

This comes from not understanding the cost benefit ratio of false positives in identifying abuse of a social site at scale.

Imagine you’ve got a near perfect model for detecting spammers on Twitter. Say, Joe’s perfectly reasonable model of “20+ tweets that matched ‘^@[\w]+ http://'”. Joe is (presumably hyperbolically) claiming 99% accuracy for his model. And for the moment we’ll imagine he is right. Even at 99% accuracy, that means this algorithm is going to be incorrectly flagging roughly 2 million tweets per day as spam that are actually perfectly legitimate.
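The arithmetic, spelled out (the 200 million tweets a day figure is my assumption about Twitter’s volume at the time):

```python
# Back-of-the-envelope math on the "99% accurate" claim.
# 200 million tweets/day is an assumption about 2011-era Twitter volume.
tweets_per_day = 200_000_000
false_positive_rate = 0.01  # the 1% a "99% accurate" model gets wrong

wrongly_flagged = int(tweets_per_day * false_positive_rate)
print(wrongly_flagged)  # 2000000: two million legitimate tweets flagged, daily
```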

If you’ve never run a social software site (which Joe of course has, but for the folks who haven’t) let me tell you: these kinds of false positives are expensive.

They’re really expensive. They burn your most precious resources when running a startup: good will, and time. Your support staff has to address the issues (while people are yelling at them), your engineers are in the database mucking about with columns, until they finally break down and build an unbanning tool which inevitably doesn’t scale to really massive attacks, or new interesting attack vectors, which means you’re either back monkeying with the live databases or you’ve now got a team of engineers dedicated just to building tools to remediate false positives. And now you’re burning engineer cycles, engineering motivation (cleaning up mistakes sucks), staff satisfaction AND community good will. That’s the definition of expensive.

And this is all a TON of work.

And while this is all going down you’ve got another part of your company dedicated to making creating new accounts AS EASY AS HUMANLY POSSIBLE. Which means when you do find and nuke a real spammer, they’re back in minutes. So now you’re waging asymmetric warfare AGAINST YOURSELF.

Fine, fine, fine whatever. You’ll build a better model. You know, this is a social site, we’ll use social signals. People can click and say “This is spam” and then when, I don’t know, 10 people say a tweet is spam, we’ll delete it and ban that account. But you know, people are fuckwits, and people are confused, and people are unpredictable and the scope of human activity at scale is amazingly wide and vast and deep, so a simple additive, easy to explain, fundamentally fair model isn’t going to work. (protip: if your site is growing quickly, make sure to use variables for those threshold numbers, otherwise you might DOS yourself)

But you’re smart, so now you’ve got a machine learning model, that’s feeding social signals into a real time engine, that’s bubbling up the top 0.01% of suspicious cases (and btw if you’ve gotten this far, you’re really really good, and you’re probably wasting your time on whatever silly sheep poking/nerd herding site you’re working on, so call me, I’ve got something more meaningful for you to do), and in at least Twitter’s case we’re now talking about a mere 200,000 potential spam tweets to be manually reviewed daily.

How many people do you need to review 200k spam tweets per day? How many desks do they need? Are you going to do that in house or are you going to outsource it? And if you outsource it, how are you going to explain the cultural peculiarities of your community? Because while your product might have gone global, you’re still your own funky nation of behavior, and some things that look strange (say, retweeting every mention of your own name) are actually part of your community norms.

And if you don’t explain those peculiarities, how long do you think it is until this small army you’ve assembled to review 200k tweets a day gets tired, makes a mistake, and accidentally deletes one of your social-network-hub early adopter types (because the sad truth is early adopters are outliers in the data, and they look funny)?

And what do you think the operational cost of making that mistake is? (see also: fakesters)

Also, what does your data recovery strategy look like on a per-account basis?

There are solutions. Some of them are straightforward. Many of them aren’t. None of them are as easy as you think they are unless you’ve been there. And I’m happy to talk to you about them over a beer, but just posting them on a blog, well that would be telling other people’s secrets. And they already have a really hard job.

A much more cogent blog post by Bruce Schneier from 2006, on Data Mining for Terrorists really drills into this problem from a theoretical model. (where “for Terrorists” is to be taken in the “finding Terrorists” sense and not in the “for Dummies” sense) (update: via rafe a good BBC article on base rate fallacy)

If you’re thinking about launching an application that centers around group forming as a filtering mechanism, a couple of quick feature requests:

“Smart Sets” ala iTunes are unlikely to work because human features are less well understood than pop’s features, and will lead to frustration and abandonment, so don’t do that.

However …

most entities we refer to as human have a single geographic location at any given point in time. This is a useful, and well understood feature. Cityli.st was a project I started to exploit this fact: it automatically maintained and updated Twitter lists based on a person’s location. (turns out Twitter lists don’t work very well, nor are they particularly useful.) Please dear god, build this into G+ already, especially if you’re going to have checkins.

update asymmetry is a fact of life. Some folks update multiple times a day, some folks update every few minutes, some folks drop 1000+ photos in a single upload session once every 6 months. Automatic grouping by rolling average of update frequency would be extremely useful.

shibboleths (aka something you have or something you know). A group of everyone who hit this onetime URL. A group of everyone who can take a photo of the Empire State Building from where they’re currently standing.

time is another interesting human constant. People I’ve recently contacted, people I haven’t contacted in years, and people who I was at the same concert/bar/office with are all useful slices.
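The rolling-average grouping idea from a couple of items up is easy enough to sketch (the bucket names and boundaries here are completely made up):

```python
def average_interval(timestamps):
    """Average seconds between updates, given sorted epoch timestamps."""
    if len(timestamps) < 2:
        return None
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return sum(gaps) / len(gaps)

def frequency_bucket(timestamps):
    """Group an account by how often it updates. Thresholds are arbitrary."""
    avg = average_interval(timestamps)
    if avg is None:
        return "quiet"
    if avg < 60 * 60:            # more often than hourly
        return "firehose"
    if avg < 60 * 60 * 24 * 2:   # every day or two
        return "daily"
    return "bursty"              # the 1000-photos-every-6-months crowd
```

A real implementation would use a windowed rolling average rather than the whole history, but the grouping is the point.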

William Morris was an important figure in my house growing up. Mostly because he was one of my grandmother’s muses (growing up as she did in Southern California surrounded by the work he inspired in the California Craftsmen, e.g. the sublime Gamble House). And we had the best picture books of his works lying around the house. (the fact that he was also a socialist and anarchist as well as a successful aesthetic theoretician and artist helped later)

Which is all apropos of very little except my thinking about why I’m so sympathetic to the design philosophy of what I call “flourishes.” Flourishes, as I think of them, are the elements you add to a design not because they are necessary or straightforward, but because they’re often hard and interesting, and their inclusion speaks of a better world than the one we live in. That’s core to my personal definition of craftsmanship.

Which again is apropos of nearly nothing, except it’s on my mind as I think about this blog post, which is mostly about dates and Flickr. (and to a lesser extent craftsmanship)

I like shoeboxes

I think dates are important. I think history is important too. I’m a self-described calendar dork. And I fret about the warm bath of now-ness we seem to be currently living in; real time a synonym for ephemerality and disposability. If there is anything this culture needs less of, it is disposability. I sometimes claim this will be the great era of forgetting, an entire generation’s learnings/thoughts/beliefs as if writ in water. Which I suppose is why I twitched a bit about Jason’s perfectly valid critique of the current state of sociality on Flickr, namely that it’s stagnated.

So I wanted to talk a bit about the date handling flourishes in Flickr. (I worked on nearly none of this, and much of it was already in place by the time I joined.)

Upload dates vs Taken dates

The date you uploaded a photo to Flickr is stored in epoch seconds, and can’t be earlier than the date you joined Flickr.

Fundamentally this split between system activity time, and human editable creation date models a world where the people who use your software do something other than use your software. You have to decide how you feel about admitting that possibility.

Circa dates, and date granularity

If you visit most photo pages the date does the semi-standard human friendly date thing aka if the photo was taken recently it will say “taken 18 minutes ago”, otherwise it will say “taken on August 10, 2008”.

But if you visit that Blue Grotto photo you’ll notice the date is listed as “This photo was taken some time in 1890.” That’s date granularity. Flickr taken dates come in 4 levels of granularity: exact, year-month, year, and circa.

What’s circa? Circa is a flourish. Circa is the sort of feature you only get when you care about the craftsmanship. You can check out the George Eastman House archives, circa 1860. Those photos were all taken in 1860 plus or minus 5 years.

Computers demand exactitude by default, and it’s a laziness of which we are collectively guilty that we’ve traded a rich and nuanced societal understanding of time for a few saved programmer and compute cycles.

Archives

It should probably go without saying, but if you want to understand the story arc of someone’s life (not just this week’s episode), having access to browsable archives is pretty key. The photos-from-friends page is probably the most neglected page on Flickr relative to its importance, but the archive pages are the most neglected outright. Still, they do the job.

(there’s a comparable page for places, instead of dates, that never quite got launched, but you can sort of fake it with the personal map page)

Also, you know, just being able to jump back to an arbitrary page in history in the stream.

Date ranges on sets

This is another flourish feature. One of my favorites.

Last summer, Jazz and I made a 2400 mile loop up through Nova Scotia. On that page in a small grey font you’ll see a note telling you the photos are “from between 04 Jul 2010 & 12 Jul 2010.” That tells a story. Immediate nostalgia. For me at least. Probably doesn’t do much for you. But that right there is my second favorite feature on Flickr. (the “You + X” pages being my first, or really just this page of Jazz and me)

However on a more substantial set, like Steve Ford’s First Cups set, with 2036 pictures of his first cup of coffee each day for 6 years, the between dates allow you to jump around within the set and provide quick orientation.

This is a time linear navigation model, but date way markers are at least surfaced. (also this is in many ways a post-facto rationalization of the fact that sets weren’t originally paginated)

Search + Dates

Just having real search is considered something of a craftsman’s flourish these days, but assuming we’re executing at a level of competence that allows for search, adding dates as first-class selectors to your search makes it possible to have a page like all the photos from your contacts taken between 1980-1990.

In conclusion

No conclusion. Just some design notes, in case you get inspired, you know, for next time.

(nota bene: depending on how active you are on Flickr many of these search links might come up blank for you, I tend to use Flickr’s ability to scope searches to only things that are personally relevant to me, my favorites, my contacts, my photos, etc, your view of these links will be scoped to your personal world view. Whether or not I should be able to share my personalized view of the world publicly ala Twitter’s recently revived “With Friends” feature was heavily debated, but eventually was sacrificed on the altar of performance concerns in a system with rich privacy settings)

“I ran a new product development group within a large company and I would like to dispel the simplistic myth that big companies don’t innovate. There is innovation occurring at many big companies. The thing that big companies really struggle to do is to ship.” – John Borthwick – news.me

People ask me why I focus so relentlessly on shipping as opposed to the rest of the software development life cycle. In part because it’s hard. It’s often the hard problem. And it gets harder the longer you do it, aka it gets harder the more important the thing you’ve built is.

Folks have told me they enjoyed the talk, and found it inspiring and informative, which is immensely gratifying.

I was personally frustrated with myself as I wasn’t as prepared as I like to be, and I made a number of last minute cuts from the talk that made it feel more disjointed than I liked.

This can be summarized in the #protip: When the organizers give you a chance at a dry run to get to know the stage and the equipment, do it!

Additionally I didn’t get to talk about the work of Aaron Beppu, who recently joined our search team tackling relevance ranking, who did the bulk of the backend work on this project as his bootcamp project, and whose final implementation is significantly more complex and interesting; or Fred Blau, who is currently working on our internationalization effort, but who rewrote our auto-complete implementation to use sequence ids instead of timing for a much smoother interface.

In the golden foothills overlooking the bay, on the deck of a geodesic dome, sitting under the canopy of live oaks, listening to folks sing and play guitar and tell the stories of the San Onofre surf club (from way before the nuclear power plant was installed in 1968), these are my roots.

“+1” is a convention that arose on the Apache Software Foundation mailing lists. The ASF still has the best, most functional process for mailing list based collaboration which has ever been evolved, of which -1/0/+1 is only the thin wedge. (the whole vocabulary of lazy consensus, commit-then-review, etc is incredibly important when trying to implement a diversity of tactics over email, as we ran into time and time again with Indymedia). Worth exploring their process in depth.

Anyway, Google launched a “+1” product today, and there was some discussion as to where the “+1” convention came from. The first place I ever encountered it was this email from Rob Hartill, on Wed, 15 Mar 1995, as part of one of the early patch voting rounds on Apache 0.7.x (the Apache foundation having formed the previous month to turn NCSA httpd into Apache).

I'll use a vote of
-1 have a problem with it
0 haven't tested it yet (failed to understand it or whatever)
+1 tried it, liked it, have no problem with it.

Rob might have adapted it from an earlier source, but I’ve never seen one.

And the conventional wisdom was that it would be one of the group messaging clients. Conventional wisdom was wrong. It was Airbnb. Airbnb was everywhere at SxSW this year. But quietly. And they were killing it.

As someone who is learning the ins and outs of building a marketplace I can appreciate how well they’re doing it (and tried and failed to track down Brian for a margarita at SxSW). Meanwhile my friends who do iOS development are impressed by how well done the app is. But mostly I just fell in love with the design of their gift card packaging.

More importantly it let the underground SxSW survive another year. The official show was so ridiculously huge, the hotels blocked out and booked so extensively in advance, that the community that made SxSW important and has been thriving in the badgeless favela, wouldn’t have found places to stay without Airbnb. (thanks to Buzz for helping me refine that insight)

Always interesting to see how fast these things converge when they start to converge. When a standardized creation myth starts to emerge you’re observing a trajectory which is starting to steepen sharply upward. These two interesting posts by Fred Wilson and Paul Graham capture two of my favorite pieces of the Airbnb story: the breakfast cereal hustle, and the hitting-the-street commitment to knowing their users.

stellar.io does so many things right. And by right, I mean the way I believe web apps should be built. Which is to say, distinct from the prevailing style of apps built to drive rapid growth and viral, addictive, lab-rat behaviors by hijacking traffic into a pachinko machine of false attention.

In particular, the fact that favoriting an item on stellar.io propagates that favorite back to the native medium seamlessly is a classy move from an earlier age. I thought we’d perhaps seen the last of small pieces, loosely joined into well designed, simple experiences.

Also the best of page has a lot of what I wanted for accolad.es. (though it would be great to be able to dial up/down how strong of a “best of” signal you’re interested in, ala Hot Links)

Only problem: it still hasn’t solved the asymmetric updates problem. One person can still go on a favoriting spree of weird design elements and totally obliterate your feed.

Back in pre-history SxSWi was tiny, really small, negligible. You felt lucky to have simply found anyone else who cared about what you cared about.

Moving into more modern times, it started to feel big. But not so big that you didn’t immediately love and trust everyone who was there. The 2004/2005/2006 era it was safe to assume anyone you met on the streets of Austin during Interactive was of your tribe, and likely a good friend you hadn’t met yet. We’ll call this the “partying with 5000 of your closest friends” era. There was some technological facilitation needed on the tail end of this era. I’m wearing a much faded, much loved Dodgeball shirt as I type this.

The geometric progression of attendees continued upward though, bringing us to low 6 digits where it currently sits, and a few other things happened. First Twitter. Twitter in particular had the tendency to cause social events to blow up. It was a consensus engine that drove everyone to be at the same panels, and the same parties. This almost broke SxSW. Foursquare’s arrival on the scene helped defuse some of this as it allowed a more nuanced consensus to emerge. But we pretty much trashed the Driskill opening night last year in a way I hadn’t seen before (at least before Music got to town). Still this was the start of the anti-social era. Rather than assuming everyone on the street was a friend, you were actively seeking out the people you already knew.

This year the evolutionary pressure seems to be driving most of random social out of the event. The rise of private group texting pods, the preponderance of invite only parties, and the general private band communication makes anti-social the most interesting trend at SxSW this year. (as an aside, I just really like the word “pods”, but I’m personally betting on GroupMe)

Anti-social is traditionally a derogatory term. I’m not using it as such. Often the only opinions I care about are those of the people in my affinity group. I whined about the lack of, and then was pleasantly surprised that I can scope Foursquare recommendations to only people I know. I think it’s a really interesting trend, and I’m looking forward to seeing (and writing) more anti-social software. (a term I’m going to credit to Maciej).

I expect this year we’ll have squeezed much of the synchronicity out of the experience, but the call of qualified information and insights our brains can layer proper social expectations onto will be too appealing; then in late 2011 / early 2012 there will be a sharp rise in experiences powered by PRNGs, of which Situationist is probably an early model.

“Failure to adequately address the internet sales channel and the subsequent ebook market. Specifically, the decision to outsource Borders.com to Amazon.com. To be fair, Borders.com was costing the company millions of dollars in losses each year ($20m I think when they decided to outsource) and one could argue that the outsourcing solution was a case of letting the most efficient etailing organization (Amazon.com) handle the job and turn a big negative into a profitable business. In the short-term, this saved a lot of money. In the long run, the internet is too important to outsource in this manner” – Mark Evans

“Good work is not done by ‘humble’ men. It is one of the first duties of a professor, for example, in any subject, to exaggerate a little both the importance of his subject and his own importance in it. A man who is always asking ‘Is what I do worthwhile?’ and ‘Am I the right person to do it?’ will always be ineffective himself and a discouragement to others. He must shut his eyes a little and think a little more of his subject and himself than they deserve. This is not too difficult: it is harder not to make his subject and himself ridiculous by shutting his eyes too tightly.” – G. H. Hardy