Archive for June, 2009

All those great Stack Overflow questions, answers, and comments, so generously contributed by all of you, are licensed under cc-wiki (also known as cc-by-sa):

cc-wiki license

You are free

to Share — to copy, distribute, and transmit the work

to Remix — to adapt the work

Under the following conditions

Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

The community has selflessly provided all this content in the spirit of sharing and helping each other. In that very same spirit, we are happy to return the favor by providing a database dump of public data.

We always intended to give the contributed content back to the community as a whole. Our primary concern was making sure we didn’t have an AOL-style “incident” where we accidentally release personally identifying information in so-called “sanitized” data. Stack Overflow user Greg Hewgill was kind enough to help us beta test several iterations of the data dump, ensuring that we didn’t release anything except content that is visible on the public website. He also suggested several improvements to the data dump, so that it contains as much useful public information as possible.

Note that if you republish this data, we require attribution as described in this blog post. Most importantly, there should be hyperlinks back to the original question, and the profiles of all participants.
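For anyone republishing the data, here is a minimal sketch of what generating those links could look like. It is only an illustration: the function name `attribution_html`, its parameters, and the example.com URLs are all hypothetical, and the linked blog post remains the authority on exactly what attribution is required.

```python
from html import escape


def attribution_html(question_title, question_url, participants):
    """Build an attribution line that links to the question and to each participant.

    `participants` is a list of (display_name, profile_url) pairs.
    All names and URLs are supplied by the caller; none are guessed here.
    """
    profile_links = ", ".join(
        f'<a href="{escape(url)}">{escape(name)}</a>'
        for name, url in participants
    )
    question_link = f'<a href="{escape(question_url)}">{escape(question_title)}</a>'
    return f"{question_link} by {profile_links}"


# Hypothetical example data (placeholder URLs, not real site paths).
line = attribution_html(
    "How do I X?",
    "https://example.com/questions/1",
    [("alice", "https://example.com/users/1"), ("bob", "https://example.com/users/2")],
)
print(line)
```

Escaping the names and URLs keeps user-supplied text from breaking the surrounding markup when the data is republished on another site.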

Our plan is to create a new data dump every two months, reflecting all data in the system up to that date. We will seed the latest and greatest dump (at a low bitrate) as long as we can, ideally permanently.

And yes, it’s still fun to say “data dump”. We look forward to seeing what the community can do with this data!

update: per this message from Cameron Parkins of Creative Commons, cc-wiki is now an alias for cc-by-sa.

Hi Stack Overflow-ers,

My name is Cameron Parkins – I do community outreach at Creative Commons and recently stumbled across your latest CC data dump.

Very cool that you all are using CC! I wanted to give you a heads up that the license you’ve chosen, the “CC Wiki-License”, isn’t really around any more. It is in the sense that it links directly to our CC BY-SA license, but our attempt to brand it as a separate license for wikis never got off the ground. We don’t use or promote it anymore and when we see it, we try and reach out to whoever is using it to let them know.

Part of the problem is that the Wiki License doesn’t carry any value, while our BY-SA license (which is what the wiki license is) has widespread community support around it. Would you all consider switching your indication as such?

Let me know if you have any questions – would like to promote the project through our networks.

This is the 56th episode of the StackOverflow podcast where Joel and Jeff sit down with Jason Calacanis to discuss the business side of software, including Mahalo’s “Skee-Ball” economy, when VC funding is appropriate, and whether SEO matters.

Jason Calacanis regales us with his tales of being a BBS script kiddie on his IBM PC Jr. He later got fired from his job in the Fordham computer lab for setting up a warez partition on one of the computers in the lab. Oh, and he installed a keylogger on his boss’s computer, and sold pirated software on floppies, too. :)

Apparently the Q&A format — dubbed “Knowledge Exchange” — was pioneered in Korea with Naver and Daum, which Yahoo Answers copied for the US. In Korea, the primary way to get information is through users exchanging knowledge, not search algorithms.

Rather than translate the app, Facebook apparently let users volunteer to translate different parts of the Facebook UI itself. Jason’s Mahalo is not localized.

In Korea, the main knowledge exchange sites are all noindexed, so Google is a non-starter there. If all the newspapers in the US noindexed as a consortium, Google would be screwed.

Jason is a big fan of the badge system on Stack Overflow, which he plans to add to Mahalo. This of course is modelled on the Xbox 360 Achievements system; every badge in the system is there to encourage community building (and not inadvertently community destroying) behavior. It’s a surprisingly fine line.

Joel’s big objection to Mahalo is that, like the now-defunct Google Answers, it turns an intrinsic motivation for asking and answering questions into an extrinsic motivation (hey, I can get paid real money for this!).

Jason maintains that money is not the primary motivator on Mahalo. He calls it a “Skee-Ball Economy”, where you are playing skee-ball for fun, and getting lots of tickets to cash out and buy fun things. It’s a “token economy”. You can’t make a lot of money, but it (theoretically) adds a secondary driver to an already fun activity.

Jason equates the Stack Overflow community with an “expert economy”, akin to the open source software ecosystem. Jason mentioned that he has used nginx and hadoop mailing lists to identify people to hire and/or bring in to teach the other developers at Mahalo. My question is, why shouldn’t Mahalo also be an expert economy?

Jason says “I’m not so much into creating the financial system to get something out of people, it’s more that I like to take work that was previously undercompensated or not compensated and make it into a career. I’m very proud of the fact that we [WebLogs, Inc.] were the company that made blogging into a career.”

Jason famously offered the top 25 users of Digg $1,000 a month to become community managers at Netscape. And 23 of the 25 users took that offer. Joel says this is like paying for sex — applying money at the one point where most people do not have a problem getting people to contribute to a community. Jason: “I may have made a mistake”, but traffic increased, and he maintains there was already a shadow economy around paid submissions to Digg.

Jason, who has a reporting background, ultimately wanted to add a layer of journalism and editorial control to the stories submitted to Digg. Even on “anything goes” vote driven sites like Digg, they do have one level of editorial control, in that stories can be “in dispute.”

Joel asks Jason — with your background in VC and funding, what would you do with Stack Overflow? Would you raise money? How, why, and for what? (Which reminds me: what’s the difference between VC funding and a flaming bag of poop left on your doorstep? Trick question! There is no difference!)

Is writing software “hard”? That depends on your tolerance for frustration and what you happen to be building.

Jason, as a serial entrepreneur, points out that there is no downside or risk to starting a company and failing in the United States. If you have great ideas, don’t just let them marinate in your brain forever — get out there and start building them!

One of my favorite Calacanis posts is Why people hate SEO … and why SMO is bullcrap. 90% of SEO is simple rules for building clean HTML. The other 10% is that SEOs are really just life coaches who need to transform their clients into something awesome that people actually care about. What you really need to optimize for is being awesome. And that’s a bit harder than pulling some SEO out of your magic bag of tricks.

We answered the following listener questions on this podcast:

Brian McKay: “I recently read the book Dreaming in Code. One of the concepts in the book is that software development is a very difficult profession — that software and surgery are two of the most difficult things that a person can attempt. Do you agree with this?”

If you’d like to submit a question to be answered in our next episode, record an audio file (90 seconds or less) and mail it to podcast@stackoverflow.com. You can record a question using nothing but a telephone and a web browser. We also have a dedicated phone number you can call to leave audio questions at 646-826-3879.

Since Joel and I are already in bed together (figuratively! figuratively!) it makes sense to combine our efforts in the jobs area rather than duplicate work. The last thing I wanted to do was create Yet Another Job Network. Any responsible programmer would resist this, anyway, because of Curly’s Law and DRY.

That said, this is just a start on the careers front. We have some more innovative things we are working on in this area that we hope to roll out in the next 6 to 8 weeks. Like, say, wouldn’t it be cool if your CV listed the stuff programmers really care about, such as your first computer …

We’ve added a new stats tab to the tag view that shows some basic statistics within the tag(s). Here’s what it looks like for /questions/tagged/iphone:

The number next to the user names reflects the number of non-community wiki answer upvotes for each user. This is the same algorithm used to award the tag-based badges, so if you ever wondered how close you are to getting one of those badges, now you know!
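To make the counting rule concrete, here is a rough sketch of how such a per-tag total could be computed. It is not our actual implementation: the record layout (`user`, `question_tags`, `community_wiki`, `upvotes`) and the function name are assumptions made up for this illustration.

```python
from collections import Counter


def tag_answer_upvotes(answers, tag):
    """Sum upvotes per user on non-community-wiki answers to questions with `tag`."""
    totals = Counter()
    for answer in answers:
        # Community wiki answers are excluded from the count.
        if tag in answer["question_tags"] and not answer["community_wiki"]:
            totals[answer["user"]] += answer["upvotes"]
    return totals


# Hypothetical sample data.
answers = [
    {"user": "alice", "question_tags": {"iphone", "objective-c"}, "community_wiki": False, "upvotes": 12},
    {"user": "bob", "question_tags": {"iphone"}, "community_wiki": True, "upvotes": 40},
    {"user": "alice", "question_tags": {"iphone"}, "community_wiki": False, "upvotes": 3},
    {"user": "carol", "question_tags": {"python"}, "community_wiki": False, "upvotes": 9},
]

top = tag_answer_upvotes(answers, "iphone").most_common()
print(top)
```

In this sample, bob's 40 upvotes do not count because his answer is community wiki, and carol's answer is outside the tag, so only alice appears in the result.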

The intent here is to highlight Stack Overflow and Server Fault users who are actively contributing within specific tags, even if they don’t have giant reputation scores across the entire site.

We may add some more tag-based statistics here; what do you think makes sense to show?

We’re considering some sort of monthly league, but I’d prefer to see it at the tag level rather than at the site level, to highlight those new and up and coming contributors in specific domain areas.

This is just the first pass; I expect us to refine and improve this feature over the next week or so.