Thursday, June 23, 2011

Josh Kopelman tweeted today about my weekend project, which is still in alpha (at best). Until yesterday, its only follower was Dave McClure, but he follows so many accounts that I figured he just auto-followed anyone who tweeted '500 Startups.' I was going to wait until I got back from next week's vacation before mentioning it to anyone, but the Twitter feed is open, so... it's been released.

The website is the API: I wrote a script that checks the portfolio pages of VC websites every night, then tweets and posts any companies that seem to have been added. It's useful in its way, but it has some severe limitations.
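For readers curious what the nightly scrape might involve, here is a minimal Python sketch of the extraction step. It is not the script described above: it assumes, hypothetically, that a portfolio page marks each company with an element carrying a `portfolio-company` class, whereas each real VC site uses its own markup (which is exactly why scraping is fragile). It uses only the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class PortfolioParser(HTMLParser):
    """Collect the text of every element whose class list contains marker_class.

    HYPOTHETICAL: "portfolio-company" is an illustrative class name; each real
    VC site would need its own rule.
    """

    def __init__(self, marker_class: str = "portfolio-company") -> None:
        super().__init__()
        self.marker_class = marker_class
        self.names: list[str] = []
        self._tag = None  # tag name of the element currently being captured
        self._depth = 0   # nesting depth of that same tag name
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self._tag is not None:
            if tag == self._tag:
                self._depth += 1  # same tag nested inside the captured element
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.marker_class in classes:
            self._tag, self._depth, self._buf = tag, 1, []

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace left over from inline markup.
                name = " ".join("".join(self._buf).split())
                if name:
                    self.names.append(name)
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)


def extract_company_names(html: str, marker_class: str = "portfolio-company") -> list[str]:
    parser = PortfolioParser(marker_class)
    parser.feed(html)
    parser.close()
    return parser.names


if __name__ == "__main__":
    page = (
        '<ul><li class="portfolio-company">Turntable.fm</li>'
        '<li class="portfolio-company"><a href="/tremor">Tremor <b>Video</b></a></li>'
        "<li>About us</li></ul>"
    )
    print(extract_company_names(page))  # ['Turntable.fm', 'Tremor Video']
```

If a site redesign renames that class, this sketch silently returns an empty list, one concrete way the fragility described below shows up.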

The biggest is that scraping is inherently fragile. And I'm going on vacation next week, leaving the computer it runs on at home with the script still running. If it starts to spew garbage on Monday, well... sorry. I'll fix it when I get back.

It reports on differences, so when Stickybits changed to Turntable.fm, it showed that First Round had added Turntable. When Tremor Media changed its name to Tremor Video, that got reported too, and so on.
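Reporting on differences can be sketched as a comparison of two nightly snapshots of company names. This is an illustrative guess at the approach, not the actual script; `diff_snapshots` and the whitespace/case normalization are hypothetical. Because it compares names, a rename such as Stickybits to Turntable.fm comes out as one addition plus one removal.

```python
def normalize(name: str) -> str:
    # Hypothetical normalization: collapse whitespace and ignore case so that
    # purely cosmetic edits are not reported as new companies.
    return " ".join(name.split()).casefold()


def diff_snapshots(yesterday: list[str], today: list[str]) -> tuple[list[str], list[str]]:
    """Return (added, removed) company names between two nightly snapshots."""
    old = {normalize(n) for n in yesterday}
    new = {normalize(n) for n in today}
    added = [n for n in today if normalize(n) not in old]
    removed = [n for n in yesterday if normalize(n) not in new]
    return added, removed


if __name__ == "__main__":
    last_night = ["Stickybits", "Tremor Media", "Example Co"]
    tonight = ["Turntable.fm", "Tremor Video", "example co "]
    added, removed = diff_snapshots(last_night, tonight)
    print(added)    # ['Turntable.fm', 'Tremor Video']
    print(removed)  # ['Stickybits', 'Tremor Media']
```

With the renames from the post, both new names come back as additions, while the cosmetic change to the hypothetical "Example Co" is ignored.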

VCs don't always update their portfolio pages in real time. This is no substitute for TechCrunch (or even Crunchbase); it's just faster and easier to scan. I've been adding VCs in dribs and drabs over the past few weeks, so some additions were certainly missed because of the timing. The website is pretty cool: it lets you filter by name and date, and sort. That will become more useful over time as more deals are added.

The list of VCs it checks is here. If I missed yours, let me know and I'll add it to the to-do list, unless your site is in Flash (cough, Norwest), has no portfolio company names (ff Ventures, among several others), or doesn't allow bots. (Yes, oddly enough, one site checks the user agent and sends my script a page with no real content. Its robots.txt, like that of every site I look at, allows crawling, but the server doesn't. Why?)
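A well-behaved scraper can consult robots.txt before fetching, for example with Python's standard `urllib.robotparser`. The sketch below assumes a hypothetical user-agent string, `vc-portfolio-watcher/0.1`, and only builds the request rather than sending it. As the paragraph above points out, a robots.txt that allows a URL does not guarantee the server will return the same content to a bot's user agent.

```python
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical identifier for the scraper; not the real script's user agent.
USER_AGENT = "vc-portfolio-watcher/0.1"


def robots_allows(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # feed the lines directly, no network fetch
    return parser.can_fetch(user_agent, url)


def build_request(url: str, user_agent: str = USER_AGENT) -> Request:
    """Build (but do not send) a request that identifies the scraper honestly."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /admin/\n"
    print(robots_allows(rules, "https://example.com/portfolio/"))   # True
    print(robots_allows(rules, "https://example.com/admin/login"))  # False
```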

5 comments:

It does show removals, yes. But they turned out to be less interesting and somewhat unstable: the script would see things removed that were re-added the next day, and so on. I decided to just try to minimize false positives in one direction.