Open Source "Spying" On Open Source: The CIA Project

LinuxWorld.com recently had the pleasure of interviewing Micah Dowty, founder and principal contributor to a rather unique project aptly named CIA (http://cia.navi.cx/). CIA monitors a wide range of open source projects in real time, tracking changes, building statistics, and sending event alerts through a number of channels.

[LW] Tell us a brief history of yourself and how you came up with the idea for CIA. What were you trying to solve?

[MT] CIA is really the survivor in a chain of failed projects. It started with the Kiwi, my attempt at building a very inexpensive and completely open PDA device. That project failed at its original goals, but I did end up with a "from scratch" Motorola 68k board that booted Linux, which taught me a lot about embedded systems.

During the Kiwi's development, I decided I needed to write a GUI. Honestly, this probably originated as Not Invented Here syndrome, but its architecture evolved into something really interesting to me: the PicoGUI project.

There might still be people using PicoGUI today, but I lost interest in it a couple years ago. Luckily for CIA, at some point we decided PicoGUI needed a bot reporting commits to our IRC channel.

This early bot was the first incarnation of CIA. It was a quick afternoon hack, written by myself and named by Lalo Martins. About a week later, Mike Hearn suggested modifying it to work with any number of projects, and putting it in a central IRC channel. This was June 1, 2003, the birth of #commits on Freenode. In just the space of a few hours, #commits grew from nothing to about the size it is today.

Originally, CIA was just created to make the PicoGUI project easier and more fun to keep track of. When we set up #commits, the motivation was mostly just for the novelty of seeing what everyone else is working on at any given time. I think it was much later that we realized just how useful CIA could be to the projects using it. This is one of the things that sparked the complete rewrite in December 2003.

[LW] CIA is open-source looking/spying on open-source. How has your project been received by many of the larger open-source bodies?

[MT] The response to CIA has been very positive. Really the only negative comments I remember getting are related to server downtime or bugs in the IRC code. CIA has been pretty reliable on its current home, but there have been periods in the past when, for either software or hardware reasons, it was crashing all the time. It seemed like every time it went down for a few hours I'd get someone threatening to reimplement CIA as a 50-line shell script. Of course, that's pretty much what CIA was before its rewrite, and there are a lot of advantages that these 15,000 lines of Python have over the old pile of shell scripts.

There are several large projects that are making use of CIA and showing their support by linking to the web interface. Gaim, AnkhSVN, Enlightenment, Gentoo, Adium, and Beagle are just a few of the larger projects that use CIA and link to it prominently on their web sites. I don't think CIA has received any official endorsements by large open-source projects or organizations, but some powerful members of these organizations have shown interest. Nat Friedman of GNOME fame was quite excited about CIA and sent a big donation.

[LW] What do you see as the top 3 features of CIA?

[MT] I think CIA's top feature is that anyone can use it, and it's about as easy to set up as possible for the version control system you're using. With about the same effort it would take to set up a commits mailing list, you can connect your project to a server that will get your commits onto IRC, the web, and RSS.

The next best thing about CIA is how it isn't tied to any particular version control system. Internally, CIA is just an architecture for publishing, filtering, and formatting arbitrary messages. CIA supports version control systems I've never used, and it's being used for more esoteric purposes like reporting automated build results. I know I've seen several projects out there for mailing commit messages or generating RSS feeds, but they're all designed for one specific version control system. CIA's client scripts act as an abstraction layer, so by writing a new client you can use it with pretty much anything.
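The idea of client scripts feeding a generic publish, filter, and format pipeline can be sketched roughly as below. This is only an illustration under stated assumptions, not CIA's actual code: the `Message` fields, the `from_svn` and `from_build` adapters, the payload keys, and the IRC line layout are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Generic message. Field names are illustrative, not CIA's real schema."""
    project: str
    author: str
    log: str
    files: list = field(default_factory=list)


def from_svn(info: dict) -> Message:
    """Hypothetical client adapter for a Subversion-style hook payload."""
    return Message(info["repo"], info["author"], info["log"], info["changed"])


def from_build(info: dict) -> Message:
    """Hypothetical client adapter for an automated build result."""
    status = "passed" if info["ok"] else "FAILED"
    return Message(info["project"], "buildbot", f"build {status}")


def project_filter(name: str):
    """Return a predicate that keeps only messages for one project."""
    return lambda msg: msg.project == name


def format_irc(msg: Message) -> str:
    """Render a message as a single IRC-style line (layout is made up)."""
    files = " ".join(msg.files)
    suffix = f" ({files})" if files else ""
    return f"{msg.author} * {msg.project}: {msg.log}{suffix}"


def publish(messages, keep, formatter):
    """Filter the messages, then format the ones that survive."""
    return [formatter(m) for m in messages if keep(m)]


# Two different sources end up in the same pipeline.
messages = [
    from_svn({"repo": "picogui", "author": "micah",
              "log": "Fix redraw", "changed": ["gui.c"]}),
    from_build({"project": "picogui", "ok": False}),
    from_svn({"repo": "other", "author": "lalo",
              "log": "Tweak", "changed": []}),
]
lines = publish(messages, project_filter("picogui"), format_irc)
for line in lines:
    print(line)
```

The point of the sketch is that only the adapters know anything about the source system. Supporting a new version control system, or something that is not version control at all, means writing one more adapter that returns a `Message`.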

The web interface has always been secondary to IRC commit delivery, but I see its ability to create a community of projects as the next most important feature of CIA. Every person and every project on CIA automatically gets a web page, and they're all linked together. Each page has a "related" box that lets you see who works on a particular project, what projects a particular author works on, which version control systems an author uses regularly, etc. These associations actually form an undirected graph that ends up tying most projects together in some way. Back when CIA was smaller, we could visualize this graph. Nowadays it just takes way too much CPU time.
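A minimal sketch of that kind of undirected author/project graph follows. It assumes commits arrive as simple (author, project) pairs; the sample names, the node tagging scheme, and the function names are hypothetical and not taken from CIA itself.

```python
from collections import defaultdict, deque


def build_graph(commits):
    """Link every author to every project they committed to, in both directions."""
    graph = defaultdict(set)
    for author, project in commits:
        a, p = ("author", author), ("project", project)
        graph[a].add(p)
        graph[p].add(a)
    return graph


def related(graph, node):
    """Direct neighbours of a node: roughly what a 'related' box would list."""
    return sorted(graph.get(node, set()))


def connected(graph, start):
    """Every node reachable from start, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Made-up sample data: bob bridges the alpha and beta projects.
commits = [
    ("alice", "alpha"),
    ("bob", "alpha"),
    ("bob", "beta"),
    ("carol", "gamma"),
]
graph = build_graph(commits)
print(related(graph, ("project", "alpha")))
print(len(connected(graph, ("author", "alice"))))
```

In this toy data, one shared contributor is enough to tie two projects into the same connected component, which is the effect described above. The traversal visits every node and edge in a component, which hints at why drawing the whole graph gets expensive as the number of projects grows.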

[LW] How have "users" used the data, stats, and events published from the CIA Notification server?

[MT] Many people link to their author page from a personal homepage or blog, and more and more projects are including links from their web site to their CIA stats page. A few projects are including CIA stats directly on their web site using RSS aggregators. CIA does provide a low-level XML feed with more detailed stats, and there's an XML-RPC interface that gives you easy programmatic access to all the data used to generate the web site. I don't think anyone is actually making use of this yet, but it's hard to expect people to use interfaces I haven't got around to documenting.

The coolest practical use of CIA I've seen recently was on the Planet Gentoo site. Since Gentoo contributors have the same username everywhere, they could link every blog post directly to that user's CIA stats page.

I expect people will find even more diverse ways to use CIA once I make the details of the XML-RPC interface well-known. I'm also really hoping that publish-subscribe becomes more common, as polling the RSS feeds really generates a huge amount of web traffic.
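To make the polling versus publish-subscribe contrast concrete, here is a small in-process sketch. It is not how CIA delivers anything: the `Hub` class, the topic strings, and every number below are assumptions chosen purely for illustration.

```python
class Hub:
    """Minimal in-process publish-subscribe hub (illustrative only)."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, callback):
        """Register a callback to be called for each message on a topic."""
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        """Deliver a message to every subscriber of a topic; return the count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


def polling_requests(clients, polls_per_hour, hours):
    """Requests made by clients polling a feed on a fixed schedule."""
    return clients * polls_per_hour * hours


def push_deliveries(subscribers, events):
    """Deliveries made when each event is pushed once to each subscriber."""
    return subscribers * events


# A subscriber only hears about topics it asked for.
hub = Hub()
inbox = []
hub.subscribe("project/alpha", inbox.append)
hub.publish("project/alpha", "new commit")
hub.publish("project/beta", "unrelated commit")
print(inbox)

# Hypothetical day: 100 readers, polling every 15 minutes, 3 commits.
print(polling_requests(100, 4, 24), push_deliveries(100, 3))
```

With polling, traffic scales with how often clients ask, whether or not anything changed. With publish-subscribe it scales with how often something actually happens.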

[LW] The community aspect of CIA is interesting, with people making a big play of their own 'commit' status. Do you see feeding people's egos as a big part of what CIA can deliver?

[MT] Definitely. Commit reporting has been done before, but one of the things that makes CIA really unique is that it brings projects together into a larger community. Anybody's CVS-to-RSS gateway or commit mailing list can be useful to developers in pretty much the same way, but CIA has a way of introducing a bit of healthy competition. People love seeing their work show up in public IRC channels. It seems less like they're locked in a closet pounding away at code in isolation, and more like they're doing something interactive that everyone else can see. CIA lets everyone know when you're making progress and gives you a virtual pat on the back for it. I know many people have trouble developing when CIA isn't around, since it just isn't quite as much fun.

[LW] CIA is watching itself, which is pretty cool. Have you had much help from the community, development-wise?

[MT] User contributions have been very important to the CIA client scripts, and in defining the XML message format. I wrote the client script for Subversion repositories, but all other clients were contributed by users. On the server side, though, I've been mostly alone. The server's codebase is pretty clean and well-organized, but it's big and largely undocumented. The server is tricky to set up, and it has a steep learning curve, so it has much less appeal for random hacking than the client scripts.

[LW] Fundamentally, CIA requires a small piece of script to be installed on the CVS/SVN servers to alert it when something changes. How do you go about asking for support from, say, SourceForge-based projects? Have they been supportive?

[MT] CIA has spread really well just by word of mouth. Generally a project admin or enthusiast hears about CIA, sets it up, then the first news I get about it is a request for a metadata key or IRC bot. When the project was brand new, Mike, Lalo, and I advertised it to a few other projects and set up scripts to scrape commits off of email lists. There are still a few projects that are connected to CIA via mailing lists, but the vast majority of projects were set up without any direct encouragement from us.

[LW] What are the longer term plans of CIA? Where do you see it heading?

[MT] There are some loose ends that I'd like to tie up, like web-based registration for IRC bots and metadata keys. I'm sure there are more bugfixes to be had. That's all just polishing what's already there. I don't see CIA changing a whole lot, just becoming easier to use, more robust, and more scalable. CIA already has a lot of feature bloat for what it is, really. The biggest change I see happening in CIA's future is making it easier for people to set up their own CIA servers in such a way that the load can be shared across many machines but the large-scale relationships between people and their work can be maintained.

Micah Dowty: Bio Details

Micah started tinkering with electronics and software at a very early age thanks to having an engineer for a father and a teacher for a mother. He finds himself learning more from his own personal projects than from school, and he has also contributed to a handful of larger open source projects including BZFlag, Crystal Space, and the Linux kernel.

Alan Williamson is widely recognized as an early expert on Cloud Computing. He is Co-Founder of aw2.0 Ltd, a software company specializing in deploying software solutions within Cloud networks. Alan is a Sun Java Champion and creator of OpenBlueDragon (an open source Java CFML runtime engine). With many books, articles and speaking engagements under his belt, Alan likes to talk passionately about what can be done TODAY and not get caught up in the marketing hype of TOMORROW. Follow his blog, http://alan.blog-city.com/ or e-mail him at cloud(at)alanwilliamson.org.
