One of the most painful legacies of UNIX’s long gestation is the mess of scripts, configuration files, and databases we affectionately call “/etc”. Binaries go in “/bin”, libraries belong in “/lib”, user directories expand out under “/home”, and if something doesn’t fit? Where does it go? “/etc”.

Of course, “/etc” isn’t really the overflow directory that its name would imply. “/config” would be a far better choice, more accurately reflecting the nature of its contents, but like so much of the data contained within it, its name suggests a lack of organization rather than a coherent collection of configuration settings. While the rest of the computing world has moved on to SQL, the user account database is still stored in colon-separated fields. Cisco routers can snapshot a running network configuration to be restored on reboot, but the best we can do is fiddle with the interfaces file, then reboot to take the changes live – or adjust the interface settings by hand and wait for the next reboot to see if the network comes up right. Our name service is configured in /etc/hosts, or /etc/resolv.conf, or /etc/network, or /etc/dnsmasq.d, or /etc/systemd, or wherever the next package maintainer decides to put it. Nor can you simply save a system’s configuration by making a copy of /etc, because there’s plenty of non-configuration information there, mostly system scripts.
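The colon-separated account database is at least easy to parse, even if it is hard to query or validate. A minimal sketch in Python (the sample entry is invented):

```python
# The seven fields of a passwd(5) entry, in order.
FIELDS = ("name", "password", "uid", "gid", "gecos", "home", "shell")

def parse_passwd_line(line):
    """Split one /etc/passwd line into a dict of its seven fields."""
    return dict(zip(FIELDS, line.rstrip("\n").split(":")))

entry = parse_passwd_line("alice:x:1000:1000:Alice:/home/alice:/bin/bash")
print(entry["home"])  # → /home/alice
```

Every program that touches this file carries its own copy of this logic, which is exactly the duplication a common configuration store would eliminate.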

What a mess.

The networking community has been experimenting for years with standardized data models for device configuration: initially SNMP MIBs, and more recently YANG. While SNMP was widely adopted for monitoring operational statistics, it was never really accepted for device configuration. YANG may change that. At the very least, Cisco seems to be moving decisively towards adopting YANG for configuration tasks, with both NETCONF and RESTCONF supported as configuration protocols. It remains to be seen whether these network programmability tools will supplant command-line configuration to any meaningful degree.

Part of Cisco’s network programmability strategy involved their 2014 acquisition of the Swedish company Tail-f, a leader in YANG development. I recently spent a few hours experimenting with the free version of their confd product, and while it isn’t free software, it certainly left me thinking that we need something like it, and soon.

confd’s basic design is that a single configuration daemon (confd) manages all system-wide configuration tasks. confd provides a command-line interface for interactive configuration, speaks NETCONF and RESTCONF to management software, saves and restores running configurations, and maintains a history of configuration changes to facilitate rollback. Other specialized daemons interact only with confd, via an API that lets them retrieve their configuration information and receive notifications of changes.

Consider, for example, dhcpd, and allow me to discard Tail-f’s current implementation of confd and speculate on how a free software confd might operate. dhcpd’s distribution would include a file or files containing a YANG data model that specifies all of dhcpd’s configuration state, plus operational data such as a list of current DHCP leases. dhcpd, when it starts, would connect to confd over D-Bus, using a standard API that allows any YANG-capable client to announce a YANG model to confd, obtain its current configuration, and subscribe to future changes. Administrator configuration of dhcpd would occur by connecting to confd’s CLI and altering the data model there. The dhcpd model would be one part of a comprehensive data model representing the configuration of the entire system, which could be saved and restored in any format (config file, XML, JSON) understood by confd. confd, for its part, would need no DHCP-specific component, as it could parse YANG and interact with any client exporting a YANG data model.
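To make the announce/fetch/subscribe flow concrete, here is an in-memory Python stand-in for that hypothetical API. Every name here (ConfdClient, announce_model, and so on) is invented for illustration; a real implementation would speak D-Bus rather than call methods on a local object:

```python
class ConfdClient:
    """Toy stand-in for the speculative confd API (all names invented)."""

    def __init__(self):
        self._models = {}       # daemon name -> YANG model text
        self._config = {}       # daemon name -> current config dict
        self._subscribers = {}  # daemon name -> list of change callbacks

    def announce_model(self, daemon, yang_model, defaults):
        # A YANG-capable client registers its data model once at startup.
        self._models[daemon] = yang_model
        self._config.setdefault(daemon, dict(defaults))

    def get_config(self, daemon):
        # The daemon retrieves its current configuration.
        return dict(self._config[daemon])

    def subscribe(self, daemon, callback):
        # The daemon asks to be notified of future changes.
        self._subscribers.setdefault(daemon, []).append(callback)

    def set_config(self, daemon, key, value):
        # What the CLI or a NETCONF/RESTCONF session would invoke.
        self._config[daemon][key] = value
        for callback in self._subscribers.get(daemon, []):
            callback(key, value)

# dhcpd's side of the conversation:
confd = ConfdClient()
confd.announce_model("dhcpd", "module dhcpd { /* YANG model here */ }",
                     {"lease-time": 600})
changes = []
confd.subscribe("dhcpd", lambda key, value: changes.append((key, value)))

# An administrator edits the model through confd's CLI:
confd.set_config("dhcpd", "lease-time", 3600)
```

The point of the sketch is the division of labor: dhcpd never parses a config file, and confd never knows anything DHCP-specific beyond what the announced YANG model tells it.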

The ultimate goal would be replacing most of /etc with a single configuration store that captures the entire configuration of the system. The running and startup configuration would both be stored in the same format, allowing administrators to adjust the running configuration, then save it with the confidence that it will be correctly restored after the next reboot.

The biggest challenge would be convincing the developer community to embrace such a standard. With so much momentum behind the current hodgepodge structure of /etc, it seems like a tough sell. The easiest thing to do is to keep doing what we’ve always done. Got a new daemon? Whip up a config file in whatever format you please and stick it somewhere in /etc. Which, of course, adds more momentum to a poor design.

A first step in this direction would be a free software version of confd that can parse YANG, present a D-Bus API, offer an interactive configuration mode, and speak NETCONF and/or RESTCONF. Then we could start modifying existing daemons to use the new standard. Some kind of backward compatibility for parsing and writing existing conf files would probably be mandatory, even though the whole point of the design is to move away from them.

Cloud computing has become one of the biggest buzzwords in IT. What is it? How does it work? Is it for real?

Cloud computing is an old idea in a new guise. Back in the 70s, users bought time on mainframe computers to run their software. There were no PCs. There was no Internet. You punched your FORTRAN program on cards, wrapped a rubber band around them, and walked them over to the computer center.

Then came the PC revolution, followed hard on its heels by the Internet. Everyone could now have a computer sitting on his desk (or her lap). Sneaker net gave way to Ethernet and then fiber optics. Mainframes became passé.

Well, mainframes are back! It turns out that centralized computing resources still hold some obvious advantages, mostly related to economies of scale. Once an organization’s computing needs can no longer be met by the computers sitting on people’s desks, a significant investment is required to install a room full of servers, especially if the computing needs are variable; in that case the servers must generally be scaled to meet peak load and then sit idle the rest of the time.

Several enterprising companies have installed large data centers (read: mainframes) and have begun selling these computing resources to the public. Amazon, for example, is estimated to have 450,000 servers in their EC2 data centers. EC2 compute time can be purchased for as little as two cents an hour; a souped-up machine with 32 processors, 244 GB of RAM and 3.2 TB of disk space currently goes for $6.82 an hour. Network bandwidth is extra.
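Some back-of-the-envelope arithmetic using the prices quoted above shows why pay-per-use is attractive (the 1% burst figure is my own assumption for illustration):

```python
HOURS_PER_YEAR = 24 * 365  # 8760

# Baseline service at two cents an hour, running continuously.
baseline_per_year = 0.02 * HOURS_PER_YEAR       # about $175 a year

# The 32-processor machine at $6.82/hour, if run flat out all year.
big_instance_per_year = 6.82 * HOURS_PER_YEAR   # roughly $59,700 a year

# But rented only for bursts, say 1% of the year, the big machine
# costs a small fraction of owning equivalent hardware outright.
burst_cost = 6.82 * HOURS_PER_YEAR * 0.01       # roughly $600
```

That last line is the economy-of-scale argument in miniature: peak capacity is paid for only while it is actually used.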

Yet wasn’t the whole point of the PC revolution to get away from centralized hardware? Can you really take a forty-year-old idea, call it by a new name, hype it in the blogosphere, and ride the wave as everyone runs back the other way?

In its day, at least as much hype surrounded the client-server model as now envelops the cloud. Information technology advances rapidly enough that every new development is trumpeted as the next automobile. A more balanced perspective is to recognize that both centralized and distributed architectures have their merits, and after two decades of R&D effort devoted to client-server, we’re now starting to see some neat new tools available in the data center.

One of the nicest features available in the cloud is auto-scaling. Ten years ago, I ran freesoft.org by buying a machine and finding somewhere to plug it into an Ethernet cable. The machine had to be paid for up front, and if it started running slow, the solution was to retire it and buy another one. Now, running in Amazon’s EC2 cloud, I pay two cents an hour for my baseline service, but with the resources of a supercomputer waiting in reserve if it starts trending on social media.

A supercomputer! That’s what lurks behind all of this! A great number of these competing cloud architectures boil down to competing proposals to build a supercomputer operating system, coupled with an accounting system that provides a predictable price/performance ratio. Virtualization is one of the most popular models to achieve this.

Yet virtualization has been around for decades! IBM’s VM operating system was first released in 1972! Running a guest operating system in a virtual machine has been a standard paradigm in mainframe data centers for over 40 years! IBM’s CMS operating system has evolved to the point where it requires a virtual machine – it can’t run on bare hardware.

I’d like to see an open source supercomputer operating system, capable of running a data center with 100,000 servers and supporting full virtualization, data replication, and process migration. Threads are the way to write applications that run across multiple processors, so we should have a supercomputer operating system that can run on 100,000 processors. GNU’s Hurd might be a viable choice for such an operating system.

I may never have set foot on a sailboat in my life, but I am fascinated by the physics behind their operation. They must operate simultaneously in two media, as both an airfoil and a hydrofoil. Plus, they are probably one of the “greenest” vehicles ever conceived.

Which makes you wonder why we don’t use them more.

Not only are they dependent on the weather, but they are also considerably slower than an airplane. I’ve been thinking about how to make them faster, and my inspiration came from a book about airplanes: The Deltoid Pumpkin Seed by John McPhee. It documents, in popular language, the efforts of a group of entrepreneurs and engineers to build a hybrid airplane/airship (the Aereon) that would have a small engine and use helium to improve its lift characteristics.

Why not do the same thing with a sailboat? Since most of the drag comes from the wetted hull, it would make sense to lift the hull out of the water as much as possible and leave only the keel submerged. Ship designers have been doing this for years with clever hull designs intended to lift themselves out of the water as they get up to speed, but the Aereon design suggests another way – helium.

What seems to make sense to me would be to build a trimaran and fill the outriggers with helium.
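A rough buoyancy estimate puts a number on what helium-filled outriggers could contribute. Assuming standard sea-level densities for air (about 1.225 kg/m³) and helium (about 0.166 kg/m³):

```python
# Approximate sea-level densities, kg per cubic metre.
RHO_AIR = 1.225
RHO_HELIUM = 0.166

def helium_lift_kg(volume_m3):
    """Mass (kg) of hull weight offset by a helium-filled volume in air."""
    return (RHO_AIR - RHO_HELIUM) * volume_m3

# Each cubic metre of helium offsets about a kilogram:
per_cubic_metre = helium_lift_kg(1.0)  # ≈ 1.06 kg
```

So each cubic metre of outrigger volume would shave only about a kilogram off the effective hull weight, which suggests the outriggers would need to be very large relative to the boat for the effect to matter.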

A friend of mine recently bent my ear for an evening over cold fusion. It struck me as pseudo-science, but not wanting to be prejudiced, I spent some time with a web browser looking into it.

Though I still have a hard time believing some of these claims, I have to admit that this technology shows promise.

The Coulomb barrier that must be overcome to fuse two protons is about 5 MeV – not something you’d expect at room temperature, but well within the range of a standard particle accelerator. The problem is the minute cross section of the nucleus – protons with enough energy to fuse are far more likely to be scattered away from each other unless they are precisely aligned in a head-on collision.

That’s where the cold fusion claims start to get interesting. All of the cold fusion reports that I read involved palladium as a catalyst. Now palladium has the unusual property that it can absorb significant quantities of hydrogen. There doesn’t seem to be a consensus on exactly how this works, but one explanation is that the hydrogen nuclei can move fairly freely within the palladium crystal lattice. Now if I wanted to line something up at atomic dimensions, a crystal would be the obvious choice, and if I wanted to line something up while it’s moving, I would want a crystal that allowed my particles some mobility. So palladium seems like an obvious choice for lining up moving protons to collide them precisely.

A theorem of Thorup and Zwick (Proposition 5.1 in their 2001 paper Approximate Distance Oracles) states that a routing function on a network with n nodes and m edges uses, on average, at least min(m, n^2) bits of storage if the “route stretch” (the ratio between actual path length and optimal path length) is to be kept below 3 (i.e., if two nodes are two hops apart, the actual route taken between them must be less than six hops). On the Internet topology, we can expect the n^2 term to dominate, so spreading these n^2 bits out among n nodes yields an average of n bits per node – i.e., each router’s routing table has to hold one bit for every device on the network.
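The bound is easy to put in concrete terms. Taking the worst case where the n² term wins, and picking a network size purely for illustration:

```python
def avg_bits_per_router(n, m):
    """Average routing-table size implied by the min(m, n^2) total-bits bound."""
    total_bits = min(m, n ** 2)
    return total_bits / n

# A hypothetical Internet of a billion devices, with a topology dense
# enough that the n^2 term dominates the min:
n = 10 ** 9
bits = avg_bits_per_router(n, n ** 2)  # one bit per device on the network
print(bits / 8 / 2 ** 20)              # ≈ 119 MiB of table per router
```

A hundred-odd megabytes of routing state per router, just to stay within a stretch of 3, is what makes the result so discouraging.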

Not a very encouraging result for those of us designing routing protocols.

Yet there is hope. The result is only an average. We can do better than the average if we allow our routing function to be skewed towards certain network topologies. And it occurs to me that the Internet doesn’t change fast enough that we can’t skew our routing function towards the current network topology.

How can we do this? With dynamic addressing: skewing our address summarization scheme to reflect the current network topology.
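As a toy illustration of what summarization buys (the prefixes are invented), Python’s ipaddress module can collapse contiguous routes that share a next hop into a single entry:

```python
import ipaddress

# Four contiguous /24 networks, all reachable through the same next hop.
routes = [
    ipaddress.ip_network("10.0.0.0/24"),
    ipaddress.ip_network("10.0.1.0/24"),
    ipaddress.ip_network("10.0.2.0/24"),
    ipaddress.ip_network("10.0.3.0/24"),
]

# Aligned, contiguous prefixes collapse into one supernet entry.
summarized = list(ipaddress.collapse_addresses(routes))
print(summarized)  # → [IPv4Network('10.0.0.0/22')]
```

Dynamic addressing would keep the address plan aligned with the topology so that this kind of collapse stays possible as the network changes, rather than only at initial allocation time.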

Could contemporary technology be used to build a Gibson-esque implant that could connect a human brain to a computer? Perhaps more practically, what would it take to cure spinal cord injuries like Christopher Reeve’s using this kind of technology?

I’ve written before about how the failure of source routing created the need for NAT, but that post didn’t address the basic security problem with source routing that caused ISPs to disable it: it allows a man-in-the-middle attack in which a machine can totally fabricate a packet that claims to come from a trusted source. There’s no way for the destination machine to distinguish between such a rogue packet and a genuine packet that the source actually requested be source routed through the fabricating machine. At the time, Internet security was heavily host-based (think rsh), so this loophole came to be perceived as a fatal security flaw, and source routing was deprecated and abandoned.

A quarter century later, I think we can offer a more balanced perspective. Host-based authentication in general is now viewed as suspect and has largely been abandoned in favor of cryptographic techniques, particularly public-key cryptosystems (think ssh) which didn’t exist when TCP/IP was first designed. We are better able to offer answers to key questions concerning the separation of identity, address, and route. In particular, we are far less willing (at least in principle) to confuse identity with address, if for no other reason than an improved toolset, and thus perhaps better able to judge source routing, not as a fundamental security loophole, but as a design feature that became exploitable only after we began using addresses to establish identity.

Can we “fix” source routing? Perhaps, if we completely abandon any pretense of address-based authentication. What, then, should replace it? I suggest that we already have our address-less identifiers: DNS names. Furthermore, we already have a basic scheme for attaching cryptographic keys to DNS names (DNSSEC), so can we put all this together and form a largely automated DNS-based authentication system?

Apple’s recent announcement of the iPhone has inspired me to reconsider how IT can be used to support foreign language studies. According to Apple, the iPhone will have a microphone (it’s a phone, after all), run OS X, and have 4 to 8 GB of memory. That should be a sufficient platform for a voice-activated dictionary. After training a voice recognizer, you could speak a word into the device, which would then look it up in a dictionary and display the entry on the screen, giving language students the detail of a full-sized dictionary in something that fits in a pocket.

As streaming video has become more commonly available, it is now plausible to discuss offering a video interface to a website. A user could connect to a site using a video client and “navigate” on-line video content using a DVD-style user interface – video menus, highlighted on-screen buttons, fast-forward, subtitles. Alternately, such an interface could augment conventional TV broadcasts, offering a nightly news program with video hyperlinks to hours of detailed coverage of events we currently see only in sound bites.

I’ve read some stuff on the Internet about using pre-computed tablebases to solve complex endgames. Apparently you can load a five- or six-piece tablebase into a program like Fritz and it will then know how to mechanically solve any endgame with five (or six) pieces on the board.

I started thinking about this, but more along the lines of building the tables dynamically, using a pool of cooperating computers on the Internet. The idea would be to direct the search towards particular game positions you wanted to analyze. This would work well for static analysis and also at relatively slow correspondence time controls (like Garry Kasparov vs The World, or the upcoming The World vs Arno Nickel).