Tuesday, July 31, 2007

For me, the two greatest programming problems in Computer Science right now are a) how to bring masses of data together and b) how to easily deploy functionality. Certainly there are lots of people working on parts of these problems, but it does not seem as if people have really put the issues into focus or looked at the bigger picture.

Underneath, software is just a tool to manipulate data. We can capture massive amounts of data, but we have trouble using it. There are enough degrees of freedom in our technologies that each group of developers can choose to implement their models differently. As such, combining any two sets of data is, in general, a higher order problem. No amount of code or algorithms will ever solve this issue. If we can't bind the data together, then we can't make use of it as a single collection. Concepts such as data warehouses try to avoid the issue by making copies of the data in other locations and in other formats. The amount of effort and administration needed to make this work is tremendous, but many organizations, if they persevere, have been able to keep these types of systems running. The longer the system runs, the more of an undertow builds up against it, making it harder to change. At some point, the frequency of changes crosses over the threshold of barriers against change. Unless corrected, the pending changes to the project grow faster than the ability to make them.
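To make the mismatch concrete, here is a minimal Python sketch. The two record layouts, their field names and the helper functions are all hypothetical, invented purely for illustration. The point is only that the mapping between the two models has to be written by someone who understands both of them.

```python
# Two hypothetical systems that store the same customer differently.
billing_record = {"cust_name": "Ada Lovelace", "balance_cents": 12550}
support_record = {"first": "Ada", "last": "Lovelace", "open_tickets": 2}


def billing_key(record):
    """Normalize the billing system's single name field into a join key."""
    return record["cust_name"].strip().lower()


def support_key(record):
    """Build the same key from the support system's split name fields."""
    return f"{record['first']} {record['last']}".strip().lower()


def combine(billing, support):
    """Merge two records only when the hand-written keys agree."""
    if billing_key(billing) != support_key(support):
        return None
    return {
        "name": billing["cust_name"],
        "balance": billing["balance_cents"] / 100,  # cents -> currency units
        "open_tickets": support["open_tickets"],
    }


combined = combine(billing_record, support_record)
```

Every additional source with its own model would need another pair of functions like these, which is roughly where the administrative effort comes from.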

The other big approach to combining data comes from the Internet search folks. They can combine masses of data, but they do it by essentially removing all of the type information and making it into one big mass of characters. It is an interesting approach, but without maintaining structure in the data, we become severely limited in the types of questions we can ask. We also move away from discrete working algorithms that provide 100% accuracy, into messy statistical heuristics that only answer the question for some percentage of the data. The results are interesting, but somehow they appear crude when we consider what types of calculations can really be done by a computer.
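A small sketch of that trade-off, in Python. The order records and function names are made up for illustration, and the keyword match is a deliberately naive stand-in, not how any real search engine works. With the structure intact we can compute an exact answer; once the records are flattened into characters, a search can no longer tell a price from part of a product name.

```python
# Hypothetical typed records, invented for illustration.
orders = [
    {"customer": "Ada", "item": "disk", "quantity": 3, "price": 40.0},
    {"customer": "Bob", "item": "disk 40", "quantity": 1, "price": 55.0},
]


def total_spent(records, customer):
    """Exact answer: the structure says which field is which."""
    return sum(
        r["quantity"] * r["price"] for r in records if r["customer"] == customer
    )


def flatten(record):
    """Drop the field names and types, keeping only a run of characters."""
    return " ".join(str(value) for value in record.values())


def text_search(records, term):
    """Naive keyword match over the flattened text of each record."""
    return [i for i, r in enumerate(records) if term in flatten(r).split()]


ada_total = total_spent(orders, "Ada")   # 120.0, computed from typed fields
hits_for_40 = text_search(orders, "40")  # [1]: matches the item name only
```

The search for "40" finds Bob's order, where 40 is part of the item name, and misses Ada's order, where 40.0 is the actual price.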

Most functionality out there is pretty simple. It wouldn't take long to write it, but we are forced to write all of the other bits and bytes that are necessary to wrap it up, including the application and the packaging. If it came down to a simple set of manipulations that were allowable on a specific set of data, we could make significant progress in enhancing our applications. There have been attempts at frameworks or simplified languages, but Frederick P. Brooks's insistence that there is no such thing as a Silver Bullet has scared away most people from delving into this issue. While we could never remove the issue -- it too involves higher order reasoning -- that doesn't imply that we can't build a foundation onto which new functionality is easily integrated. The limits of the foundation become the limits of the functionality, but we can easily build a simple, very wide foundation. There is nothing magical here.

I think both problems are within our reach, but to solve them we need to stop proceeding along our current path. Generally now we just pound out reams of code, over and over again in each new and upcoming technology without really looking at the problems we are trying to solve. The entire industry, and that probably includes the academic community as well, is so focused on coding that we have forgotten to ask whether or not we are writing the 'right' code. It is unlikely, mostly because we wrote the same thing last decade, and then the decade before that... We have also forgotten to ask if we are using the right procedures to build the code, "but that's another story" as Hammy Hamster would say.

Saturday, July 28, 2007

I am -- I think -- judging too harshly. My feelings towards software development have lately been ones of extreme disappointment. I keep thinking that we should have achieved some larger number of advances by now, particularly in the way we build systems. However, over the last couple of decades, while software has definitely not lived up to its potential, I have to admit that there have been a few significant advances.

I've seen subgroups of the software industry that are best described as being ten to twenty years behind the rest of us. The software looks and feels like it was written in the mid-nineties. It has that fugly aesthetic about it, the kind that makes it look like some old Soviet machine. All stiff and practical, without any consideration given to its actual appearance. That I can judge a piece of software to be so outdated shows that we have made some type of progress. From a usage perspective we have moved. Our gains, however, may have only been in the visual style of the interface and in more interconnections between systems, and little else.

We now have access to an unprecedented amount of data. But we also have no real way of leveraging it. Early on we picked relational databases to be one of the cornerstones of our implementations. That choice so often pushes us into architecting some strange and overly complicated solutions to get around their inherent limitations. There are massive amounts of data trapped in vertical silos with no easy way of liberating them. So, we can collect and store more data than ever before, but we cannot consolidate it, or easily mine it for higher knowledge.

We do have more functionality available to us. And with movements like open source, we actually have a huge amount of available functionality, free and commercial. Unfortunately, large chunks of it are essentially unusable. There are so many badly designed software tools out there that are completely undependable. If we can't put the tool to practical use, then it's no good. Even many of our big famous tools that worked well in the past are rapidly degrading. The features that proved useful early on have bloated into untrustworthy and unpredictable code. A clear sign of this instability problem is that we are doing less customization, automation and scripting because the tools require more and more intelligence to work around their rapidly changing flaws. Our foundations -- which are getting bigger -- are so unstable that we are losing our ability to build on top of them.

We have more methodologies available to us, and we can choose between lightweight and heavyweight, but none of these appear to be an actual improvement on software development. They rehash old ideas or invest in impractical new ones, but mostly they all choose to stay away from the really difficult issues. If we don't change the steps we use to develop software, then it seems a reasonable conclusion that the success of the projects and the quality of the systems will remain the same. That much has not changed in decades. If we play with fun ideas, but don't deal with the real issues, then the problems will remain. There are more methodologies to choose from, but none of them seem designed to improve the results.

So, there does seem to be some progress, but it does not seem to have made significant changes in the software development industries. They remain, as always, producing things that are expensive and undependable. Software has a great deal of potential, but there is a long way to go before it will start to live up to it. Some day, it is easy to imagine, people will look back on these dark days and wonder how it was that we even managed to keep most of our systems running. That circumstance becomes increasingly dangerous as we put more and more faith into our technologies.