Friday, February 24, 2012

The standard approach to getting a big project accomplished is to identify the problem, come to an overall solution, then keep breaking it down into smaller, more manageable pieces. Once these are small enough -- specific tasks -- all that’s left is to march through them one-by-one and complete them. This is quite a reasonable way to approach a problem that isn’t heavily hierarchical in nature, where the sub-parts are mutually independent, or at least mostly so.

If, however, the problem is multi-dimensional in nature, and there are a significant number of different ways to decompose it, this doesn’t work. If there are at least two logically consistent ways to decompose a problem, then the underlying pieces are interdependent. These cross-dependencies mean that “linearizing” the problem and marching through each piece in sequence will result in a significant amount of redundant effort. Besides the extra work, redundancies wreak havoc because people are inherently inconsistent. The resulting differences in output (and any consequential problems) amplify the complexity, and again result in extra work.

An example of this is a problem that can be broken down into A, B and C. It can also be decomposed into 1, 2, 3 and 4. If we decompose lexically first, we get the following pieces: A1, A2, A3, A4, B1, B2, B3, B4, C1, C2, C3 and C4. If we decompose numerically, we get: 1A, 1B, 1C, 2A, 2B, 2C, 3A, 3B, 3C, 4A, 4B and 4C.

Thus we have a two-level hierarchy which falls into 12 different sub-pieces that we need to complete in order to get the project done, with two reasonable ways to slice and dice it.
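
To make that concrete, here’s a tiny Python sketch (the names are mine, purely for illustration) that enumerates both decompositions:

    from itertools import product

    letters = ["A", "B", "C"]
    numbers = ["1", "2", "3", "4"]

    # Decomposing lexically first: A1, A2, ..., C4
    lexical = [l + n for l, n in product(letters, numbers)]

    # Decomposing numerically first: 1A, 1B, ..., 4C
    numerical = [n + l for n, l in product(numbers, letters)]

    print(lexical)       # ['A1', 'A2', 'A3', 'A4', 'B1', ..., 'C4']
    print(numerical)     # ['1A', '1B', '1C', '2A', ..., '4C']
    print(len(lexical))  # 12 sub-pieces either way
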
While this type of problem exists generally with any type of labor, it is
very noticeable in software development. As such, the concept of ‘polymorphism’ arose. The essence of this concept in software is that a number of similar but different types of data all share a common interface, so that the code working on them doesn’t need a specific implementation for each different type. In that form, the approach addresses a decomposition over a finite number of static data-types. But the concept itself is general: it is not restricted to things that are static, nor to data. It can be applied to dynamic data as well; data that is mutating in unexpected ways can still share a common interface (and support reflection). It can also be applied to static code, where all of the blocks of code share an identical interface. And of course, given those other two, it can apply to dynamic code as well.
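
As a rough illustration (the shape types here are hypothetical, not from any particular code base), the classic static-data version looks like this in Python: each type implements the same interface, and the calling code is written once against that interface rather than once per type:

    # Two different data types sharing one common interface.
    class Circle:
        def __init__(self, radius):
            self.radius = radius

        def area(self):
            return 3.14159 * self.radius ** 2

    class Square:
        def __init__(self, side):
            self.side = side

        def area(self):
            return self.side ** 2

    # Written once, against the interface, not once per concrete type.
    def total_area(shapes):
        return sum(shape.area() for shape in shapes)

    print(total_area([Circle(1.0), Square(2.0)]))  # 7.14159...

And since Python dispatches dynamically, types defined or modified at runtime work just as well, which is the ‘dynamic data’ half of the observation.
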
While it is a commonly used and very effective programming paradigm, it can also apply generally back to any basic problem decomposition. In our above example, there are 12 different pieces of work that need to be completed. But there are really only 7 unique items in that work list: the three letters plus the four numbers. Combining them may produce a combinatorial explosion, but that doesn’t mean that the only way to approach getting the work done is to explicitly work through all 12 pieces. We might conclude that at maximum efficiency we need only complete 7 pieces of work, though I suspect that in practice there is some extra overhead required to bind the sub-pieces together. Still, the expectation would easily be that a polymorphic approach to handling the work would lie somewhere between 7 and 12. If it were 8, for instance, that would mean that the most efficient way to solve the problem took 2/3rds of the effort required to just pound through all of the pieces. That would be a considerable savings.
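
Here is a hypothetical sketch of what that buys us: three letter-specific steps plus four number-specific steps (7 implementations in total), with one small generic binder covering all 12 combinations:

    from itertools import product

    # Seven implementations in total: one per letter, one per number.
    letter_steps = {
        "A": lambda x: x + ":a-step",
        "B": lambda x: x + ":b-step",
        "C": lambda x: x + ":c-step",
    }
    number_steps = {
        "1": lambda x: x + ":1-step",
        "2": lambda x: x + ":2-step",
        "3": lambda x: x + ":3-step",
        "4": lambda x: x + ":4-step",
    }

    # The binding overhead: one generic composition for all 12 pieces.
    def run(letter, number, payload):
        return number_steps[number](letter_steps[letter](payload))

    for l, n in product("ABC", "1234"):
        print(l + n, "->", run(l, n, "payload"))

The binder is that ‘other overhead’: an eighth piece of work, but still well short of twelve.
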
Now, at this point some readers are probably on the edge of their keyboards,
but they likely fall into two different categories: a) those that
disagree with there being a more efficient way of solving the problem,
and b) those that see this whole discussion as obvious. I’ve certainly
met both kinds of people in practice, although I think more people fall
into the ‘a’ category. The reason I think this is more common is that it
is easy for most people, when faced with a problem, to adopt tunnel
vision. That is, they don’t focus on the ABCs, nor on the 123s, but
rather they go pick what they believe to be the one correct
decomposition (lexical or numerical), then they narrow down their field
of vision to deal only with a sub-problem like A1. While working on A1,
they don’t want to hear anything about A2, and certainly don’t want to
know, or even consider C4. They just accept that there are twelve things
to do and start doing them.

Again, this shows up very clearly, and quite frequently, in programming. It is fairly common, if you read a lot of code, to see a programmer belt out very specific handling for A1 in one location, and then to find another, completely separate, yet nearly identical A2 somewhere else. Not only does this occur in code, it also pops up in the data, and in the static data used for operational configuration. Software, these days, is highly redundant. It actually gets worse when you look at teams of programmers. Large to massive code bases seem to start with 2x-3x redundancy, and I wouldn’t even want to guess at their worst factors. They just become ever-increasing seas of redundant data and code.
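
A contrived before-and-after of exactly that A1/A2 pattern (hypothetical names, assuming some simple comma-separated data):

    # The redundant version: two nearly identical blocks, written separately.
    def load_a1(path):
        with open(path) as f:
            rows = [line.strip().split(",") for line in f]
        return [r for r in rows if r and r[0] == "A1"]

    def load_a2(path):
        with open(path) as f:
            rows = [line.strip().split(",") for line in f]
        return [r for r in rows if r and r[0] == "A2"]

    # The polymorphic version: one implementation, parameterized by
    # the piece it serves. load_piece(path, "A1") replaces both of
    # the functions above.
    def load_piece(path, piece):
        with open(path) as f:
            rows = [line.strip().split(",") for line in f]
        return [r for r in rows if r and r[0] == piece]

Multiply that by every pair of near-duplicates in a large code base and the 2x-3x figure stops looking like an exaggeration.
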
Even though polymorphism is a well-known, popular approach to architecting code, it is very unlikely that a significant proportion of most code-bases utilizes it across the board. There may be some sub-instances, but most often they are trivial uses. This continues to occur even though, oddly, one of the most famous modern adages of programming is the DRY principle -- don’t repeat yourself -- which came out of The Pragmatic Programmer book (and possibly series). It’s rather obvious that this wisdom isn’t being followed on a regular basis, quite simply because it’s really, really hard to accomplish in practice. If it were being attempted more often, we’d see it dominating far more of the technical conversations, and we’d see it show up more often in the vast amount of open code that exists. And a huge number of modern technologies not only violate this principle, they actively encourage violating it. So while it is talked about, it is quite often swept under the rug in the rush to belt out the next version.

I
do find this ironic because certainly in software, we’ve seen a huge
growth in the expectations for what software can do, and in general
given the rapidly increasing -- out of control -- complexity of our
societies, you’d think more and more people would start to look for
efficiencies. But it does seem that the opposite is true about human
nature. That is, the more complex things become, the more tunneled the
vision of the people who have to deal with them. Our species seems to retreat towards the shallows the moment things get to be too much for us.
This, it seems, sets up a feedback loop, forcing people who took the
long road to be late, thus forcing them to not analyze the next fork
sufficiently, and again choose a longer route. And then it never ends.

And it’s that type of vicious feedback loop that I’ve seen in many
software development projects, both at the coding level and the
organizational one. The projects get so ground down slapping band-aids on unnecessary problems caused by redundancies that they lack
the resources to avoid creating them in the first place. The work
degrades into a loosely coupled collection of disorganized functionality
that mitigates most of its own benefits. Not only have they chosen the least effective path, they’ve also opened up an ever-widening sinkhole for their own resources. And it seems that this type of problem comes directly from our own intuitive approach to solving problems. We are most likely not to make the right choices (and then we are far too stubborn to backtrack).

Thursday, February 16, 2012

I
love the idea behind open source. If I’m building something that uses
someone else’s work, being able to drop into their code, investigate,
curse, and then work around their problem is a huge time-saver. Nothing is worse than wasting hours guessing at what weirdness lies beneath.

The
problem is that this idea of ‘open’ mutated into the idea of ‘free’,
and ‘free’ is not a good idea in a society that revolves around money.
If you write some fabulous piece of code and give it away for free, not
only do you not make money yourself, but you’ve also prevented other
programmers from making money by writing something similar. Not all of
us are lucky enough to get funded by other means; some of us need to pay
mortgages and bills and such. We do this by getting paid to work. By
writing code for a paycheck. If everything is free, we’re going to have
to find some other (less agreeable) way to pay the bills.

A
slightly worse problem is that as more and more stuff becomes free,
more and more of the low-hanging fruit disappears. What that means in
reality is that it becomes harder and harder for programmers to go out
on their own, to start their own companies. Instead, the control of the
industry shifts to the big players, who have little incentive to
innovate. If you can write something small and profitable, then you get
the freedom to experiment. If you can’t, then you’re stuck for life in a
big sweatshop writing broken code for people who don’t get computers.
I’ve definitely seen this trend in the industry over the last few
decades. The really innovative works have nearly vanished and been
replaced by more and more sloppy re-works of existing wheels. Not only
that, but the profits come from the upper levels of software, so the
lower ones get stagnant as the bugs get permanently frozen into the
code-base. Thus our software looks prettier but, because of the increase in complexity, is dropping in sophistication and quality. And it’s not in the big companies’ interest to change this trend. They seem to make more money with lesser-quality code.

So
what can we do? My first suggestion is that we should push to get more and more software into open source. That is an easy win: transparency promotes quality and ease-of-use. But at the same time, we need to attach a price to every single piece of code out there. I don’t think home users and developers should pay; I like that they ride for free. But for big companies -- making profits from our labors -- money should definitely return to our community. Money that we can use to innovate with.

The
problem is that programmers aren’t business people and few of them
really want to deal with business. What they want is to build really
cool stuff and leave the hassles of collecting money to other people who
enjoy it. To allow this, I think we need to set up ‘software clearing
houses’. Programmers would deposit their code into these organizations,
and the staff there would deal with the issues of wheeling and dealing
in the business arena. The clearing house could deal with licenses,
accounts receivable and accounts payable. They would be the repository
of the running code and of the source code. They could collect bug
reports, then deal with farming them back out to the communities that
built the code. Basically they’d act as a middleman between a large
number of developers on one side, and a large number of companies on the
other.

Many
of our current licenses ask for funds when the code goes into a commercial product, but not if it is being used in-house. One reason for
letting the in-house users ride for free is that not doing so would
result in a huge number of little payments that would all have to be
coordinated. That would be messy for an individual developer, but if a
clearing house represented a significant number of projects, libraries,
utilities, etc., most big companies wouldn’t have a problem paying a
single reasonable yearly lump-sum amount. There are a huge number of ways of structuring this type of arrangement -- fine versus coarse payments, etc. -- but what is really important is that it isn’t a burden to the companies, and that the money is flowing to the developers.

A
significant problem with relying on many of the newer open source libraries is support, both for bugs and for ongoing rust prevention. A
clearing house could provide some assurances that they will contact the
developers and try to get the issues sorted out. If that is infeasible,
they could also contact other unrelated developers and get the code
fixed or updated that way. Once deposited in the clearing house, the
code could live on well past the author’s interest. It would also be
less subject to dramatic shifts in design or licensing. If enough people
were interested in the preservation of a fork, the fork would find an
easy means to continue.

One
problem for commercial developers is the proliferation of various
licenses for libraries. There might be a great library to use, but the
license may be vague or destructive. Often, approaching the developers directly results in outrageous financial demands, making it impossible to utilize the work. Commercial developers are keen to make
profits, and aren’t against sharing them fairly, but the commodification
of software has dramatically lowered the margins. It’s getting harder
and harder to make a profit directly on software; the monies come more
often from the services and support side, particularly for software
categories like niche enterprise software (5-20 clients). Thus payments
to the authors of dependencies would be fairly small, yet would constitute a considerable overhead if there were a lot of them. This again would be
fixed by clearing houses. Lump-sum payments, or per-sale payments, sent to a single clearing house that then disperses them to a large group of developers, would allow the money to flow. If all of the works
were under the same license and the terms were reasonable, then that
would easily drop out another big problem for the commercial developers.

Another
important point is that there should be many clearing houses.
Competition is a good thing, and some of the houses might specialize in providing access to particular types of code. Some industries are highly regulated, and a house that could provide certified libraries would be hugely appreciated. License and support features could also differ significantly, as well as the underlying quality of the code. A house that only provided vetted, high-quality libraries, for instance, would be a very useful entity, and would save lots of the development time currently spent evaluating the existing options.

I
should point out that to some degree this idea already exists. Both
Apple and Google have markets for apps that act as middlemen between the
developers and the consumers. This seems to be working quite well
(although it does also seem to have reduced the price of software). What
I think we should do is expand that basic mechanism out to all code,
all of the time. Pretty much everything would go through clearing
houses, and for everything that is usable there should be some cost to it.

There
are lots of other benefits as well, but my readers seem to really hate
it when I go on and on :-) My key point for most people is: why kill
yourself in the evenings and weekends to do great work that may end up
making other people money for decades, if you are not going to get some
share of the pie? Write something, deposit it into a couple of different houses, and if in a few years that provides the means to retire early, then you are free to focus on the code you’ve always wanted to write. Wouldn’t that be a great thing? You’re happy, the other coders are happy, the businesses are happy, and the industry is happy. Everybody
wins.