Tuesday, January 20, 2015

Scale is a fundamental factor in organization. Let's start with a simple example.

Say you have 3 books. If you want to organize them, all you need to do is put them on a shelf. Any order is fine; it doesn't take long for your eyes to glance over the spines to find the book you want.

Now if that set of books grows to 30, the shelf probably still works, but you may consider reordering the books by title or author, or perhaps even some simple category system. 30 books is a lot to scan each time you need something, so to save time finding what you want, you might try ordering by title.

If the number of books grows again to 300, then a whole new set of problems is introduced. You have to abandon the shelf for a bookcase. It's a less convenient piece of furniture that takes up significant space in a room, so its location is probably not as handy as the shelf. Order also becomes more significant. You may have chosen a lexical order on the titles originally, but now with a full bookcase you find that you really want books by the same author to be together. With 300 entries, you might have several authors with multiple titles. Still, a sort by author makes it harder to find a specific title, so perhaps you prepare a little list that cross-references the titles with their appropriate shelf in the bookcase. It's a looser indexing scheme, but it saves time later and doesn't take that long to prepare.

But alas, the number of books keeps growing and now we get to 3,000. There are now 10 bookcases, and because of that some of them no longer fit in the same office. There are several locations to check and far too much work to keep the secondary paper index up to date. Instead, you recategorize the books based on topics, and you assign a small number of topics to each bookcase. Within each case you can sort by author or return to the original title order. That suffices for the moment.

But still it grows, and soon enough the number hits 30,000 books. With 100 bookcases the collection now requires its own 'library', and although the books are categorized, there are so many of them that you need multiple indexing methods. The bookshelves have a special code, and you have indices on title, author and a few major sub-categories. You employ a full-time librarian to keep the categorization up to date and to constantly roam the bookcases, reordering and putting out-of-place books back where they belong.
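The multiple indexing methods above can be sketched in code. This is a minimal illustration, not a real cataloguing system; the `Book` record, its fields and the shelf codes are all invented for the example:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Book:
    title: str
    author: str
    category: str
    shelf_code: str  # which bookcase and shelf holds it

class Library:
    """Keeps several indices over one collection, as the librarian would."""
    def __init__(self):
        self.by_title = {}
        self.by_author = defaultdict(list)
        self.by_category = defaultdict(list)

    def add(self, book):
        # Every index must be updated together, or the collection drifts
        # out of order -- the librarian's constant roaming, in code.
        self.by_title[book.title] = book
        self.by_author[book.author].append(book)
        self.by_category[book.category].append(book)

    def locate(self, title):
        # One lookup instead of scanning 100 bookcases.
        return self.by_title[title].shelf_code

lib = Library()
lib.add(Book("Dune", "Frank Herbert", "sci-fi", "C7-3"))
lib.add(Book("Emma", "Jane Austen", "classics", "A1-2"))
print(lib.locate("Dune"))  # finds the shelf in constant time
```

The point of the sketch is the cost trade-off: each new index makes additions slightly more work, but turns a long physical search into a single lookup.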

The pace continues, so the number of books reaches 300,000. They've outgrown the original library; there are now three separate locations, each of which specializes in specific sub-categories. Within each, there are two full-time librarians, and the work to keep it all tidy and in order has grown considerably. Finding any specific book is tougher because there are three separate indexing schemes, so there is a global project to consolidate those into three copies of a single larger index. That work, plus the advancements in categorization now that there are a much larger number of subcategories and sub-subcategories, keeps everyone busy.

Now for this example, one of the key features is that the growth of books is exponential; it keeps increasing by a factor of 10. For each one of those up-ticks, the previous organization basically failed and an entirely new scheme was necessary. Also, at each tick the amount of work changed radically. It started out as a personal collection and finished with a team of 6 people. Exponential growth does this quite quickly, but any sort of growth can approach the next tick; it just takes time. The key to organization is that growth changes it. The larger things get, the more effort is required to keep them organized and usable. Growth inherently increases disorganization.

This same pattern is true in software development as well. As the code grows, its organization needs to keep pace with its new size. A small project needs very little organization, while in a massive project the organization is a full-time effort in itself. For everything in between there are discrete ticks that have very different organizational requirements. The best way to tell if things are organized is similar to knowing if the books are organized: you examine how long it takes to find something specific given a number of different scenarios. For books you might look at how long it takes to find a specific book or to browse a subcategory. For code, it's how long it takes to identify the code that produces a specific behaviour, which coincidentally is the same as being able to find and fix a bug (the behaviour is just unwanted in the latter case). If the time has become excessive, then new means and/or layers of organization are necessary to ensure that it doesn't become worse. In an organizational sense, if it takes a good programmer a month to identify the code behind a given bug, then either the system is massive, it's overly complex or it's poorly organized. Most often it is the last case.

In software, the problem also applies to the data. Data requires its own organization, although often that can be automated via the code and proper data-administration tools. As the data grows, new ways are found to explore it, so a never-ending set of new indices is necessary. It is not enough to just lock the data into some static structure, create one ordering and forget about it. Handling growth requires responding dynamically to each new level of scale. This ongoing problem, and the increasing ease with which we can collect data, remain great challenges for software development and operations.
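That "responding dynamically" can be made concrete with a small sketch: a dataset that allows new indices to be declared after the fact, as new ways of exploring the data appear. The `Dataset` class, the `region` field and the sample rows are invented for illustration; real systems would delegate this to a database.

```python
from collections import defaultdict

class Dataset:
    """Rows plus indices that can be added later, as exploration needs grow."""
    def __init__(self, rows):
        self.rows = list(rows)
        self.indices = {}  # name -> (key function, index mapping)

    def add_index(self, name, key_fn):
        # Build a new ordering over the existing data on demand,
        # rather than locking in one static structure up front.
        idx = defaultdict(list)
        for row in self.rows:
            idx[key_fn(row)].append(row)
        self.indices[name] = (key_fn, idx)

    def insert(self, row):
        self.rows.append(row)
        for key_fn, idx in self.indices.values():
            idx[key_fn(row)].append(row)  # keep every index current

    def find(self, name, key):
        return self.indices[name][1].get(key, [])

orders = Dataset([
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 80},
])
orders.add_index("region", lambda r: r["region"])
orders.insert({"id": 3, "region": "EU", "amount": 50})
print(len(orders.find("region", "EU")))  # both EU rows, including the late insert
```

The design choice mirrors the essay's point: each index costs something to build and maintain, but the alternative, scanning all rows for every new question, stops scaling long before the data stops growing.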

Organization is the underlying key to making many things 'usable'. A large collection of books has less value if you struggle to find the book you need. The problem inherent in building up a collection is more than just getting new books; it is keeping them organized in a fashion where they retain their value. That this problem occurs twice in software should really be of no surprise. Code and data form the twin axes on which software operates. Each has its own unique set of problems, thus the organizational issues differ, but span over both. In a rather abstract way, organizing things is the crucial science that underpins our intelligence. We are smart not because of what we might know, but rather because of our ability to apply that knowledge in the world around us.

Sunday, January 11, 2015

An increasingly common way to build software is in response to users bringing their current problems to the developers. This user-driven approach is believed by many to ensure that what is being built matches the users' needs and prevents it from heading off in potentially unsuitable directions. The system gets built step by step as a direct 'reaction' to the users. Since by definition most users need their current problems solved right away, time is usually the single most critical issue.

Reactive development approaches have been popular for decades, mostly as an alternative to the failings of big, slow, long-running projects. When the scope of a project bloats heavily, the various forces involved can accidentally send it off on an unreasonable course. Because the time scales were so long, any misdirection could take a long time to detect and then cost a lot to correct. The reactive idea was that a much larger number of smaller changes driven 'directly' by the users would ensure that the users get exactly what they specified. In a very real sense, that's what reactive development achieves.

The problem with reactive development is that most of the control over the development is now outside of the software development team. For a simple bit of code with few architectural needs, this user-driven approach has a good chance of succeeding. A fairly verbose user with a good vision of how to really solve their own problems can articulate the interface and data required, leaving the programmers to fill in the blanks. This works so long as the bulk of the programming is primarily business related. It is, however, a slow process, and the resulting code is disorganized and redundant.

Generally, experts in a specific user domain frame their thinking relative to their own knowledge, so they are unlikely to see problem decomposition in the same manner that programmers have learned to as they developed their skills. Better decompositions in programming tend towards abstraction and generalization with fewer special cases, while most domain experts tend towards the opposite: they learn to be specific and focus only on one case at a time. This alternative perspective does not fit well within software's mathematical foundations, so the inevitable result is a significant increase in the artificial complexity of the code, driven directly by the user's specifications.

This type of decomposition problem is hardly noticeable in small or even some medium-sized systems, but as scope increases it begins to dominate the technical debt.

On top of that, because of the scheduling, the fastest approach to adding new features is to tack them onto the outside of the existing code. This avoids the extra step of having to understand what is already there. Continued use of this approach means that the code base loses any and all upper-level organization, becoming an increasingly large ball of mud. Constantly reacting to user needs also kills any ability to re-organize -- refactor -- the code, so once this type of development approach gets set in motion, there is generally no turning back.

At some point, if the system keeps growing in this manner, it becomes large enough and complex enough to cross a threshold where the redundancy, lack of architecture, time pressures and inconsistent problem decompositions drive down the quality so far that more time is spent patching the mess than is spent on adding new features. This is the reactive version of a death march, where the development team just marches around in circles until somebody finally pulls the plug.

Writing code that doesn't solve user problems is by definition a waste of time, but assembling odd bits of code in a user-driven manner isn't actually better. Reacting to stuff is essentially the opposite of 'engineering'. The latter seeks to construct something that behaves precisely according to the builder's understanding. The former just randomly assembles stuff driven by an outside force; it lacks organization and thought, and often its full range of behaviour is undefined.

Users, by definition, are rarely engineers, so they won't choose to focus on solving the necessary engineering problems that come up constantly in large development projects. They just ignore them and focus on the problems they can solve. But the solutions they need also need to be encased in a properly engineered system. Both parts of the puzzle are absolutely necessary to avoid creating bad systems. Users are the most important source of domain-specific requirements, but that's where their expertise both begins and ends. Software developers are the experts in the technical domain, which includes both the technical programming and the process used to develop the system. They should know how to solve technical problems, and they should also understand how to arrange large amounts of work to be completed in an effective manner. Users can't help with either problem.

Reactive approaches aren't the only way to build software; there are plenty of others. One that is particularly effective is to actively seek out 'solvable' user problems. In this circumstance, the technology is well understood first; the developers are just looking for ways to apply it to help the users in their roles. Since this sort of development is driven both by the capabilities of the technology and the needs of the users, it has an increased likelihood of better matching the technology to the issues.

Being proactive means that there is considerable work done first to establish a base for handling the user issues. The initial code doesn't solve problems; rather, it sets up the organization necessary to be able to do that in the future. It is not unlike having to lay a foundation first in a building, so that the apartment units can be built onto something reliable later. That 'pay now' and 'receive a benefit later' quality scares a lot of people with lingering memories of defective waterfall projects, but in this case it is very different. The old waterfall projects aimed to complete the entire system in one massive development cycle. A proactive approach, on the other hand, aims to construct usable Lego-like bits of technology first, so that they can be employed quickly later. Its focus is on setting the stage for reuse, without committing to a final direction.

A way that I've handled this in the past was to build up a strong base platform that deals with the necessary system requirements, such as data persistence, locking, caching, users, etc. On top of this I've added a domain specific language (DSL) to allow the users to fine-tune their own domain or business logic. The DSL essentially runs in a sandbox, so that whatever the users do, good or bad, cannot interfere with how the bulk of the system operates. This then separates out the purely technical problems from the domain ones and ensures that they don't co-mingle later in unpredictable ways. The downside is that for a very long time the system is under development, but from a user's perspective it does absolutely nothing. The upside is that as the project proceeds, instead of slowing down, it starts to speed up. Once the foundations get established, the user functionality flows quickly, and if the architecture is smart it becomes increasingly easier for the users to reconfigure their logic to meet unexpected changes.
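The sandbox idea can be sketched with a tiny rule evaluator. This is only an illustration of the principle, not the author's actual platform: rules are parsed into a syntax tree and only a whitelisted set of constructs is interpreted, so user-written logic cannot call into, or break, the rest of the system.

```python
import ast
import operator as op

# Only these operators are allowed; anything else in a rule is rejected,
# so user logic stays inside the sandbox.
_OPS = {
    ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv,
    ast.Gt: op.gt, ast.Lt: op.lt, ast.GtE: op.ge, ast.LtE: op.le,
    ast.Eq: op.eq, ast.NotEq: op.ne,
}

def eval_rule(source, variables):
    """Evaluate one business-rule expression against whitelisted variables."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]  # only values the platform provides
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Compare) and len(node.ops) == 1 \
                and type(node.ops[0]) in _OPS:
            return _OPS[type(node.ops[0])](walk(node.left),
                                           walk(node.comparators[0]))
        if isinstance(node, ast.BoolOp):
            results = [walk(v) for v in node.values]
            return all(results) if isinstance(node.op, ast.And) else any(results)
        raise ValueError(f"rule uses a forbidden construct: {type(node).__name__}")
    return walk(ast.parse(source, mode="eval"))

# A user-authored rule: pure domain logic, no access to the platform.
print(eval_rule("amount > 1000 and region == 'EU'",
                {"amount": 1500, "region": "EU"}))
```

Function calls, attribute access and imports are simply not in the whitelist, so an attempt like `__import__('os')` is rejected rather than executed. That rejection is the whole point: the boundary between domain logic and technical machinery is enforced structurally, not by convention.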

This approach can be taken very far down that road. One goal I've had in the past is to minimize the total amount of code necessary for any interface screen. Most interface code is hugely redundant. Reusing it over and over again saves a massive amount of time, but it also helps to achieve consistency within the interface. Thus it would be extremely convenient to be able to specify only the differences between screens in as few as a couple of hundred lines of code. It takes a considerable amount of thinking and some inspiration to achieve this goal, but once it has been completed it makes any additions or changes to the screens trivial.
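A minimal sketch of specifying only the differences between screens, with all names (`DEFAULTS`, the screen ids, the field lists) invented for the example: each screen is a small delta merged over shared defaults, so adding a screen costs a handful of lines rather than hundreds.

```python
# Shared behaviour every screen inherits unless it says otherwise.
DEFAULTS = {
    "layout": "two-column",
    "toolbar": ["save", "cancel"],
    "validation": "on-submit",
}

# Each screen lists only what differs from the defaults.
SCREENS = {
    "customer-edit": {"fields": ["name", "email", "phone"]},
    "invoice-view": {"layout": "single-column",
                     "toolbar": ["print", "close"],
                     "fields": ["number", "date", "total"]},
}

def build_screen(name):
    spec = dict(DEFAULTS)       # start from the common behaviour
    spec.update(SCREENS[name])  # apply only this screen's differences
    return spec

print(build_screen("customer-edit")["layout"])  # inherited, not respecified
```

The consistency benefit falls out for free: a change to `DEFAULTS` updates every screen that hasn't explicitly overridden it.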

On one project, we decided the current interface was completely wrong so we entirely rewrote it within a couple of weeks. That type of flexibility may seem excessive, but what usually happens with large interfaces is that changes are so expensive that the interface gradually bloats and gets convoluted as the work progresses. It becomes impossible for the users to navigate. Being able to avoid that fate, because it was proactively understood that it would occur, allows the design to be finessed properly as the first work continues. If some major misunderstanding occurred in the way the screens were originally structured, it is no longer an expensive problem to correct it.

In fact, it is this type of flexibility that gets lost with a reactive approach. The code is built to solve very specific instances of a problem, but within most domains, most problems recur repeatedly, just in slightly different forms. And many of these domain problems share linkages to underlying common technical ones, particularly as the scope increases. A proactive approach seeks to build up a large number of reusable pieces and then apply these to the solution, which opens up the flexibility to easily rearrange them later. The cost of spending time to make the parts reusable is paid for by the savings achieved by not having the code statically welded into place.

Although a proactive approach requires more initial work, it is still a piecewise approach. That is, it can be done in a series of iterations, and these can be influenced by the user requirements. It's not quite the 1:1 mapping that defines reactive approaches, but the direction is still driven by the users. The difference is that there will be times in the development cycle when the technical or reuse requirements trump the user ones, and as such, although a quick hack could be done immediately, the road travelled will be a bit longer. This of course is subject to the politics of software development, and maintaining that balance is a key factor. Even in a well-run proactive development project, sometimes reacting is required to maintain confidence, although it is cleaned up immediately afterwards.

From experience, the best analogy I've found for applying a proactive approach is with Lego blocks. The idea behind the development is to continuously assemble larger and larger Lego blocks, gradually building up a collection that can solve any and all of the user issues. The blocks should be general enough that they can be used all over the place, but specific enough that the underlying problems aren't just blindly transferred to the configuration. Each block fully encapsulates a set of problems. A big project has a large number of blocks of varying sizes, and these themselves need some higher level of organization. It takes a while to get out the first set of blocks, but once they exist, extending their functionality gets easier. As time progresses, if the work is organized, solving new problems gets faster because the existing blocks provide a vocabulary of expression at an increasingly higher level. The blocks quite literally converge on the nouns and verbs that exist in the users' own description of their problems. That makes it convenient to check both that the development direction is correct and that the advanced business logic really decomposes properly on the users' side.
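The "vocabulary of nouns and verbs" can be shown in miniature. All of the function names, the sample rows and the toy schema below are invented for the example; the point is only the shape: small blocks composed into a higher-level verb the users themselves would use.

```python
# Low-level blocks, each encapsulating one small problem.
def fetch_orders(db, customer):
    """Find all orders belonging to one customer."""
    return [o for o in db if o["customer"] == customer]

def overdue(orders, today):
    """Keep only orders whose due date has passed."""
    return [o for o in orders if o["due"] < today]

def total(orders):
    """Sum the amounts across a set of orders."""
    return sum(o["amount"] for o in orders)

# A higher-level verb, straight out of the users' own vocabulary,
# assembled entirely from the existing blocks.
def outstanding_balance(db, customer, today):
    return total(overdue(fetch_orders(db, customer), today))

db = [
    {"customer": "acme", "amount": 100, "due": 5},
    {"customer": "acme", "amount": 40, "due": 20},
    {"customer": "other", "amount": 7, "due": 1},
]
print(outstanding_balance(db, "acme", today=10))
```

Each new domain question ("outstanding balance", "overdue count", "largest debtor") becomes a one-line composition rather than a fresh case-by-case implementation, which is exactly the speed-up the proactive approach is buying.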

The project then is rooted in the low-level technical issues that build up a foundation, but it gradually progresses to higher and higher level domain problems. As this grows, the capabilities of the system extend out to handle more sophisticated issues. It's a top-down perspective that drives the bottom-up development.

Reacting to the users concedes all control to outside forces, forcing the developers to march through the work one case at a time. It is the least effective method of building systems and is unlikely to produce quality output. The developers are just constantly chasing the ball. Getting ahead of that ball means that the developers can choose to employ smarter and more reusable approaches to their work, in anticipation of the upcoming needs of the users. That forward perspective is what is ultimately necessary to have the time to properly engineer a system. Without it, the users may get what they've asked for, but they will definitely not get what they wanted, or even what they need.