Friday, December 28, 2012

I'm typing this post from a cottage in northern Ontario. It is my first post from my iPad, connected via 3G to the world. I couldn't have imagined 20 years ago, while lugging home my 30 pound Compaq "portable", that someday I'd be miles off the grid, yet connected to the world via a wafer-thin screen resting on my lap. Hardware, it seems, has made amazing progress.

The irony is that my tablet is packed with retro apps, most of which are throwbacks to the early days of PC computing. While hardware has screamed ahead, software has, well, lagged. Still, there are a few apps that impress me, mostly the stuff coming from Autodesk. The rest, however...

Sunday, November 25, 2012

Nearly three decades ago, when I started university, all I really wanted to learn was the magic of programming. But my course load included plenty of mathematics and computer theory courses, as well as some crazy electives. “What does all this have to do with programming?” I often complained. At first I just wished they’d drop the courses from the curriculum and give me more intensive programming assignments. That’s what I thought I needed to know. In time I realized that most of it was quite useful.

Theory is the backbone of software development work. For a lot of programming tasks you can ignore the theory and just scratch out your own eclectic way of handling the problem, but a strong theoretical background not only makes the work easier, it also produces code that is more likely to withstand the rigors of the real world. Too often I’ve seen programmers roll their own dysfunctional solution to a theoretical problem without first getting a true appreciation of the underlying knowledge. What most often happens is that they flail away at the code, unable to get it stable enough to work. If they understood the theory, not only would the code be shorter, but they’d spend far less time banging away at it. Thus for some types of programming, understanding the underlying theory is mandatory. Yes, it’s a small minority of the time, but it’s often the core of the system, where even the littlest of problems can be hugely time intensive.

The
best known theoretical problem is the ‘halting problem’. Loosely stated, it is impossible to write code that can determine whether some other arbitrary code will converge on an answer or run forever (although one can write an approximation that works for a finite subset of a Turing Machine, and that seems doable). In its native form the halting problem isn’t encountered often in practice, but we do see it in other ways. The first is that an unbounded loop could run forever, and an unbounded recursion can run forever as well. Thus in practice we really don’t want code that is ever unbounded -- infinite loops annoy users and waste resources -- at some point the code has to process a finite set of discrete objects and then terminate. If that isn’t possible, then some protective form of constraint is necessary (and its size should be easily configurable at operational time). The second way we see it is that we can’t always write code to understand what other code is trying to do. In an offbeat way, that limits the types of tools we can build for automation. It would be nice, for instance, if we could write something that would list out the stack for all possible exceptions in the code with respect to its input, but that would require the lister to ‘intelligently’ understand the code well enough to know its behavior. We could approximate that, but the lack of accuracy might negate the value of the tool.
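Since that protective constraint comes up so often, here is a minimal sketch of it in Python. The function names and defaults are hypothetical, purely for illustration: a loop that could otherwise run forever is capped by a configurable limit and fails loudly when the limit is hit.

```python
# A sketch of bounding an otherwise open-ended loop. Since we cannot decide
# in general whether the loop would terminate (the halting problem), we
# impose a configurable cap and fail loudly when it is reached.

def converge(update, start, max_iterations=10_000, tolerance=1e-9):
    """Iterate update() from start until the value stabilizes.

    max_iterations is the protective constraint; in a real system it
    should come from configuration, so operations can raise it without
    a code change.
    """
    current = start
    for _ in range(max_iterations):
        nxt = update(current)
        if abs(nxt - current) < tolerance:
            return nxt                      # converged within the bound
        current = nxt
    raise RuntimeError(f"no convergence after {max_iterations} iterations")

# Example: a Newton's method step for the square root of 2 converges quickly.
root = converge(lambda x: 0.5 * (x + 2.0 / x), start=1.0)
```

The cap changes nothing when the loop behaves, but turns a silent hang into a diagnosable failure when it doesn't.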
Another interesting theoretical problem is the Two Generals Problem. This is really just a coordination issue between any two independent entities (computers, threads, processes, etc.). There is no known way to reliably get 100% dependable communication if the entities are independent. You can reduce the window of failure down to a tiny number of instructions, but you can never remove it entirely. With modern computers we can do billions of things within fractions of a second, so even a tiny 2 ms window could result in bugs occurring monthly in a system with a massive number of transactions. Thus what seems like an unlikely occurrence can often turn into a recurring nuisance that irritates everyone.

Locking
is closely related to the Two Generals Problem. I’ve seen more bugs in locking than in any other area of modern programming (dangling pointers in C were extremely common in the mid 90s, but modern languages mitigated that). It’s not hard to write code to lock resources, but it is very easy to get it wrong. At its heart, it falls back to a simple principle: to get reliable locking you need a ‘test-and-set’ primitive. That is, in one single, uninterrupted, protected operation, you need to test a variable and either set it to ‘taken’ or return that it is unavailable. Once you have that primitive, you can build all other locking mechanisms on top of it. If it’s not atomic, however, there will always be a window of failure. That links back to the Two Generals Problem quite nicely, since it becomes an issue precisely when you can’t get access to an atomic ‘test-and-set’ primitive (and thus there will always be problems).
Parsing is one of those areas where people often tread carelessly without a theoretical background, and it usually ends badly. If you understand the theory and have read works like the Red Dragon Book, then belting out a parser is basically a matter of time. You just decide what the ‘language’ requires, such as LR(1), and how big the language is, and then you do the appropriate work, which more often than not is either a recursive descent parser or a table-driven one (using tools like lex/yacc or antlr). There are messy bits of course, particularly if you are trying to draft your own new language, but the space is well explored and well documented. In practice, however, what you see is a lot of crude split/join-based top-down disasters, with the occasional regular expression disaster thrown in for fun. Both of those techniques can work with really simple grammars, but fail miserably when applied to more complex ones. Being able to parse a CSV file does not mean you know how to parse something more complex. Bad parsing is usually a huge time sink, and if it’s way off then the only reasonable option is to rewrite it properly. Sometimes it’s just not fixable.
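For a flavor of what a recursive descent parser looks like, here is a minimal sketch for simple arithmetic. Each grammar rule becomes one method that mirrors the rule exactly, which is the structure a split/join hack never achieves; the grammar and names are illustrative only.

```python
# A minimal recursive descent parser for the grammar
#   expr   := term  (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
# One method per grammar rule -- the code's shape mirrors the grammar's.

import re

TOKEN = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")

def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.i += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.next(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, rhs = self.next(), self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        tok = self.next()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            value = self.expr()
            if self.next() != ")":
                raise SyntaxError("missing closing parenthesis")
            return value
        return float(tok)

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Precedence and nesting fall out of the rule structure for free, which is exactly why this approach scales to bigger grammars where a regex never will.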
One of my favorite theoretical problems is the rather well-known P vs NP problem. While the verdict is still out on the relationship, it has huge implications for code optimization. For people unfamiliar with ‘complexity’, it is really a question of growth. If you have an algorithm that takes 3 seconds to run with 200 inputs, what happens when you give it 400 inputs? With a simple linear algorithm it takes 6 seconds to run. Some algorithms perform worse: a quadratic one would take four times as long (12 seconds), and an exponential one might not finish for hours. We can take any algorithm and work out its ‘computational complexity’, which tells us how the running time grows with respect to the size of the input. We usually categorize this by the dominant term, so O(1) is constant, O(n) grows linearly with the size of the input, O(n^c) grows by a constant exponent (polynomial time) and O(c^n) has the size of the input as the exponent (exponential time). The P in the name refers to polynomial time, while NP is, rather loosely, any growth that is larger, such as exponential (I know, that is a gross oversimplification of NP, but it serves well enough to explain that it references problems that are larger, without getting into what constrains NP itself). Growth is a really important factor when it comes to designing systems that run efficiently. Ultimately what we’d like is to build a well-behaved system that runs in testing on a subset of the data, and then to know when it goes into production that the performance characteristics have not changed. The system shouldn’t suddenly grind to a halt when it is being accessed by a real number of users, with a real amount of data. What we’ve learned over the years is that it is really easy to write code where this will happen, so to get the big industrial stuff working we often have to spend a significant amount of time optimizing the code to perform properly. The work a system has to do is fixed, so the best we can do is find approaches to preserve and reuse that work (memoization) as much as possible. Optimizing code, after it’s been shown to work, is often crucial to meeting the requirements.
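The growth rates can be made concrete with a couple of toy cost functions. These count operations rather than measure time, and the functions are purely illustrative:

```python
# A tiny illustration of computational complexity: the cost of a linear
# scan (O(n)) versus comparing every pair of items (O(n^2)) as the input
# size doubles. The functions just count the operations each would do.

def linear_ops(n):
    return n                  # one visit per element

def quadratic_ops(n):
    return n * (n - 1) // 2   # every distinct pair compared once

for n in (200, 400):
    print(f"n={n}: linear={linear_ops(n)}, quadratic={quadratic_ops(n)}")

# Doubling the input doubles the linear cost, but roughly quadruples the
# quadratic one -- the growth rate, not the raw time, is what O() captures.
```

The same doubling experiment run against a real system is a quick sanity check that production data won't trigger a growth rate the tests never exercised.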
What P != NP would really mean in practice is that there is a very strong bound on just how optimized code can really be. If it holds, then there is no possible way to take such an exponential problem and find clever tricks to get it to run in polynomial time. You can always optimize code, but there may be a hard bound on exactly how fast you can get it. A lot of this work was best explored with respect to sorting and searching, but for large systems it is essential to really understand it if you are going to get good results.

If
P = NP were true, however, then among many other implications we would be able to calculate some pretty incredible stuff. Moore’s law has kept giving us more hardware to play with, but users have kept pace and continually ask for processing beyond our current limits. Without that fixed boundary as a limitation, we could write systems that make our modern behemoths look crude and flaky, and it would take a tiny fraction of the huge effort we currently put into building them (it would also take a lot of the fun out of mathematics, according to Gödel).

Memoization
as a technique is best known from ‘caching’. Somewhere along the way, caching became the over-popular silver bullet for all performance problems. Caching is simple in essence, but there is significantly more depth there than most people realize, and as such it is not uncommon to see systems deploying erratic caching to harmful effect. Instead of magically fixing the performance problems, they manage to make them worse and introduce a slew of inconsistencies into the results. So you get really stale data, or a collection of data with parts out of sync, slower performance, rampant memory leaks, or just sudden scary freezes in the code that seem unexplainable. Caching, like memory management, threads and pointers, is one of those places where ignoring the underlying known concepts is most likely to result in pain, rather than a successful piece of code.
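As a sketch of the technique done right, Python's standard functools.lru_cache memoizes a function by its arguments. The classic illustration is the recursive Fibonacci, which drops from exponential to linear time once sub-results are cached:

```python
# Memoization in its simplest form: cache results keyed by the arguments.
# A naive recursive Fibonacci runs in exponential time; caching the
# sub-results collapses it to linear, since each value is computed once.

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(200)  # finishes instantly; uncached, this would take astronomical time
```

Note that this easy case works because fib is pure: the cached answers can never go stale. The moment the underlying data can change, the invalidation problems described above appear.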
I’m sure there are plenty of other examples. Often when I split programming between ‘systems programming’ and ‘applications programming’, what I am really referring to is that the systems variety requires a decent understanding of the underlying theories. Applications programming needs an understanding of the domain problems, but those can often be documented and passed on to the programmer. For the systems work, the programmer has to really understand what they are writing, for if they don’t, the chances of just randomly striking it lucky and getting the code to work are nearly infinitesimal. Thus, as I found out over the years, all of those early theory courses that they made me take are actually crucial to being able to build big industrial strength systems. You can always build on someone else’s knowledge, which is fine, but if you dare tread into any deep work, then you need to take it very seriously and do the appropriate homework. I’ve seen a lot of programmers fail to grok that and suffer horribly for their hubris.

Sunday, November 18, 2012

One significant problem in software development is not being able to end an argument by pointing to an official reference. Veteran developers acquire considerable knowledge about ‘best practices’ over their careers, but there is no authoritative source for all of this learning. There is no way to know whether a style, technique, approach, algorithm, etc. is well-known, or just a quirk of a very small number of programmers.

I have heard a wide range of different things referred to as best practices, so it’s not unusual to have someone claim that their eclectic practice is more widely adopted than it really is. In a sense there is no ‘normal’ in programming -- there is such a wide diversification of knowledge and approaches -- but there are clearly ways of working that consistently produce better results. Over time we should be converging on a stronger understanding, rather than just continually retrying every possible permutation.

Our not having a standard base of knowledge makes it easier for people from outside the industry to make “claims” of understanding how to develop software. If, for instance, you can’t point to a reference that says there should be separate development, test and production environments, then it is really hard to talk people out of just using one environment and hacking at it directly. A newbie manager can easily dismiss three environments as being too costly, and there is no way to convince them otherwise. No doubt it is possible to do everything on the same machine; it’s just that the chaos is going to extract a serious toll in time and quality, but to people unfamiliar with software development, issues like ‘quality’ are not easily digestible.

Another
example is that I’ve seen essentials like source code control set up in all manner of weird arrangements, yet most of these variations provide ‘less’ support than the technology can really offer. A well-organized repository not only helps synchronise multiple people, it also provides insurance for existing releases. Replicating a bug in development is a huge step towards being able to fix it, and basing that work on the certainty that the source code is identical between the different environments is crucial.

Schemas in relational databases are another classic area where people easily and often deviate from reasonable usage, and either claim their missteps as known practice or dismiss the idea that there is only a small window of reasonable ways to set up databases. If you use an RDBMS correctly it is a strong, stable technology. If you don’t, it becomes a black hole of problems. A normalized schema is easily sharable between different systems, while a quirky one is implicitly tied to a very specific code base. It makes little sense to utilize a sharable resource in a way that isn’t sharable.

Documentation and design are two other areas where people often have very eclectic practices. Given the increasing time-pressures of the industry, there is a wide range of approaches out there that swing from ‘none’ to ‘way over the top’, with a lot of developers believing that one extreme or the other is best. Neither too much nor too little documentation serves the development, and often documentation isn’t really the end-product, just a necessary step in a long chain of work that eventually culminates in a version of the system. A complete lack of design is a reliable way to create a ball of mud, but overdoing it can burn resources and lead to serious over-engineering.

Extreme positions are common elsewhere in software as well. I’ve always figured that in their zeal to over-simplify, many people have settled on their own unique minimal subset of black and white rules, when often the underlying problems are really trade-offs that require subtle balancing instead. I’ll often see people crediting K.I.S.S. (keep it simple, stupid) as the basis for some over-the-top complexity that is clearly counter-productive. They become so focused on simplifying some small aspect of the problem that they lose sight of the fact that they’ve made everything else worse.

Since
I’ve moved around a lot, I’ve encountered a great variety of good and insane opinions about software development. I think it would be helpful if we could consolidate the best of the good ones into some single point of reference. A book would be best, but a wiki might serve better. One single point of reference that can be quoted as needed. No doubt there will be some contradictions, but we should be able to categorize the different practices by family and history.

We do have to be concerned that software development is often hostage to what amounts to pop culture these days. New “trendy” ideas get injected, and it often takes time before people realize that they are essentially defective. My favorite example is Hungarian notation, which has hopefully vanished from most work by now. We need to distinguish between established best practices and upcoming ‘popular’ practices. The former have been around for a long time and have earned their respect. The latter may make it to ‘best’ someday, but they’re still so young that it is really hard to tell yet (and I think more of these new practices are deemed ineffective than promoted to ‘best’ status).

What would definitely help in software development is being able to sit down with management or rogue programmers and stop a wayward discussion early with a statement like “storing all of the fields in the database as text blobs is not considered by X to be a best practice, so we’re not going to continue doing it that way”. With that ability, we’d at least be able to look at a code base or an existing project and get some idea of its conformity. I would not expect everyone to build things the same way; rather, this would show up projects that have deviated way too far towards the extremes (and because of that are very likely to fail). After decades, I think it’s time to bring more of what we know together into a usable reference.

Sunday, November 4, 2012

I’ve always loved this quote:

“Work Smarter... Not Harder” -- Allan F. Mogensen

But what does it really mean, particularly when it comes to software development?

get organized immediately and cleanup often

being consistent is more important than being right

be very sensitive to scale (bigger systems mean fewer shortcuts)

utilize the full abilities of any technology or tools, but don’t abuse them

automate as much as possible, spend the time to get it reliable

minimize dependencies, accept them only when there are really no other options

read the manual first, even if it is boring

research before you write code, avoid flailing at a problem, seek expertise

choose to refactor first, before coding

delete as much code as possible

encapsulate the subproblems away from the system, spend the time to get it right

break the problem cleanly, fear special cases

apply abstraction and generalization to further reduce the work

think very hard about the consequences before diving in

fail, but admit it and learn from it

don’t be afraid of the domain, learn as much as possible about it

focus on the details, quality is all about getting them right

accept that complexity breeds below the surface, if you think something is simple then you probably don’t understand it yet

know the difference between knowing, assuming and guessing

everything matters, nothing should be random in software

don’t ignore Murphy’s law

a small number of disorganized things doesn’t appear disorganized until it grows larger

reassess the organization as things grow, updating it frequently as needed

Working smarter is most often about spending more time to think your way through the problems first, before diving in. Under intense time pressure, people often rush to action. There are times when this is necessary, but there is always a cost. Rack up enough of this technical debt, and just servicing it becomes the main priority, which only amplifies the time pressure. Thus any gains from swift action are lost, and working harder won’t undo the downward spiral.

Being smart, however, can prevent this from occurring. Yes, the pace is slower and getting the details right always requires more effort, but a minimal technical debt means more resources are available to move forward. Eventually it pays off. Being smart isn’t just about thinking hard; it also requires having enough data to ensure that the thinking is accurate. Thus, acquiring more information -- dealing with the details -- is the key to being able to amplify one’s thinking. In software development, what you don’t understand can really harm you, and it’s very rare that you can ignore it.

Saturday, October 27, 2012

While it is not always true, you’ll frequently find in restaurants that there is a reasonable correlation between the quality of the food and the management of the kitchen. In my youth I found that a messy kitchen often meant low quality food and/or bad service. Of course it’s not always that simple, but usually a disaster area behind the curtains percolates out to the rest of the organization. Since I prefer to ‘do things well if I have to do them’, when searching for jobs back then I stayed far away from disorganized, sloppy or dirty restaurants, even if the mess was well-hidden behind the scenes.

The same is essentially true in software development. Projects that are set up correctly, have strong processes, gravitate towards best practices and control the client’s expectations are considerably more likely to produce quality software. The correlation isn’t always as strong though, since I have seen smooth development shops that still produce fugly software. But I’ve never seen a badly organized shop produce excellence.
One far too common problem I’ve seen is that the client’s expectations are out of whack. In the printing industry people often say “Fast, Cheap or Quality: pick two” to explain the trade-offs necessary when purchasing print work. In software we get the same saying, but it ends with “pick one”. I figure it’s worse for our industry because our work takes considerably longer. Print jobs are often measured in hours, days or perhaps weeks, while software is usually measured in weeks, months and years. I’ve seen far too many software projects where clients insist on all three and then have a hard time understanding why some “fiddling” with a few screens might require more than a trivial effort.

A development project that is set up well not only attends to the implementation and distribution of the software, it also controls the client’s expectations. They need to be reasonable. Failure to do this means too much pressure on features and quick fixes, which means accumulating a heavy technical debt. Left unchecked, the project spirals downwards until it combusts.

You can see the symptoms of that type of pressure directly in the code base: poor analysis, little design, poor use of tools and overly frequent ‘micro’ releases. As well, there is generally a high bug count and a lot of time burned on ‘second-line support’. If developers are busy with band-aids then clearly they’re not coding new stuff. Too much support and too few features just amplifies the pressure, which eventually becomes so toxic that big things start to break.

Getting
out of this type of spiral can be difficult or impossible. Resetting the expectations of people who are already disappointed is an extremely tough task. However, just like those reality TV shows where they help people in serious financial debt learn to handle their finances in a more reasonable manner (like Til Debt Do U$ Part), the ticket to really fixing the development problems is to tackle the technical debt, which mostly means being cost conscious (features & make-work) and reducing lifestyle expectations (release dates). Digging out takes a massive amount of time and means a lot less visible progress, but it only gets worse if it is put off for longer.

Reversing the slide means revisiting the development and release procedures. It also means lots of cleanup and refactoring. And it usually means revisiting all of the development tools to upgrade their usage to known best practices. For confidence, a few low-hanging features might be tossed in as well, but all major changes need to be put on hold until the implementation and release process stabilizes. A smooth development effort should produce minimal support effort. It should be reliable and repeatable. The code should be clean and consistent.

Getting back to the kitchen in a messed-up restaurant: not only do you need to clean up the work area and replace any defective equipment, you also need to put some processes in place to ensure that it doesn’t happen again. Once that back-room problem has been dealt with, you can consider a new strategy and/or a new menu, but neither of those will help if the kitchen lacks the ability to execute.

You’d
think that new projects were less vulnerable to this type of problem, but most often the pressure to get something working right away overrides the desire to set things up properly. The developers start letting little issues slip, and after enough time they get hit with ‘death by a thousand cuts’. The most visible “source” of the problem is the client’s expectations, but those only become overwhelming because of a lack of management. Still, it is ultimately up to the developers to ‘insist’ that the technical debt not be ignored. At some point it will always be dangerously large, so at the very least ‘cleanup’ should be a recurring work item. Personally I like to start each new development iteration with some significant cleanup, mostly because it’s boring and it gives one a chance to take a breather after the last release. If that becomes a steady habit, then while there is always technical debt and time pressure, at least it never builds up to toxic levels.

In the end it’s not just what you code, but how you code it and how you leverage the development and systems tools as well. Digging out of technical debt is an ongoing problem, and mostly the time allocated to the work should be relative to the size of the debt. It doesn’t need to happen in one big effort, but it can’t be ignored indefinitely either. A smooth development process means that the developers can spend more time contemplating the behavior of the code, which is really the only route to quality. A disorganized process means a strong head wind that is only overcome by dangerous short-cuts. Thus smooth is good, and it is a primary requirement for achieving quality. You can’t have the latter without spending time to get the former.

Monday, October 15, 2012

Does it really matter what the code looks like? The short answer is an emphatic ‘yes’. It matters because ultimately writing code is about discipline and details. If the code is just slapped together, really messy, with issues like extra blank lines or unused variables lying all over, it reveals an awful lot about the author. Perhaps they were in a huge rush, or even a panic. Maybe they are just lazy or don’t care. Maybe they really don’t understand what they are doing. Any which way, their state of mind while doing the work was such that they lacked the self-discipline to clean up their work, and thus were probably not particularly good at paying attention to the all-important details either. Messy code is almost always fragile and usually a bug fest.

Code is basically knowledge and intelligence encoded into a sequence of steps for the computer to execute. It is only as smart as the programmer that wrote it, and it is only as flexible as they conceived of the runtime environment. If the programmer’s style is haphazard or their thinking muddy, then the code will suffer the same faults. If, however, the programmer’s understanding is crystal clear and they grok the environment, then the code will stand up to a lot and keep on working.

Just getting something to barely work for an extremely limited set of circumstances is not good enough to be considered professional quality. Working code has to work correctly for all circumstances; for all inputs; for all changes in the environment. It needs to be able to stand up to the full range of things that happen in the real world. If it doesn’t, that’s an indication of a lack of quality.

What
experienced professional programmers always come to understand is that quality starts in the code. If you want a beautiful system that is a joy for the users, then the code underneath needs to be not only smart and well designed, but marvelous to look at as well. The ‘platinum’ standard of coding comes from taking a complex problem and making the code look simple. Elegance comes from both the readability and the underlying skill used to make the answer look obvious. A well-written piece of code hides the messiness of the details carefully. It misleads a reader into thinking it took just a fraction of the time to get written. It leverages abstraction to avoid redundancies, but not at the cost of becoming obfuscated. It basically takes the knowledge of the programmer and encapsulates it into a small, tight region of the system that works correctly and is easy to extend later. It draws clean, hard lines through the problem and then ties them up with the lines in the solution. And because its sole purpose is easy to see, it makes it really easy to detect and fix bugs. Quality software comes from quality code.

But it is not just quality that is affected by bad code. Testing, bug fixes and support are all heavy drains on software development. Work spent in these areas is work that is no longer available for new development. Crappy-code bugs are way harder to find and fix, so as a consequence they chew through more resources. It actually costs more to write bad code than to do a good job the first time around, line for line. And often projects get caught up in a vicious cycle of coding fast and recklessly, then losing too much time dealing with the inevitable problems, causing them to go right back to fast and reckless again. That type of cycle feeds back into morale, and the whole project does a slow spiral down the drain.

As
for programming style, mostly it doesn’t really matter which conventions are chosen. Some conventions do improve clarity or readability, and help to make inconsistencies more obvious, but so long as all of the code is consistent, it can always be refactored to a better convention at some point.

Getting all of the variable names consistent, laying out the lines with the same whitespace formatting, and finding strong conventions for naming core parts like methods and objects establishes a strong foundation for the code. Normalization and abstraction help to either hide the details or put them forth in a way that makes inconsistencies easy to notice. A solid architecture decomposes the system in a way that makes it easy to trace bugs quickly back to isolated parts of the code base. All of these things help in reducing the number of bugs, and also in reducing the time it takes to fix them. And because of them, more time and resources open up for new development.

Building a high quality software system always means paying close attention to the millions of details floating about the project. It is not enough to just hack through the work; it is a detail-oriented process, which often needs great time and patience. All development projects come with outrageous demands and too little time, but being sloppy doesn’t help with these issues. Part of the great skill involved in being a professional programmer is the self-discipline to do the right thing, even under intense pressure. Sloppiness and short-cuts, while tempting, just put off today what will always cost more in the future. If the goal is quality, then it has to percolate into every tiny detail. Fast and cheap is always crap in software.

Wednesday, October 10, 2012

To me, ‘intelligence’ is basically raw human thinking power. It’s our ability to work through problems. In order to employ intelligence effectively in our world, we need data. And that data needs to be structured and interconnected to make it usable to us. Usable, organized data is what I take to be ‘knowledge’. It’s all ready to be utilized.

Our various endeavors are categorized into a huge number of fields (or domains) such as law, medicine, finance, physics, math, biology, etc., each of which is mostly an independent ‘base’ of knowledge about the field. What’s really interesting about most knowledge bases is that they are easy to over-simplify and also frequently counter-intuitive in their depths. An outside perspective can easily lead to the wrong conclusions. It’s not until you are steeped in the details that you’re able to apply your understanding correctly.

Without enough underlying knowledge, intelligence is just hand waving. You can think about something as deeply and as long as you want, but your conclusions are unlikely to be reasonable.

Software development is extremely complex. Not only do you need a strong understanding of the underlying technologies and how to use them, it also cuts across many other knowledge bases.

A
software system is a solution to a problem in a user domain. If it
doesn’t solve their problems, then it is only adding to them. It
consists of a huge number of details, all of which need to be organized
and fit into the final work. Managing these pieces, usually under tight time-lines requires a special set of skills. Software is also very
expensive to build and maintain, so finding the money to pay for enough
resources is always tricky. Putting this together we see that software spans the following knowledge bases:

Technology

User domains (all?)

Management

Business

Most programmers tend to believe that technology is the most significant issue in software development, but usually the problems start in other areas and feed back into the development work. For this post, I’ll go through all four areas, in the order that I find has the most impact for big projects.

Management

A
large software project has millions of details that are all being
juggled at the same time. Keeping most of those balls in the air,
without losing track of them, is necessary to deliver a working system. A
good manager is constantly running around, making sure that all of
their resources are moving forward, and that any roadblocks are
eliminated as swiftly as possible. Some people see software management
as a higher creative input role where they just have to inject ideas,
but that’s never actually helpful. Most development projects have more
than enough ideas, what they need is organization, process and to keep
everyone on the same page. In that sense a good manager accepts their
role as herding all of the cats in the same direction, while making sure
the path forward is clear.

Development
meetings are a good place to assess management. If they are
retrospective and basically just a forum for people to catch up with
what has already happened, then the project is not well-organized. It
shouldn’t be necessary for the manager to catch up, if they have been
paying attention as they should.

If,
however, meetings are really planning sessions for the next upcoming
work, then they contribute by identifying the issues long before they
become significant enough to derail the process. Plans don’t always work
as expected, but planning is a necessity when you are dealing with long
running work. Even small tasks in development can take weeks or months,
and the worst possible outcome is to keep shifting directions before
any of the work is completed properly. What’s started should be
finished, or it shouldn’t have been started in the first place. Only a
long-term plan will avoid time-crushing ‘make-work’.

Since
development flows through different phases, the management of each has
to change as well. A basic development iteration has five parts:

Analysis

Design

Implementation

Testing

Distribution

Analysis, for example, is a never-ending effort about depth. Wherever you skimp on analyzing the requirements, it will come back to haunt you as ‘scope creep’. However, it would be rare to actually have enough time and patience to fully analyze everything before the work starts. Analysis is also very sensitive to the approach; ask the right questions and you get the right answers. Ask the wrong ones and you get misleading information. Managing analysis means ensuring that both the right questions are being asked, and that the answers are being organized. Experienced analysts need little intervention, but the quality of the work isn’t known until later, in the implementation phase, when it is often too late.

Design and Implementation are really about making sure all of the programmers are on the same page; that they are working together well and that they have all of the input necessary to keep them moving forward. For large projects this is basically about team building. A system built by a dysfunctional group is naturally a big ball of mud, and the messiness of the work creates a huge amount of extra make-work to keep it all from falling apart. A good manager needs to keep the team working, arbitrate disputes and enforce the process and standards even when the time pressure is intense. They also need to protect the programmers from any outside interference, to ensure that they have enough calm to be able to think clearly and deeply about their work.

Testing
is all about quality vs. time trade-offs. It is impossible to test to
100%, so there is some lower percentage of testing that has to be
accepted. Often that might even be just a fixed block of time. The
biggest problem during testing is to keep everyone working. Testing is
often seen as boring -- particularly after an intense development slog
-- and the developers start to lose their attention to the details. Bugs
get noticed, but then forgotten. Managing here means dragging people
around, holding hands, calming nerves and just trying to get the process
as constructive as possible. It can also mean having to make brutal choices about quality vs. timeliness. Sometimes the software has to be shipped, even if it is not as good as it should have been.

Distribution
of software usually involves an operations dept., or selling the
system. Because it is usually forgotten until the end, the distribution
is rarely well-planned. Management here needs to ensure that the
software is supported well enough, but not at the cost of disabling
future development work. The handling of feedback and bug reports can
all be set up in advance, and most projects require both a standard
release and an emergency fast-track process. Most issues take some time,
but some need an ‘all hands on deck’ approach in order to deal with
them before the situation escalates.

Business

A large software project takes many man-years to build, and for most projects the work is never done. It just goes on, year after year. Hiring programmers isn’t cheap, and you need a lot of other support staff in the project as well, including a project manager, system admins, specialists, etc. Commercial quality systems also need roles like graphic designer, UX expert, editor, translator, etc.

All together, to build something large that isn’t a weird eclectic mishmash of disorganized functionality means having to shell out a huge amount of money. For software, money is time. That is, if you have enough money, you can hire the necessary resources and experience to get the work done, given some reasonable time frame. A lack of money means that you have to do more with less, and oftentimes it’s that time pressure that leads people to take ill-advised shortcuts, which will eventually scramble the project so badly it can’t be saved.

As such, setting up a development project always means having to figure out where the money is going to come from. And, as is always the case with money, what strings are attached to it. Nothing comes for free.

The
business world is very subtle, complex and constantly changing. Most
techies who encounter it greatly oversimplify its nature, shades of gray
and depth, which generally leads to unrealistic expectations of how
things ‘should’ work. Some people have a better intuitive feel than
others, but for most people it is best to just write the whole domain
off as ‘irrational’, so that they won’t make any false assumptions.

Because
it is volatile, the business world inflicts a lot of short-term
influence on software development. This conflicts with the long-term
nature of the work, so it requires making a large number of very
difficult trade-offs. If the work always bends to the short-term
concerns it will quickly become a mess. If it always sticks to the
long-term, it will likely lose confidence and funding. Thus it needs to
perform a very difficult balancing act between the two, so a good leader
that finds the right trade-offs will end up taking flak from both
sides. If both sides are a little unhappy, then it’s likely balanced
properly.

For most people, the ability to balance the business and technical requirements is learned from long, hard and brutal experience. Intuition is usually biased toward one side or the other. Even the greatest
intelligence can only ‘guess’ its way through the complex interactions,
getting more wrong than right. So all that is left is learning from
experience; from the successes and mistakes of the past. And it takes
some painful introspection to really be objective about the many causes
of failure. People like to blame others, but within the spidery web of
development, all things are related and no one is immune from influence.

User Domain

Software
starts with a problem that needs to be solved. That problem is always
domain specific, such as a financial system, inventory or even social
networking, etc. Most domains have their own unique set of terminology,
usually steeped in history. They often have ‘rules of thumb’ or dirty
little secrets lurking in their darkened corners. What they never are
is laid out in a nice rational manner all ready to be modeled in
software. Often the data is poorly understood, the process disorganized
and each organization within the domain is slightly different. There are
always rules, but they are not always followed rigorously, nor
particularly logical.

Thus the core problem in software is taking some ill-defined ‘informal system’ and finding a reasonable mapping to a very rigid formal system modeled inside a computer. In practice this makes the analysis of an existing domain one of the most crucial parts of arriving at a functional software system. However, because it is always gray and messy, it is also the part of the process that people ignore most often. They just start coding, and hope that somehow, after a while, the answers will come to them. Sometimes that works, but more often getting off on the wrong foot is fatal. If you build too far from the actual answer, then you’ll have no choice but to do it all over again. But since people don’t like to restart their efforts, they usually flail at the code, hoping it will somehow work itself out. However, if they started really badly, they could bang on it forever and still never get something that works.

A
very common problem in development is that the users rarely know what
they want, or what would work correctly for their problems. Some have
strong opinions, but they often lack a full understanding of the
consequences of their choices. Most flip-flop faster than the
development can be completed, so tying the process too closely to their
wishes frequently results in a mess of half-finished or poorly-thought-out code. However, the converse is also true: code written in an “ivory
tower” far away from the users most often oversimplifies the core
problems making it somewhat less than helpful. To bridge this gap
requires domain experts, and often a very deep understanding of the real
problems. Everything the users say is important, but not all of it
needs to be taken literally. They usually understand their own informal
systems, but the mapping to software, and the code itself are best left
to people with plenty of experience. It is easy to write code, but
extremely difficult to know what code is right for the solution.

Technology

Last,
but not least are the technologies being used. Most projects involve
anywhere between 3 and 30 major technologies, and a slew of minor ones.
Each technology is eclectic in its own unique way and needs some time
and experience to discover its strengths and weaknesses. People often
focus on the programming language when assessing skills, but in most
projects crafting conditionals, loops and slicing & dicing code are
the easy parts. There is skill involved in solving the endless array of
coding puzzles, but unless the project is pushing the bleeding edge of
technology, it becomes significantly easier as one gains experience.
Unfortunately, getting heavily seasoned (>15 yrs) programmers is
difficult, since they are expensive and often leave the industry.

Software
development is hugely affected by scale and by the desired quality of
the final work. For scale I usually break it down as:

small - 1 developer, <30,000 lines

medium - 1-4 developers, <99,000 lines

large - 5-20 developers, <1 million lines

huge - teams of developers, millions of lines

Projects don’t jump scale easily; they often require a nearly complete rewrite to go from one size to another. Disorganization and bad practices often work OK for small and medium projects, but they become fatal beyond that. As well as scale, the desired quality is important, ranging from rough in-house tools up to polished commercial-grade systems.

Each
increment in quality takes considerably more work and requires
specialists to focus on their particular strengths. Products often sell even though they are of just in-house quality, but then they are vulnerable to
stronger competitors. Users may not complain directly about
inconsistencies, but they do generate a negative impression of the
system. Badly architected ‘balls of mud’ can often degenerate in quality
as programmers randomly slap weak functionality into the corners. Poor
development practices tend to be reflected in the interface, so often
the outside of the system is a good indication of the state of the code
base.

A rather dangerous development trade-off often comes in the choice to build or buy (paid or free). Depending on the technical specs, building is often extremely time-consuming, but in my career I’ve seen more people fail because of their choice to buy. All technologies are a collection of their authors’ eccentricities, and often these play a dominant role in their usage. Buying specialty libraries and packages usually works well, because you don’t have to acquire the knowledge to build them; but for the core parts of the system, if you depend on someone else’s solution you limit your options going forward. That can drive the architecture and constrain the path forward in dangerous ways.

Getting
a commercial grade large or medium system to users is always a team
exercise. It takes a large group of people, all with different specialties, to make it happen. As such, the team dynamics become
crucial in determining the outcomes. Badly functioning teams usually
produce badly functioning systems. Rogue programmers may get a lot of
opportunity to express their creativity in their work, but they often do
it at the expense of the overall project. Multiple different coding
styles and no consistency generates high bug counts and stability
problems. Once a project gets going it only continues to work if all of
the people involved are on the same page, which generally means strong
leadership, a reasonable process and a well-defined set of objectives.
Efficiency comes from long-term goals, even if the short-term is volatile.
Initial speed can be gained from brute force hacks, little or no reuse
and heavily siloed programmers, but each of these accrues significant
technical debt, which eventually bogs the project down into a mess. If
the work is getting harder over time that often means disorganization,
no architecture, redundancies and/or no long term plan, which if left
unchecked will only get worse. A well run, well thought-out development
effort will build up a considerable number of reusable common
components, which if documented will guide extensions and new features.
Architecture sets this commonality and management enforces its usage.
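A hypothetical sketch of what one of those documented, reusable common components might look like (the names and the date format are invented for illustration): a single shared routine that every screen calls, instead of each programmer re-implementing their own version.

```python
# Hypothetical sketch of a small reusable common component: one
# shared, documented routine rather than every feature rolling its
# own date handling. All names are invented for illustration.

from datetime import date

def format_display_date(d: date) -> str:
    """Project-wide convention for showing dates to users.

    Extensions and new features call this instead of formatting
    dates themselves, so the interface stays consistent everywhere
    and a change to the convention happens in exactly one place.
    """
    return d.strftime("%d %b %Y")
```

The component itself is trivial; the value lies in the architecture declaring it the common path and management enforcing that it actually gets used.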

Finally

What
sets software apart from most other professions is that it requires a
larger cross-section of other knowledge bases to keep it successful. The
danger in software is that from an outside perspective, it all looks so
easy. You just have to throw together a bunch of simple instructions
for the computer and chuck it all onto a server. But that type of
over-simplification has always led to disasters, caused by people who
don’t have enough experience to respect the underlying complexity and
trade-offs required for successful development. Most other professions
are usually managed by people who have moved through the ranks. Software
is often special, in that the most experienced developers are rarely
put into full leadership positions. More often it is business people or
domain experts that attempt to drive the projects forward, although few
have the necessary prerequisites. Lack of knowledge usually equates to
bad choices, which always mean more work and a much higher likelihood of
failure. The complexities, knowledge and work involved in software
development are easy to underestimate, so the failure rate is obscenely
high.