Two common coding weaknesses

When I review code, especially code from less experienced programmers,
there are two issues that come up again and again. (Actually, with 25
years of programming, almost everyone is less experienced! Of course,
that doesn't mean I'm any good - just that I've been doing it a long
time.) These two common coding weaknesses are: naming and duplication.

Since neither of these issues necessarily affect the functionality of
the
code, cleaning them up falls under the "refactoring" umbrella. Since
they are not functional issues, why are they so important? I think
they're important because most code gets "read" and "maintained" much
more than it gets "written", so anything that improves the readability
and maintainability is important. It might seem contradictory, but I
also feel you should write code as if it is "temporary" or "transient",
since most code that you write will end up getting rewritten, often
several times. But this isn't really a contradiction. Extreme
Programming tells us "do the simplest thing that could possibly work".
In other words, plan on incrementally improving your design, a little
bit at a time. So any given line of code will likely be transient, but
the system at a whole is likely to last much longer than you might
think.

Some of the naming problems I run into:

calling the same thing by different names in different parts of the
code

calling different things by the same name in different parts of the
code (An exception to this is the use of generic names like i and n and
x. My rule is that this is okay if their use is limited to a small
area of code, so it is easy to see what they refer to.)

calling something by a misleading name e.g. calling a function
Check_Totals when it actually does the totalling and doesn't check
anything, or calling something "prompt" when it's actually the field
name, not a prompt.

calling something by an uninformative name e.g. calling a
function "Process" or "Calculate" which doesn't really tell you
anything about what it does

excessive abbreviation (e.g. rpt instead of report) Even if the
abbreviation is understandable, it's often harder to remember which
abbreviation you used, than it is to simply write out the whole word.
(e.g. was that rpt or rep or rprt ...) (Someone suggested we start
a "vowel movement".)

inconsistent naming i.e. not having a "system" for your names (e.g.
cur_amount & amount_ytd should probably be either cur_amount &
ytd_amount or amount_cur & amount_ytd) Again, it's not just the matter
of readability, but also write-ability i.e. is it easy to remember the
names.

Some of the duplication issues I run into:

putting the same code into both branches of an if-else, for example:

if (condition)
something
stuff1
else
something
stuff2

is probably better written as:

something
if (condition)
stuff1
else
stuff2

putting the same code all the places you call a function, for
example:

something
func()
...
something
func()

is probably better written by putting "something" into func()

not putting common methods into a base class (in object-oriented
code)

repeating a complicated expression instead of assigning it once to
a temporary variable

two (or more!) functions that only differ in very small ways, where
the difference could easily be made into an argument passed into a
single generic version of the function

My advice to programmers - if you find yourself cutting and pasting
code (or even worse, typing the same code) this should be a sign that
you should look for ways to avoid the duplication.

The thing to remember is that coding is an iterative process. You
won't necessarily get all these things right the first time you write
the code. In fact, trying to write code "perfectly" the first time is
a mistake, because you would be investing too much time in code that is
very likely to get thrown out later. Instead, get in the habit of
reviewing your own and each others code, and cleaning it up as you work
on it.

For a much more thorough and well thought out treatment of these and
many other issues, I highly recommend Martin Fowler's "Refactoring"
book (Addison-Wesley, 1999).

And since I can't resist the opportunity for shameless self-promotion -
check out my open source project Suneido, an integrated object-
oriented language and client-server relational
database.

The GNOME
Programming Guidelines describe the preferred way of writing code
for GNOME. This is not just a coding style standards guide; it
describes issues pertaining to consistency, maintainability,
extensibility, correctness, and security. You may find it interesting.

I'm of the opinion that most any halfway capable programmer devotes a
lot of thought to increasing the consistency of their naming strategies
and maximizing their use of functions and methods. At least I did when
I was halfway capable. And I stumbled on the once-and-only-once
ideal a long time ago and saw the power of it. These fundamentals were
somewhat overlooked with the initial hype that arose around
object-oriented programming, but they were what was important, and
finally people are writing books about it.

But I think there are more caveats to the refactoring methodology than
doing
it too early. You also don't want to do it too much. Sometimes it
is easier on the aesthetics (and logistics) to have replicated blocks of
code that call a series of methods than to combine them (and all their
possible parameter variations) into a single method. At least for
C-like code without keyword defaulted arguments. And everyone probably
knows an object design they liked less after it went through extensive
refactoring to eliminate perceived redundancies.

I would have to say that by your definition there are an awful lot of
programmers that aren't to the "halfway capable" stage. I don't mean
this in a critical or cynical way. And certainly there are a lot of
good programmers out there. And anyone who's are reading the kind of
stuff you're referring to is likely well past halfway capable. What
prompted this article was that I just finished reviewing a pile of code
from one of my programmers and it seemed like I kept running into the
same "mistakes" over and over.

I totally agree that over-generalization can be just as big a problem,
and that refactoring too early is also a mistake. But I seem to run
into these less often than simple naming and duplication issues. If
they haven't got past these simple issues, they likely won't really
understand the deeper issues.

I'm in a situation where programmers are adding/changing a mature body
of code, where many programmers of differing skill levels have come and
gone.

When adding to this mess, some of my fellow programmers absolutely love
to use boolean flags. It often happens that code in place A
determines some flow control that now has to happen in place B.
When the code was originally written, there was no conceptual
relationship between A and B, but changing requirements
have brought this relationship about.

So, add a flag! It's easy, and it works. The problem is, soon you are
trying to juggle 4 or 5 of these flags in a module ... and it doesn't
help that these flag-loving programmers aren't too good with the
consistent-naming concept either.

Sometimes, a flag is actually what you need, but in well over half these
cases, A and B can be made adjacent by a very minimal
amount of rewriting. The only reason they are apart is because there
was no requirement for them to be related before.

Is this a common problem that others have seen, or is my working
environment just special?

I've run into the flag problem as well. It seems so innocuous - until,
as you say, you have 4 or 5 of them and it becomes impossible to figure
out what's going on. Very occasionally they're the best solution, but
in most cases there is a better way. Sometimes you can use derived
classes and virtual functions, or the "State" Design Pattern (GOF), or
sometimes you need to recognize that you really should have a state
machine, or sometimes you can just redesign the code so there aren't so
many special cases.

The last time I saw flags overused was at HP. The HP-41 processor only
had four levels on its subroutine call stack. Call too deeply and you
toast the stack. Very easy to do. So you can't just create a
subroutine out of repeated code. Instead, you have to retain state in
flags. Made the code extremely hairy and tense.

Yes, you shouldn't have more than one or two flags passed into a
subroutine. More than that and your code is structured wrongly. Split
that subroutine into two subroutines calling the code that was executed
using the flags. The basic problem with flags is the same problem with
ifdefs -- it's very hard to follow the sequence of instructions if you
have to retain a lot of state.

In fact, I'd go so far as to say that ifdefs should *never* be used.
Instead, arrange your makefile so that it includes one file full of
conditional code instead of a different file. That way, you know the
semantics of the code and you don't have to worry about "If this ifdef
is turned on, then we're searching for things that aren't in the set, as
opposed to the other sense of the ifdef."
-russ

I've found flags to be necessary in the evolution of non-object-oriented
code. The need for them seems to dissipate with object-oriented code,
because specialization of generic behavior is what OOP is all about.

Something else to keep in mind, is that an outside reviewer will almost
always spot problems in the design of code that the orignal author
misses. This is just due to familiarity, The 2 different names may
have made sense at the time, (possibly were actually 2 different
concepts, but soem later revelation came up with a way to unify them),
and they know both parts of the code so intimately, that they don't
need to thnk what identifiers they used. Admittedly an experienced
programmer will catch these things more often, but the origin of
stylistic failures isn't always ignoarance of stylistic guidelines, but
often knowledge of the code preventing the programmer from seeing that
it's unreadable, as she doesn't need to read it, really.

Certainly this happens to me in my writing, I skim right over typos,
repetitions of words, etc, because I know what I meant.

My intent isn't to gloss over the importance of good style, just to
question whether teaching people what is good style, or teaching people
how to recognize bad style that snuck into their code, or teaching
people to have someone review their code more often is more important.

In the codebase I'm working with, some flags are definitely necessary.

The problem is, it's some programmers' first recourse to add a
flag, when it should really be their last. When you know
you're probably going to have to be juggling some flags, it
makes it more important not to add them gratuitously.

sej: you seem to suggest that OOD is a priori superior to other
techniques for parametrizing code, or making it polymorphic. Please have
a look at the wonderful "Multi-Paradigm DESIGN for C++" by James
Coplien. It explains in great detail analysis techniques such as
commonality and variability analysis and as well as solution domain
analysis. Preprocessor and compiler are both used to provide [compile
time,
link time, run time] polymorphism. Every such technique has its place,
depending on your exact requirements.
Take the berlin project, it is very object oriented. Yet there are a
couple of places, where we use compile time switches to map a common API
to a variety of underlying layers, such as the console (GGI, SDL, ...) .
Doug Schmidt's ACE framework is another wonderful example of a healthy
mixture of these techniques.

Programmers are the worst professional class I've ever seen for not
learning from past mistakes. I've had to give virtual (and
occasionally literal!) head-knocks to coders who consistently make the
same dumbass mistakes (dangling pointers are the most common). It's
like they're "blind" to certain flaws in their code; they continue to
use the same approaches to problems even when those approaches have
failed in the past. And they tend to brute-force problems rather than
sit back and think out a design before coding.

Ultimately, the question of "quality" is not related to the writing of
code; it is related to the behavior of the programmer himself. In my
view, this is the partially the fault of CS courses in colleges: they
train programmers to write code, not to design software. And it shows.

I have read all this message and see how it resembles talks of authors,
painters - any atrtists

You are talking about how to presentcomputer-control
language the way people will understand it at once.

I thinks this is mistake.

You need different languages to present different data!!! In
better CS courses you'll lean about 4GL languages
for data flow description. Or about Eifell :) for better and
cleaner way to design programs.But still it is mix of data!! Just remember that you still need
to write documentation to describe your project. ;-) and
just remember - are you happy to write it after program is
completed?!

What is needed:
1) Description of goals - this must be metalanguage
(English will not help you any more - it must be language
with the only interpretation and complex enough to present all goals of
humanity :). So it will be easy to convert it to
any human language (it is one way automatic conversion so...
developing THING in this language would
require programmer + writer + artists + linguists skills, but inline
help with context-depending lists of meaning(for Natural-lang
word) will help).

2) Description of types - here would be son of OOP...:)
Have a look at Yahoo
or Odigo catalogues. They are step to
all-defined data types with all-defined properties
names and types and value ranges. So all will be in same (1)
metalnaguage(UniLang?). So you will have no need to
make notes, docs, lessons etc. to tell what you mean with this names.
Also easy extensions and aging must be considered.

3) Descriptions of algorithms - here I would point out that
algorithms must be separated from API!!!
The algorithms developers are not API developers at all! They are
mathematiotions really. And.... you need a way to present
properties of algorithm (ranges of values, speed, number of iterations
for different types of data). In simple case just
table of test - wich algorithm is best for wich OS/data/delays etc... It
would be automatically choosen for goal set in (1).

5) User interfaceses - same time it would be program Help , but
compiled to English or Russian :). Also
you see - there is no difference are they User interfaces with English
help or Programm interaction interfaces with
sources.

6) compilers - Just choose better implementations of algorithm
for
given processor/architecture. With all neded tests and
list to choose from (still (1) goal will guide you here).

7) Converters - just reorganising data - ways of gating one
information from other. Also it is not converting
actually... for example you could convert DATA(image) to DATA(height)
- and get linear value, represented in any
sutable way -> Meters/Pixels -> byte/Integer/float

---------------------------
Ok it is just a knowlege base :)

Until it will be build - there would be discussions of the sort above
shows. Just representations misunderstanding...Current
programming languages(a) are not suted for human-reading. Current
docs/manuals(b) are not suted for automatic processing. And the
only compilers of (b)->(a) is human :)

The thins to solve is - to find a way of building this system so it will
be usefull from the first week of it's existence. Nevertheless
it will be redesigned and rebuid from scratch during next 5-10 years It
must save all data.

Sorry for mistakes in English ... I just one of those persuaded to
learn English to be able to learn programming.
You will never do it in other language... it's pity :-/

stefan said: you seem to suggest that OOD is a priori superior to
other techniques for parametrizing code, or making it polymorphic

I was trying to make the point that with mature C code I find myself
adding global variables to get custom behavior, where with mature C++
code I find myself adding derived classes to get custom behavior. I do
prefer the latter activity. Other than that I'm open to any kind of
parameterization that works.

There's a guy about 10 feet away from me that names his production variables things like "bud" and "weiser". Or I see a lot of "temp1",
"temp2", "temp3" in production code. Sometimes I feel like I need a dipshit neutralizer.

Another helpful thing when refactoring or throwing functionality away (e.g. just to simplify things, or to reuse the code in another project)
is
to make the compiler/linker do the work.

In other words, avoid flags which allow the programmer use use inline calculated values like foo(p, bar!=baz) or variables foo(p, bazflag).
The point here is that when cutting away what you don't need, it's really useful to rename/remove a function and see what drops away.

One feature I used to like about the Whitesmith linker it that it would list "deadwood" for you.

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser
code is live. It needs further work but already handles most
markup better than the original parser.