“It's not about how to achieve your dreams, it's about how to lead your life, ... If you lead your life the right way, the karma will take care of itself, the dreams will come to you.”
― Randy Pausch, The Last Lecture

Tuesday, September 25, 2012

This is a daunting task indeed, and there's a lot of ground to
cover. So I'm humbly suggesting this as a somewhat comprehensive guide
for your team, with pointers to appropriate tools and educational
material.

Remember: These are guidelines, and as such are meant to be
adopted, adapted, or dropped based on circumstances.

Beware: Dumping all this on a team at once would most likely
fail. You should try to cherry-pick elements that would give you the
best bang-for-sweat, and introduce them slowly, one at a time.

Note: Not all of this applies directly to Visual Programming
Systems like G2. For more specific details on how to deal with these,
see the Addendum section at the end.

Executive Summary for the Impatient

Define a rigid project structure, with:

project templates,

coding conventions,

familiar build systems,

and sets of usage guidelines for your infrastructure and tools.

Install a good SCM and make sure they know how to use it.

Point them to good IDEs for their technology, and make sure they know how to use them.

Couple the build system to continuous integration and continuous inspection systems.

With the help of the above, identify code quality "hotspots" and refactor.

Now for the long version... Caution, brace yourselves!

Rigidity is (Often) Good

This is a controversial opinion, as rigidity is often seen as a force
working against you. It's true for some phases of some projects. But
once you see it as a structural support, a framework that takes away
the guesswork, it greatly reduces the amount of wasted time and
effort. Make it work for you, not against you.

Rigidity = Process / Procedure.

Software development needs good process and procedures for exactly the
same reasons that chemical plants or factories have manuals,
procedures, drills and emergency guidelines: preventing bad outcomes,
increasing predictability, maximizing productivity...

Rigidity comes in moderation, though!!

Rigidity of the Project Structure

If each project comes with its own structure, you (and newcomers) are
lost and need to pick up from scratch every time you open them. You
don't want this in a professional software shop, and you don't want
this in a lab either.

Rigidity of the Build Systems

If each project looks different, there's a good chance they also
build differently. A build shouldn't require too much research or
too much guesswork. You want to be able to do the canonical thing and
not need to worry about specifics: configure; make install, ant,
mvn install, etc...

Re-using the same build system and making it evolve over time also
ensures a consistent level of quality.

You do need a quick README to point out the project's specifics, and
gracefully guide the user/developer/researcher, if any.

This also greatly facilitates other parts of your build
infrastructure, namely continuous integration and continuous inspection.

So keep your build (like your projects) up to date, but make it
stricter over time, and more efficient at reporting violations and bad
practices.

Do not reinvent the wheel, and reuse what you have already done.

Rigidity in the Choice of Programming Languages

You can't expect, especially in a research environment, to have all
teams (and even less all developers) use the same language and
technology stack. However, you can identify a set of "officially
supported" tools, and encourage their use. The rest, without a good
rationale, shouldn't be permitted (beyond prototyping).

Keep your tech stack simple, and the maintenance and breadth of
required skills to a bare minimum: a strong core.

Rigidity of the Coding Conventions and Guidelines

Coding conventions and guidelines are what allow you to develop both
an identity as a team, and a shared lingo. You don't want to err
into terra incognita every time you open a source file.

Nonsensical rules that make life harder, or that forbid actions
explicitly to the extent that commits are refused based on single
simple violations, are a burden. However:

a well thought-out ground ruleset takes away a lot of the whining
and thinking: rules that nobody should break under any circumstances;

and a set of recommended rules provide additional guidance.

Personal Approach: I am aggressive when it comes to coding
conventions, some even say nazi, because I do believe in having a
lingua franca, a recognizable style for my team. When crap code
gets checked in, it stands out like a cold sore on the face of a
Hollywood star: it triggers a review and an action automatically.
In fact, I've sometimes gone as far as to advocate the use of
pre-commit hooks to reject non-conforming commits. As mentioned, it
shouldn't be overly crazy and get in the way of productivity: it
should drive it. Introduce these slowly, especially at the
beginning. But it's way preferable over spending so much time fixing
faulty code that you can't work on real issues.
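As a rough illustration of the pre-commit idea, here is a minimal Python sketch of the check such a hook could run. Everything in it is an assumption made for the example: the three rules, the 100-character limit, and the function names are hypothetical, and wiring it into your SCM's actual hook mechanism is left out.

```python
# Hypothetical ground rules, for illustration only; a real team would agree on its own.
MAX_LINE_LENGTH = 100


def find_violations(filename, text):
    """Return a list of (filename, line_number, message) tuples for one file."""
    violations = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            violations.append((filename, number, "line too long"))
        if line != line.rstrip():
            violations.append((filename, number, "trailing whitespace"))
        if "\t" in line:
            violations.append((filename, number, "tab character"))
    return violations


def should_reject(staged_files):
    """staged_files maps file name -> new content. True means: refuse the commit."""
    return any(find_violations(name, text) for name, text in staged_files.items())
```

Keeping the rule set this small is deliberate: a hook that only enforces the few ground rules nobody disputes is less likely to get in the way of productivity.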

Some languages even enforce this by design:

Java was meant to reduce the amount of dull crap you can write with
it (though no doubt many manage to do it).

Python's block structure by indentation is another idea in this
sense.

Go, with its gofmt tool, completely takes away any debate
and effort (and ego!!) inherent to style: run gofmt before
you commit.

Make sure that code rot cannot slip through. Code
conventions, continuous integration and continuous
inspection, pair programming and code reviews are your
arsenal against this demon.

Plus, as you'll see below, code is documentation, and that's
another area where conventions encourage readability and clarity.

Rigidity of the Documentation

Documentation goes hand in hand with code. Code itself is
documentation. But there must be clear-cut instructions on how to
build, use, and maintain things.

Using a single point of control for documentation (like a WikiWiki or
DMS) is a good thing. Create spaces for projects, spaces for more
random banter and experimentation. Have all spaces reuse common rules
and conventions. Try to make it part of the team spirit.

Most of the advice applying to code and tooling also applies to
documentation.

Rigidity in Code Comments

Code comments, as mentioned above, are also documentation. Developers
like to express their feelings about their code (mostly pride and
frustration, if you ask me). So it's not unusual for them to express
these in no uncertain terms in comments (or even code), when a more
formal piece of text could have conveyed the same meaning with fewer
expletives or drama. It's OK to let a few slip through for fun and
historical reasons: it's also part of developing a team
culture. But it's very important that everybody knows what is
acceptable and what isn't, and that comment noise is just that:
noise.

Rigidity in Commit Logs

Commit logs are not an annoying and useless "step" of your SCM's
lifecycle: you DON'T skip it to get home on time or get on with the
next task, or to catch up with the buddies who left for lunch. They
matter, and, like (most) good wine, the more time passes the more valuable
they become. So DO them right. I'm flabbergasted when I see co-workers
writing one-liners for giant commits, or for non-obvious hacks.

Commits are done for a reason, and that reason ISN'T always clearly
expressed by your code and the one line of commit log you
entered. There's more to it than that.

Each line of code has a story, and a history. The diffs can tell
its history, but you have to write its story.

Why did I update this line? -> Because the interface changed.

Why did the interface change? -> Because the library L1 defining it
was updated.

Why was the library updated? -> Because library L2, that we need for
feature F, depended on library L1.

And what's feature F? -> See task 3456 in issue tracker.

It's not my SCM choice, and may not be the best one for your lab
either; but Git gets this right, and tries to force you to write
good logs more than most other SCM systems, by using short logs and
long logs. Link the task ID (yes, you need one) and leave a
generic summary for the shortlog, and expand in the long log: write
the changeset's story.

It is a log: It's here to keep track and record updates.

Rule of Thumb: If you were searching for something about this
change later, is your log likely to answer your question?
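To make the short log / long log convention concrete, here is a small Python sketch of a commit message check. The format it enforces is an assumption made for illustration: a short log starting with a tracker ID written as "#3456", a hypothetical 72-character limit, then a blank line and a long log. Adapt it to whatever your tracker and SCM actually use.

```python
import re

# Assumed convention: the short log starts with a tracker ID such as "#3456".
TASK_ID = re.compile(r"^#\d+\b")
MAX_SUMMARY_LENGTH = 72  # hypothetical limit


def check_commit_message(message):
    """Return a list of problems found in a commit message (empty list = OK)."""
    lines = message.strip().splitlines()
    if not lines:
        return ["empty commit message"]
    problems = []
    summary = lines[0]
    if not TASK_ID.match(summary):
        problems.append("short log does not start with a task ID")
    if len(summary) > MAX_SUMMARY_LENGTH:
        problems.append("short log is too long")
    # The long log (the changeset's story) must follow a blank separator line.
    if len(lines) < 3 or lines[1].strip() != "":
        problems.append("no long log separated by a blank line")
    return problems
```

A message such as "#3456 Update parser for new L1 interface", followed by a blank line and a sentence explaining that L2 depends on the updated L1, passes; a bare "fix stuff" fails twice.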

Projects, Documentation and Code Are ALIVE

Keep them in sync, otherwise they do not form that symbiotic entity
anymore. It works wonders when you have:

commit logs in your SCM that reference ticket IDs in your issue tracker,

where this tracker's tickets themselves link to the changesets in
your SCM (and possibly to the builds in your CI system),

and a documentation system that links to all of these.

Code and documentation need to be cohesive.

Rigidity in Testing

Rules of Thumb:

Any new code shall come with (at least) unit tests.

Any refactored legacy code shall come with unit tests.

Of course, these need:

to actually test something valuable (or they are a waste of time
and energy),

to be well written and commented (just like any other code you check in).

They are documentation as well, and they help to outline the contract
of your code. Especially if you use TDD. Even if you don't, you
need them for your peace of mind. They are your safety net when you
incorporate new code (maintenance or feature) and your watchtower
to guard against code rot and environmental failures.

Of course, you should go further and have integration tests, and
regression tests for each reproducible bug you fix.
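As a minimal sketch of what "a regression test for each reproducible bug" can look like, here is a Python example using the standard unittest module. The mean function and its empty-input bug are hypothetical, invented purely for the illustration.

```python
import unittest


def mean(values):
    """Average of a sequence. The hypothetical bug: it used to crash on empty input."""
    if not values:
        return 0.0
    return sum(values) / len(values)


class MeanRegressionTest(unittest.TestCase):
    def test_typical_input(self):
        # Ordinary unit test: documents the contract of the function.
        self.assertEqual(mean([1, 2, 3]), 2.0)

    def test_empty_input_does_not_crash(self):
        # Regression test pinned to the (hypothetical) empty-input bug report.
        self.assertEqual(mean([]), 0.0)
```

Saved in a file, this can be run with python -m unittest. The second test is the one that keeps the bug from quietly coming back during a later refactoring.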

Rigidity in the Use of the Tools

It's OK for the occasional developer/scientist to want to try some new
static checker on the source, generate a graph or model using another,
or implement a new module using a DSL. But it's best if there's a
canonical set of tools that all team members are expected to know
and use.

Beyond that, let members use what they want, as long as they are ALL:

Rigidity vs Versatility, Adaptability, Prototyping and Emergencies

Flexibility can be good. Letting someone occasionally use a hack, a
quick-n-dirty approach, or a favorite pet tool to get the job done
is fine. NEVER let it become a habit, and NEVER let this code
become the actual codebase to support.

It's About the Code, Not About the Developers

Make developers conscious of the quality of their code, BUT make them
see the code as a detached entity and not an extension of
themselves, which cannot be criticized.

It's a paradox: you need to encourage ego-less programming for a
healthy workplace but to rely on ego for motivational purposes.

From Scientist to Programmer

People who do not value and take pride in code do not produce good
code. For this property to emerge, they need to discover how valuable
and fun it can be. Sheer professionalism and desire to do good is not
enough: it needs passion. So you need to turn your scientists into
programmers (in the large sense).

Someone argued in comments that after 10 to 20 years on a project and
its code, anyone would feel attachment. Maybe I'm wrong but I assume
they're proud of the code's outcomes and of the work and its legacy,
not of the code itself or of the act of writing it.

From experience, most researchers regard coding as a necessity, or at
best as a fun distraction. They just want it to work. The ones who are
already pretty versed in it and who have an interest in programming
are a lot easier to persuade to adopt best practices and switch
technologies. You need to get them halfway there.

Code Maintenance is Part of Research Work

Nobody reads crappy research papers. That's why they are
peer-reviewed, proof-read, refined, rewritten, and approved time and
time again until deemed ready for publication. The same applies to a
thesis and a codebase!

Make it clear that constant refactoring and refreshing of a codebase
prevents code rot and reduces technical debt, and facilitates future
re-use and adaptation of the work for other projects.

Why All This??!

Why do we bother with all of the above? For code quality. Or is it
quality code...?

These guidelines aim at driving your team towards this goal. Some
aspects do it by simply showing them the way and letting them do it
(which is much better) and others take them by the hand (but that's
how you educate people and develop habits).

How do you know when the goal is within reach?

Quality is Measurable

Not always quantitatively, but it is measurable. As mentioned, you
need to develop a sense of pride in your team, and showing progress
and good results is key. Measure code quality regularly and show
progress between intervals, and how it matters. Do retrospectives to
reflect on what has been done, and how it made things better or worse.

There are great tools for continuous inspection. Sonar is
a popular one in the Java world, but it can adapt to any technology;
and there are many others. Keep your code under the microscope and
look for these pesky annoying bugs and microbes.

But What if My Code is Already Crap?

All of the above is fun and cute like a trip to Never Land, but it's
not that easy to do when you already have (a pile of steamy and
smelly) crap code, and a team reluctant to change.

Here's the secret: you need to start somewhere.

Personal anecdote: In a project, we worked with a codebase
originally weighing 650,000+ Java LOC, 200,000+ lines of JSPs,
40,000+ JavaScript LOC, and 400+ MBs of binary dependencies.

After about 18 months, it's 500,000 Java LOC (MOSTLY CLEAN),
150,000 lines of JSPs, and 38,000 JavaScript LOC, with dependencies
down to barely 100MBs (and these are not in our SCM anymore!).

How did we do it?

We just did all of the above. Or tried hard.

It's a team effort, but we slowly inject regulations and tools into
our process to monitor the heart-rate of our product,
while hastily slashing away the "fat": crap code, useless
dependencies... We didn't stop all development to do this: we have
occasional periods of relative peace and quiet where we are free to
go crazy on the codebase and tear it apart, but most of the time we
do it all by defaulting to a "review and refactor" mode every chance
we get: during builds, during lunch, during bug fixing sprints,
during Friday afternoons...

There were some big "works"... Switching our build
system from a giant Ant build of 8500+ XML LOC to a
multi-module Maven build was one of them. We then had:

clear-cut modules (or at least it was already a lot better, and we
still have big plans for the future),

Another was the injection of "utility tool-belts", even though we
were trying to reduce dependencies: Google Guava and Apache Commons
slim down your code and reduce the surface for bugs in your code
a lot.

We also persuaded our IT department that maybe using our new tools
(JIRA, Fisheye, Crucible, Confluence, Jenkins) was better than the
ones in place. We still needed to deal with some we despised (QC,
Sharepoint and SupportWorks...), but it was an overall improved
experience, with some more room left.

And every day, there's now a trickle of anywhere from one to dozens of
commits that deal only with fixing and refactoring things. We
occasionally break stuff (you need unit tests, and you better write
them before you refactor stuff away), but overall the benefit
for our morale AND for the product has been enormous. We get there
one fraction of a code quality percentage at a time. And it's fun
to see it increase!!!

Note: Again, rigidity needs to be shaken to make room for new and
better things. In my anecdote, our IT department is partly right in
trying to impose some things on us, and wrong for others. Or maybe
they used to be right. Things change. Prove that there are better
ways to boost your productivity. Trial-runs and prototypes are
here for this.

Analyze your code with code quality checkers.
Linters, static analyzers, or what have you.

Identify your critical hotspots AND low hanging fruits.
Violations have severity levels, and large classes with a large number
of high-severity ones are a big red flag: as such, they appear as
"hot spots" on radiator/heatmap types of views.

Fix the hotspots first.
It maximizes your impact in a short timeframe as they have
the highest business value. Ideally, critical violations should
be dealt with as soon as they appear, as they are potential security
vulnerabilities or crash causes, and present a high risk of inducing a
liability (and in your case, bad performance for the lab).

Clean the low level violations with automated codebase sweeps.
It improves the signal-to-noise ratio so you are able to
see significant violations on your radar as they appear. There's often
a large army of minor violations at first if they were never taken care
of and your codebase was left loose in the wild. They do not present a
real "risk", but they impair the code's readability and maintainability.
Fix them either as you meet them while working on a task, or by large
cleaning quests with automated code sweeps if possible. Do be
careful with large auto-sweeps if you don't have a good test suite
and integration system. Make sure to agree with co-workers on
the right time to run them to minimize the annoyance.

Repeat until you are satisfied.
Which, ideally, you should never be, if this is still an
active product: it will keep evolving.
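The "hotspots first" triage in the steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular inspection tool works: the severity names, the weights, and the (file, severity) report format are all hypothetical.

```python
from collections import defaultdict

# Hypothetical severity weights; real tools define their own levels and scoring.
SEVERITY_WEIGHTS = {"critical": 10, "major": 3, "minor": 1}


def rank_hotspots(violations):
    """violations: iterable of (file_name, severity) pairs.

    Returns (file_name, score) pairs, highest weighted score first,
    with ties broken alphabetically by file name.
    """
    scores = defaultdict(int)
    for file_name, severity in violations:
        scores[file_name] += SEVERITY_WEIGHTS[severity]
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, a file with one critical and one major violation scores 13 and lands above a file carrying a handful of minor ones, which is the order in which you would want to attack them.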

Quick Tips for Good House-Keeping

When in hotfix-mode, based on a customer support request:

It's usually a best practice to NOT go around fixing other issues,
as you might introduce new ones unwillingly.

Go at it SEAL-style: get in, kill the bug, get out, and ship your
patch. It's a surgical and tactical strike.

But for all other cases, if you open a file, make it your duty to:

definitely: review it (take notes, file issue reports),

maybe: clean it (style cleanups and minor violations),

ideally: refactor it (reorganize large sections and their neighbors).

Just don't get sidetracked into spending a week from file to file and
ending up with a massive changeset of thousands of fixes spanning multiple
features and modules - it makes future tracking difficult. One issue in
code = one ticket in your tracker. Sometimes, a changeset can impact multiple
tickets; but if it happens too often, then you're probably doing something wrong.

Addendum: Managing Visual Programming Environments

The Walled Gardens of Bespoke Programming Systems

Multiple programming systems, like the OP's G2, are different beasts...

No Source "Code"
Often they do not give you access to a textual representation of
your source "code": it might be stored in a proprietary binary
format, or maybe it does store things in text format but hides
them away from you. Bespoke graphical programming systems are
actually not uncommon in research labs, as they simplify the
automation of repetitive data processing workflows.

No Tooling
Aside from their own, that is. You are often constrained by their
programming environment, their own debugger, their own
interpreter, their own documentation tools and formats. They are
walled gardens, except if they eventually capture the interest
of someone motivated enough to reverse engineer their formats and
build external tools - if the license permits it.

Lack of Documentation
Quite often, these are niche programming systems, which are used
in fairly closed environments. People who use them frequently sign NDAs
and never speak about what they do. Programming
communities for them are rare. So resources are scarce. You're
stuck with your official reference, and that's it.

The ironic (and often frustrating) bit is that all the things these
systems do could obviously be achieved by using mainstream and general
purpose programming languages, and quite probably more
efficiently. But it requires a deeper knowledge of programming,
whereas you can't expect your biologist, chemist or physicist (to name
a few) to know enough about programming, and even less to have the
time (and desire) to implement (and maintain) complex systems, that
may or may not be long-lived. For the same reason we use DSLs, we have
these bespoke programming systems.

Personal Anecdote 2: Actually, I worked on one of these
myself. I didn't make the link with the OP's request, but my
project was a set of inter-connected large pieces of
data-processing and data-storage software (primarily for
bio-informatics research, healthcare and cosmetics, but also for
business intelligence, or any domain implying the tracking of
large volumes of research data of any kind and the preparation of
data-processing workflows and ETLs). One of these applications was,
quite simply, a visual IDE that used the usual bells and whistles:
drag and drop interfaces, versioned project workspaces (using text
and XML files for metadata storage), lots of pluggable drivers to
heterogeneous datasources, and a visual canvas to design pipelines
to process data from N datasources and in the end generate M
transformed outputs, and possibly shiny visualizations and complex
(and interactive) online reports. Your typical bespoke visual
programming system, suffering from a bit of NIH syndrome under the
pretense of designing a system adapted to the users' needs.

And, as you would expect, it's a nice system, quite flexible for its
needs though sometimes a bit over-the-top so that you wonder "why
not use command-line tools instead?", and unfortunately always
leading, in medium-sized teams working on large projects, to a lot of
different people using it with different "best" practices.

Great, We're Doomed! - What Do We Do About It?

Well, in the end, all of the above still holds. If you cannot extract
most of the programming from this system to use more mainstream tools
and languages, you "just" need to adapt it to the constraints of your
system.

About Versioning and Storage

In the end, you can almost always version things, even with the
most constrained and walled environment. More often than not, these
systems still come with their own versioning (which is unfortunately
often rather basic, and just offers to revert to previous versions
without much visibility, just keeping previous snapshots). It's not
exactly using differential changesets like your SCM of choice might,
and it's probably not suited for multiple users submitting changes
simultaneously.

But still, if they do provide such a functionality, maybe your
solution is to follow our beloved industry-standard guidelines above,
and to transpose them to this programming system!!

If the storage system is a database, it probably exposes export
functionalities, or can be backed up at the file-system level. If it's
using a custom binary format, maybe you can simply try to version it
with a VCS that has good support for binary data. You won't have
fine-grained control, but at least you'll have your back sort of
covered against catastrophes and have a certain degree of disaster
recovery compliance.

About Testing

Implement your tests within the platform itself, and use external
tools and background jobs to set up regular backups. Quite probably,
you fire up these tests the same way that you would fire up the programs
developed with this programming system.

Sure, it's a hack job and definitely not up to the standard of what is
common for "normal" programming, but the idea is to adapt to the
system while trying to maintain a semblance of professional software
development process.

The Road is Long and Steep...

As always with niche environments and bespoke programming systems, and
as we exposed above, you deal with strange formats, only a limited (or
totally nonexistent) set of possibly clunky tools, and a void in place
of a community.

The Recommendation: Try to implement the above guidelines outside
of your bespoke programming system, as much as possible. This ensures
that you can rely on "common" tools, which have proper support and
community drive.

The Workaround: When this is not an option, try to retrofit this
global framework into your "box". The idea is to overlay this
blueprint of industry standard best practices on top of your
programming system, and make the best of it. The advice still applies:
define structure and best practices, encourage conformance.

Unfortunately, this implies that you may need to dive in and do a
tremendous amount of leg-work. So...

Famous Last Words, and Humble Requests:

Document everything you do.

Share your experience.

Open Source any tool you write.

By doing all of this, you will:

not only increase your chances of getting support from people in
similar situations,

but also provide help to other people, and foster discussion
around your technology stack.

Who knows, you could be at the very beginning of a new vibrant
community of Obscure Language X. If there are none, start one!

Maybe even write a proposal for a new StackExchange Site in the
Area 51.

Maybe it's beautiful inside, but nobody has a clue so far, so
help take down this ugly wall and let others have a peek!

Original source: http://programmers.stackexchange.com/questions/155488/ive-inherited-200k-lines-of-spaghetti-code-what-now

:tabs - View a list of the open tabs with their file names. Vim displays all the files in each tab. The current window is shown by a ">" and a "+" is shown for any modified buffers.

:tabc - Close the current tab.

:tabnew - Open a new tab page with an empty window. Use :tabnew {file} to edit a file in a separate tab.

:tab split - Open the file in the current buffer in a new tab page.

:tabn - Switch to the next tab page.

:tabp - Switch to the previous tab page.

:tabr[ewind] - Go to the first tab page. You get the same effect when you use the :tabfir[st] command.

While the output of the top command is displayed, press F. This
displays the following message and shows all fields available for
sorting. Press n (which sorts the processes by memory) and then
press Enter. The top output now shows the processes sorted by
memory usage.
Current Sort Field: K for window 1:Def
Select sort field via field letter, type any other key to return

Last week we discussed the total bug count and how we can reduce it. It was an open discussion, and I really enjoyed people's comments.

A few major reasons people gave were:

UI Issues

Legacy code issue/Existing functionality issues

KLOC rule/BUGS per 1000 LOC

Bug logging categorization (one bug, but reported many times)

Strict timeline

Lack of proper unit testing/integration testing cycle

Lack of impact area analysis

Before discussing these points in detail, let's look at what "so many bugs" actually means. One hair in the soup is too many, and one on the head is too few. It all depends: without context there is no way to judge. Each Firefox release contains about 1000 open bugs, but it is considered to be quite usable.

Perception of the quality of a product can vary greatly between the programmer and other users. Developers and users/business people have very different views of what bugs mean. For a developer, every bug found and fixed increases confidence in the code, since it is getting more robust. For a user, every bug found (even if fixed) decreases their confidence.

If you are judging only by the bug count found in testing, then that is unfortunate. I have seen many products that are judged on the basis of user experience, which plays a big role, even though no bugs were reported by QA. A few products were reported as buggy (due to the complexity of the product), yet they generate a good share of revenue and run smoothly, again with no bugs reported by QA. Sometimes end users reported a few issues that were mostly marked as missing features, and in a few cases were closed as the result of wrong manipulation of the data analysis. In a few cases such bugs were closed by providing data in Excel to the client, or through discussion with the salesperson and the client.

UI Issues: I assume that the HTML developer provides W3C-validated HTML, and that the HTML is also verified by the QA team. The code becomes buggy when developers integrate that HTML. There are many reasons for UI issues. Developers skip cross-browser testing across all browser versions (see the browser plus OS matrix). Sometimes developers write their own front-end logic even though it has already been written and cross-checked, and the newly written code fails to follow the standard used throughout the website.

KLOC rule/BUGS per 1000 LOC: This does not fit our case (website development), because if we look at the codebase and the nature of the bugs, 60-80% of the code lines are HTML or code associated with front-end page rendering logic. Another point is that in-house testing does not focus much on security testing, environmental issues, or load and performance testing. Sometimes we have contributed more by removing code than by adding it ;-) In our last release we spent a good amount of time discovering that the code required the pdo_mysql extension and that a few APIs behave unexpectedly on 64-bit machines.

Here Bootstrap plays a powerful role: https://github.com/twitter/bootstrap (Bootstrap is a sleek, intuitive, and powerful front-end framework for faster and easier web development.) Let's start looking seriously at the front-end part, and try to avoid an unmodular JS and HTML codebase.

Legacy code issue/Existing functionality issues: Always fix bugs before writing new code, and mark such bugs with a new tag in Bugzilla. See http://www.joelonsoftware.com/articles/fog0000000043.html

Lack of impact area analysis: It might be that the requirements weren't clear or correct, and that was reflected in the code and picked up by the testers. Failure mode and effects analysis should be a part of design. In short, we should follow tighter software development processes. While writing code, always keep in mind that we could be building software for a medical or military environment. Other important areas are static code checking, peer code reviews, unit testing, component testing, and system tests.
It's easy to get a couple of hundred lines of code per day but try to get a couple of hundred quality lines of code per day and it's not so easy.
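To see why the bugs-per-KLOC figure depends so much on what you count as code, here is a small Python sketch. The bug count and line counts are made up for the illustration; only the idea that most of the lines are markup comes from the discussion above.

```python
def bugs_per_kloc(bug_count, lines_of_code):
    """Defect density: bugs per 1000 lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return bug_count / (lines_of_code / 1000)


# Hypothetical numbers: 120 bugs in an 80,000-line codebase,
# of which 56,000 lines (70%) are HTML or page rendering code.
total_lines = 80000
markup_lines = 56000
logic_lines = total_lines - markup_lines

density_all_lines = bugs_per_kloc(120, total_lines)
density_logic_only = bugs_per_kloc(120, logic_lines)
```

With these made-up numbers the same 120 bugs give 1.5 bugs per KLOC over the whole codebase but 5.0 over the logic alone, so any KLOC-based target has to say which lines it counts.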