I'm an Undergraduate Computer Science Student currently on a placement year at a company that produces and supports a large enterprise web application. I'm loving the experience of seeing how software is produced in the real world, and feel very lucky to find a company that offers a chance to not only maintain and extend existing functionality but also develop entirely new features for the product.

All that said, though, I'm very conscious that this is very, very unlikely to be a perfect example of how to develop correctly. Far from it, in fact. I feel I'm learning a massive amount from my experience here, and I don't want to learn the wrong things or pick up bad habits from colleagues that could be hard to shake off further down the road. Mostly it's easy to tell what's good and what's not - for example the Unit Test coverage here is practically non-existent for various reasons (mostly poor excuses mixed in with one or two valid points). Lately, though, I've been noticing a regular occurrence that I'm just not sure about.

Whenever we start a new project, naturally we need to find any relevant code that needs to be extended, altered, or removed. It seems to me that, the vast majority of the time, anything not within the most commonly used sections of the application takes people an age to find within the codebase. There are one or two tech leads who know their section of the code well, but even they get stumped sometimes and have to spend a long time searching for what they require, or turn to someone who has been editing that part of the code recently (if anyone) for help. When I say a long time, I don't mean hours (usually) but it seems to me that a good codebase would be navigable to any point within a few minutes at worst, to anyone even vaguely familiar with the system.

So, my question. Is the above problem due to poorly structured code? Alternatively, is it down to developers not having enough knowledge of the codebase? Or is it simply unavoidable in large applications, regardless of how much work goes into keeping the file structure clear?

Or, indeed...am I just wasting my time on a topic that really doesn't matter?

Both being about large code bases doesn't make it a duplicate. Asking why large code bases are hard to navigate is a very different question from how to navigate one.
– Karl Bielefeldt, Jan 11 '14 at 14:45

4 Answers

Large code bases aren't designed, they evolve. A lot of things that don't make sense when you look at a current snapshot make perfect sense when you take history into account. And I don't just mean the individual history of the code base, I also mean the history of software engineering practices in general.

Unit testing pretty much always existed to some degree, but didn't really come into widespread usage until extreme programming and test driven development were "invented," around the years 1999 to 2003. A lot of code bases predate that, and consequently were not designed in a manner that made unit testing easy.

There is similar history behind other software engineering practices. For example, the DVCS revolution of 2005 changed the way people think about workflows and branching models, even with non-distributed version control. For another example, even though it existed, almost no one had heard of the MVC design pattern until Microsoft created a framework with that name, and now failure to separate model and view is much more highly discouraged than it used to be, even in projects that don't use Microsoft's framework.

Creation and popularization of online peer review tools essentially ended the practice of a peer review being a formal meeting, and made them much easier to perform, and therefore more ubiquitous. Popularization of garbage collected languages prompted memory management innovations in C++ like smart pointers and RAII, which are now considered standard practice, but were unheard of when many current code bases were started. I could go on and on.

As companies grow, they try to take advantage of code reuse as much as possible, so a code architecture that was ideal for one product might be pulled into another project with little modification, even though the architecture might be a little awkward in the new context. When this happens multiple times over possibly decades, the big picture stops making sense.

It's not that companies don't want to change with the times, but code bases are like ocean liners. They take a long time and careful planning to turn around. Therefore, it is highly unlikely you will ever find a large code base that wouldn't be redesigned a different way if starting from scratch today. The important thing to look for is if a company is striving to turn in the right direction.

See, this is what I love about the people of StackExchange. Not only have you answered my question clearly and informatively, you've also provided loads of context and with it some related CS history. Thank you. I'll leave the question open for a day or so longer in case anyone else has anything to add, but I think this'll take some beating for accepted answer :P
– Hecksa, Apr 18 '12 at 23:39

Certainly there is a limit to the complexity the human mind can grasp. You cannot expect somebody to know his way around millions of lines of code. That is why you should structure and document it in a reasonable and comprehensible way. In practice, opportunities to structure code are often missed. You won't find a database class in the package for a graphical user interface, but you might well find a database class that also does something quite different (e.g. reporting). Classes and modules tend to grow almost organically. Organizing code into more but smaller classes with single responsibilities is preferable, because it makes the code easier to understand. Easy-to-understand code is key to achieving the goals of software engineering, e.g. correct, robust, reusable and extensible software.
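As a rough illustration of the "smaller classes with single responsibilities" point, here is a minimal Python sketch. All the names in it (`Order`, `OrderRepository`, `SalesReport`) are hypothetical and invented for this example, and the in-memory list merely stands in for a real database. The idea is only that storage and reporting live in separate classes, so someone hunting for the reporting logic has one obvious place to look.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical record type, used only for this illustration.
    customer: str
    total: float


class OrderRepository:
    """Single responsibility: store and retrieve orders."""

    def __init__(self):
        # In-memory list standing in for a real database (an assumption).
        self._orders = []

    def add(self, order):
        self._orders.append(order)

    def all(self):
        # Return a copy so callers cannot mutate the internal storage.
        return list(self._orders)


class SalesReport:
    """Single responsibility: summarize orders, with no storage logic."""

    def __init__(self, repository):
        self._repository = repository

    def total_by_customer(self):
        totals = {}
        for order in self._repository.all():
            totals[order.customer] = totals.get(order.customer, 0.0) + order.total
        return totals
```

With this split, a change to how reports are computed touches only `SalesReport`, and a change to how orders are stored touches only `OrderRepository`.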

You won't find any developer who wants to write a lot of code. The idea is to write only as much as is needed, and to write it in a way that's extensible.

Unfortunately, developers have to gather the software requirements from business/sales/marketing, and these are rarely specific. So you end up with use cases that have to be patched into code which was never meant to do what it is now doing.

There is no way to alter this situation, unless you have a way with human minds. What can save you is mandatory documentation, a strong unit and integration testing framework, a bug tracking system, an archived developer mailing list within the company, peer review, and learning generic programming techniques.

Also, consider using as much as you can from open source components since they generally have a good user base and moderate levels of documentation. More importantly, you have people to ask questions to if your tech lead is on vacation.

Books on large-scale software design are also a welcome addition.

When tackling a brownfield application, the ideal first step is to write unit tests for it.

This accomplishes several things:

It forces you to work through the code and understand it

The unit tests serve as documentation and sample code for the existing functionality, and

The unit tests provide a safety net for refactoring.
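To make the three points above a little more concrete, here is a minimal sketch using Python's standard `unittest` module. The function `legacy_shipping_cost` is a hypothetical stand-in for some undocumented piece of an existing system; it is not from any real codebase. The tests record what the code does today, quirks included, rather than what anyone thinks it ought to do.

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    """Hypothetical stand-in for an undocumented legacy function."""
    cost = 4.0 + 1.5 * weight_kg
    if express:
        cost *= 2
    if weight_kg > 20:
        cost += 10  # surcharge nobody remembers adding
    return round(cost, 2)


class ShippingCostCharacterization(unittest.TestCase):
    """Pin down the current behaviour before anyone refactors it."""

    def test_standard_parcel(self):
        self.assertEqual(legacy_shipping_cost(2, express=False), 7.0)

    def test_express_doubles_cost(self):
        self.assertEqual(legacy_shipping_cost(2, express=True), 14.0)

    def test_heavy_parcel_gets_flat_surcharge(self):
        # Writing this test is how you discover the surcharge exists at all.
        self.assertEqual(legacy_shipping_cost(30, express=False), 59.0)

    def test_surcharge_is_not_doubled_for_express(self):
        # Documents a quirk: the surcharge is added after the express doubling.
        self.assertEqual(legacy_shipping_cost(30, express=True), 108.0)
```

Saved in a test module, these can be run with `python -m unittest`. If a later refactoring changes any of these numbers, the failure tells you that behaviour changed, and it is then a deliberate decision whether that change is a fix or a regression.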

Whether or not the organization has the inclination to allow you to do this is another matter. They've lived without unit tests for this long; getting them to agree to reduce the technical debt in this way will be difficult.