

Length should not be measured in LOC, but in the amount of time required to comprehend precisely what the function does. And that time should be no more than a minute. If I can't figure it out in a few seconds, it's probably doing too much... after a minute, it definitely is.
– CaffGeek, Dec 23 '10 at 17:50

18 Answers

Since I embarked on this crazy racket in 1970, I have seen exactly one module that really needed to be more than one printed page (about 60 lines). I have seen lots of modules that were longer.

For that matter, I have written modules that were longer, but they were usually large finite state machines written as big switch-statements.

Part of the problem appears to be that programmers these days are not taught to modularize things.

Coding standards that maximize the waste of vertical space appear also to be part of the problem. (I have yet to meet a software manager who has read Gerald Weinberg's "Psychology of Computer Programming". Weinberg points out that multiple studies have shown that programmer comprehension is essentially limited to what the programmer can see at any given instant. If he has to scroll, or turn a page, his comprehension drops significantly: he has to remember, and abstract.)

I remain convinced that a lot of the well-documented programmer productivity gains from FORTH were due to the FORTH "block" system for source code: modules were hard-limited to an absolute maximum of 16 lines of 64 characters. You could factor infinitely, but you could not under any circumstances whatsoever write a 17-line routine.

The whole philosophy of FORTH was designed to encourage this... You were intended to design your own vocabulary, relentlessly breaking up your program into smaller and smaller pieces, ending up with less of a script and more of a dictionary. Length-limits alone don't do it - you'll see a routine split into arbitrary pieces just to satisfy some coding standard. I think you're absolutely correct in suspecting that programmers are simply "not taught to modularize things"; this should have been one of the big wins of OOP, but for various reasons it's often de-emphasized as a goal unto itself.
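FORTH's relentless factoring can be sketched in any language. Here is a toy Python illustration (the function names are invented for the example): each tiny "word" does one thing, and the top-level definition is built entirely from the vocabulary beneath it, reading more like a dictionary entry than a script.

```python
# Toy FORTH-style factoring: every "word" is a few lines, and the
# top-level routine is composed from the smaller words below it.

def strip_comment(line: str) -> str:
    return line.split("#", 1)[0]

def tokens(line: str) -> list[str]:
    return strip_comment(line).split()

def is_blank(line: str) -> bool:
    return not tokens(line)

def meaningful_lines(text: str) -> list[str]:
    # The "sentence": built entirely from the vocabulary above.
    return [line for line in text.splitlines() if not is_blank(line)]

print(meaningful_lines("push 1\n# comment only\npop"))  # ['push 1', 'pop']
```

No routine here could grow to 17 lines even if it wanted to; adding behavior means coining a new word, not lengthening an old one.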
– Shog9♦, Dec 23 '10 at 18:51

What's the Right Size, Really?

Depends on the language you use, but in general (and for my personal taste):

Ideally, less than 25 lines.

Acceptably, less than 35 lines.

If it's more, then it's something I need to come back to later and rework.

But realistically, a function is whatever size it needs to be when you have to deliver something. Sometimes spitting it out in one piece makes more sense in the moment, and can even make it easier for someone to review before shipping (but still get back to it later).

(Recently my team ran a program on our codebase: we found one class with 197 methods, and another with only 3 methods, one of which was 600 lines long. Cute game: which is the worse of the two evils?)

Now for a more zen answer... In general it's considered a good practice (TM) to quote one or two great men, so here goes:

Addendum on Comment Styles

As an addendum to this, your functions should have clear names explaining their intent. Regarding comments, I usually don't comment inside a function:

comments say "why?",

code says "how?".

A comment block at the top of each function (that requires explanation) is enough. If your function is small and its name is explicit enough, then you should only need to say what you want to achieve and why. I use inline comments only for fields in some languages, or on block starts in functions that break the 25-35 line rule when the intent is unclear. I use a block comment inside the code when an exceptional situation occurs (a catch block where you don't need or want to do anything should have a comment saying why, for instance).

Of course, sometimes you have functions where you just copy 70 different fields to 70 different places. I mostly use tt to generate these, but sometimes you're stuck with a long-ass function (or a long ass-function) that doesn't really do anything of interest anyway, so it isn't a real problem.
– configurator, Dec 19 '10 at 18:25


When the function is e.g. Map(x => x.Property1); Map(x => x.Property2); Map(x => x.Property3); it's kind of clear that it's all quite the same. (Note this is just an example; this kind of function does pop up from time to time)
– configurator, Dec 20 '10 at 2:12

In my opinion, every function should be as small as possible. Each function should do only one thing and do it well. That doesn't really answer the maximum-length question, but it reflects my feelings on the length of functions.

What should be the maximum height of a building? It depends on where the building is, or on the height you want it to be.
You may get different answers from people who come from different cities.
Some script functions and kernel interrupt handlers are very long.

A method that works for me: can I give a part of a longer function a name that makes sense? I think the length of a method is not as important as good naming. The method should do what the name says, no more and no less, and you should be able to give it a good name. If you cannot name your method well, the code is probably not put together well.

The question should be how many things a function does. And usually, it's rare that you need 100 lines to do "one" thing. Again, that depends on the level from which you are looking at the code: is hashing a password one thing? Or is hashing and saving the password one thing?

I would say: start with saving the password as one function. When you feel that hashing is a separate concern, refactor the code. I am no expert programmer by any means, but IMHO, the whole idea of functions being small is that the more atomic your functions are, the higher the chance of code re-use, and the less often you have to make the same change in more than one place.
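A sketch of that progression (function names invented for the example, using Python's standard library): once hashing feels like its own concern, it gets its own function, and immediately becomes reusable for the login check as well.

```python
import hashlib
import os

_USERS: dict[str, tuple[bytes, bytes]] = {}  # stand-in for a real user table

def hash_password(password: str, salt: bytes) -> bytes:
    # "One thing": derive a key from a password; the hashing policy lives here.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def save_password(user: str, password: str) -> None:
    # Composes the smaller piece instead of duplicating it.
    salt = os.urandom(16)
    _USERS[user] = (salt, hash_password(password, salt))

def check_password(user: str, password: str) -> bool:
    # Re-use pays off: login verification needs the exact same hashing.
    salt, stored = _USERS[user]
    return hash_password(password, salt) == stored

save_password("alice", "s3cret")
print(check_password("alice", "s3cret"))  # True
```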

I have seen SQL stored procedures that run over 1000 lines. Should stored procedures also be kept under 50 lines? I don't know, but long ones make reading the code hell. Not only do you have to keep scrolling up and down, you also have to mentally label a few lines at a time ("this does validation", "this updates the database", etc.), which is work the original programmer should have done.

The cyclomatic complexity of a section of source code is the count of
the number of linearly independent paths through the source code.

I recommend that you keep that number under 10 in a single method. If it gets to 10, then it's time to re-factor.

There are tools that can evaluate your code and give you a cyclomatic complexity number.
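The core of such a tool can be sketched in a few lines of Python with the standard `ast` module. This is a rough approximation (real analyzers count more constructs, such as `match` arms and comprehension conditions): complexity is 1 plus the number of decision points.

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Rough McCabe complexity: 1 + number of decision points."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra branch points
            decisions += len(node.values) - 1
    return 1 + decisions

SRC = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"
"""
print(cyclomatic_complexity(SRC))  # 3
```

A score over 10 from a checker like this is the signal to refactor.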

You should strive to integrate these tools into your build pipeline.

Don't literally chase a method size, but try to look at its complexity and responsibilities. If it has more than one responsibility, then it's probably a good idea to re-factor. If its cyclomatic complexity increases, then it's probably time to re-factor.

I'm fairly certain there are other tools that give you similar feedback, but I didn't have a chance to look into this yet.

I think there is a trade off. If you have a lot of short methods, it is often harder to debug them than one long method. If you have to jump around the editor 20 or 30 different times to trace one method call, it will be hard to keep it all in your head. Meanwhile if there is one well written clear method, even if it is 100 lines, it is often easier to keep in your head.

The real question is why should items be in different methods, and the answer as given above is code re-use. If you aren't re-using the code (or don't know) then it may make sense to leave it in one giant easy to follow method and then as you need to re-use it, split the parts that need re-using into smaller methods.

In reality, part of good method design is making functionally cohesive methods (essentially, they do one thing). The length of the method does not matter. If a function does one well-defined thing and is 1,000 lines, then it is a good method. If a function does 3 or 4 things and is only 15 lines, then it is a bad method...

Methods should be short enough that they do exactly one thing. The reason for this is simple: so your code can be properly optimized.

In a JIT-ted language like Java or C#, it is important that your methods be simple so that the JIT-compiler can produce code quickly. Longer, more complicated methods naturally require more JIT time. Also, JIT compilers only offer a handful of optimizations and only the simplest of methods benefit from this. This fact was even called out in Bill Wagner's Effective C#.

In a lower-level language, like C or C++, having short methods (maybe a dozen or so lines) is also important because that way you minimize the need for storing local variables in RAM rather than in a register. (Aka 'Register Spilling'.) Note however that in this unmanaged case, the relative cost of each function call can be pretty high.

And even in a dynamic language, like Ruby or Python, having short methods also helps out in compiler optimizations as well. In a dynamic language the more 'dynamic' a feature, the harder it is to optimize. For example, a long method which takes an X and could return an Int, Float, or String will likely perform far slower than three separate methods which each only return a single type. This is because, if the compiler knows exactly what type the function will return, it can optimize the function call site as well. (E.g., not checking for type conversions.)
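That return-type point can be illustrated with a hypothetical parser (names invented for the example; the performance difference itself depends on the runtime, so treat the speed claim as the answer's, not the sketch's):

```python
# One "dynamic" function whose return type depends on its input:
def parse_value(text: str):
    if text.lstrip("-").isdigit():
        return int(text)
    try:
        return float(text)
    except ValueError:
        return text

# Three monomorphic alternatives: each call site knows its result type,
# which is the property an optimizing runtime can exploit.
def parse_int(text: str) -> int:
    return int(text)

def parse_float(text: str) -> float:
    return float(text)

def parse_str(text: str) -> str:
    return text

print(parse_value("42"), parse_value("3.5"), parse_value("hi"))  # 42 3.5 hi
```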

I don't put a hard line limit on anything because some functions implement algorithms that are inherently complex and any attempt to make them shorter would make the interactions between the new, shorter functions so complicated that the net result would be no reduction in simplicity. I also don't believe that the idea that a function should only do "one thing" is a good guide, since "one thing" at a high level of abstraction can be "many things" at a lower level.

To me, a function is definitely too long if its length causes subtle violations of DRY right now, and extracting part of the function into a new function or class could solve this. A function may be too long if this isn't the case, but a function or class could easily be extracted that would make the code more modular in a way that is likely to be useful in the face of foreseeable change down the road.

My general rule is that a function should fit on the screen. There are only three cases I have found that tend to violate this:

1) Dispatch functions. In the old days these were common but most of them are replaced with object inheritance these days. Objects only work inside your program, though, and thus you'll still see occasional dispatch functions when dealing with data arriving from elsewhere.

2) Functions that do a whole bunch of steps to accomplish a goal and where the steps lack a good subdivision. You end up with a function that simply calls a long list of other functions in order.

3) Like #2 but where the individual steps are so small that they are simply inlined rather than called separately.
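Case (1) can be sketched with a dispatch table (the record tags and handlers are invented for the example). The top-level function stays one screen long no matter how many record types arrive from outside:

```python
# One small handler per externally-arriving record tag.
def handle_order(fields: list[str]) -> tuple:
    return ("order", fields[0])

def handle_refund(fields: list[str]) -> tuple:
    return ("refund", fields[0])

HANDLERS = {"ORD": handle_order, "RFD": handle_refund}

def dispatch(line: str) -> tuple:
    # The dispatch function itself never grows: new record types
    # add a handler and a table entry, not another branch here.
    tag, *fields = line.split(",")
    try:
        handler = HANDLERS[tag]
    except KeyError:
        raise ValueError(f"unknown record type: {tag}") from None
    return handler(fields)

print(dispatch("ORD,42"))  # ('order', '42')
```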

I have seen a thousand-line routine that I had no problem with. It was a huge switch statement, no option exceeded a dozen lines and the only control structure in any option was a single loop. These days it would have been written with objects but that wasn't an option back then.

I'm also looking at 120 lines in a switch in front of me. No case exceeds 3 lines--a guard, an assignment and the break. It's parsing a text file, objects aren't a possibility. Any alternative would be harder to read.

Maybe function length is not such a good metric. We try to use cyclomatic complexity on methods too, and one of our planned source-control check-in rules is that the cyclomatic complexity of classes and methods must stay below X.