Lately I've been trying to split long methods into several short ones.

For example: I have a process_url() function which splits URLs into components and then assigns them to some objects via their methods. Instead of implementing all this in one function, I only prepare the URL for splitting in process_url(), and then pass it over to process_components() function, which then passes the components to assign_components() function.

At first, this seemed to improve readability, because instead of huge 'God' methods and functions I had smaller ones with more descriptive names.
However, looking through some code I've written that way, I've found that I now have no idea whether these smaller functions are called by any other functions or methods.

Continuing previous example: someone looking at the code might think that process_components() functionality is abstracted into a function because it's called by various methods and functions, when in fact it's only called by process_url().

This seems somewhat wrong. The alternative is to still write long methods and functions, but indicate their sections with comments.

Is the function-splitting technique I described wrong? What is the preferred way of managing large functions and methods?

UPDATE: My main concern is that abstracting code into a function might imply that it could be called by multiple other functions.

13 Answers
13

The solution to both of these problems is to write code that doesn't do lots of things. Write each function so that it does one thing and only one thing. This makes them easy to test with a unit test (one doesn't need umpteen dozen unit tests).

A co-worker of mine has the phrase he uses when judging if a given method needs to be broken up into smaller ones:

If, when describing the activity of the code to another programmer you use the word 'and', the method needs to be split into at least one more part.

You wrote:

I have a process_url() function which splits URLs into components and then assigns them to some objects via their methods.

This should be at least two methods. It is ok to wrap them in one publicly facing method, but the workings should be two different methods.

@exizt what is the problem with writing a function that takes a URL, and returns its components? Alternatively, mark it as protected or private (or equivalent in the language) so that it can only be called from within your module.
–
MichaelTApr 24 '13 at 16:45

28

@exizt, what's wrong with that implication? If multiple other functions find that extracted function useful, they they should absolutely call it. Refactoring and extracting methods promotes code reuse, and this is not a bad thing.
–
Anthony PegramApr 24 '13 at 16:46

1

What if it processes the components in a way that is very specific to this whole functionality, so that it could hardly ever be reused? Even if it is protected -- doesn't the fact that it is a callable function or method imply that it could be called from multiple methods inside this module?
–
exiztApr 24 '13 at 16:46

13

@exizt, if those functions are completely coupled to your workflow, then other potential callers wouldn't find them useful. Problem almost solves itself. But depending upon the language you are using, there are mechanisms for hiding that functionality from callers that wouldn't find it particularly useful, either by way of modifiers in the class file, or in the package or assembly. And perhaps you have more refactoring and organization to do. But in general, smaller, reusable methods are good things.
–
Anthony PegramApr 24 '13 at 16:52

1

@IanKelling as much of a trite truism as it is, for those who have not realized this it is useful to remind them of this. There are still people who follow the "real programmers don't comment their code. If it was hard to write, it should be hard to read" mentality. I certainly despair the way high rep answers get in a upvote cycle, but that is the nature of the system. The combination of twitter, reddit and other factors have vaulted this answer to where it is. Also meta.programmers.stackexchange.com/questions/5868
–
MichaelTApr 27 '13 at 15:05

Yes, splitting long functions is normal. This is a way of doing things that's encouraged by Robert C. Martin in his book Clean Code. Particularly, you should be choosing very descriptive names for your functions, as a form of self-documenting code.

@exizt: I think a difficulty that you're having is you might not be naming your methods purposefully. People/code should expect a method to adhere to its contract/intended purpose. If your method does something very specific, it should be apparent what that purpose is to the caller. Thus, by calling it, they require the functionality intended in its contract.
–
Steve EversApr 24 '13 at 21:49

2

@exizt how many places call your main() function? It's okay for a function to only get used once. You also don't know if thats going to stay true, maybe somewhere down the line you need to implement a very similar function and you'll get to call a lot of these functions again. Or work on a similar project and just copy the entire function as is without having to rewrite a bunch of stuff.
–
semiApr 25 '13 at 16:17

As people pointed, this improves readability. A person reading process_url() may see more clearly what is the general process to deal with URLs just by reading a few method names.

The problem is that other people may think those functions are used by some other piece of the code, and if some of them need to be changed they may choose to preserve those functions and define new ones. This means some code becomes unreachable.

There are several ways to deal with this. First is documentation and comments in the code. Second, tools that provide coverage tests. In any case, to a great extent this depends on the programming language, these are some of the solutions you can apply depending on the programming language:

object oriented languages can allow to define some private methods, to ensure they are not used elsewhere

modules in other languages may specify which functions are visible from the outside, etc.

very high level languages like Python may eliminate the need to define several functions because they would anyway be simple one liners

other languages like Prolog may require (or strongly suggest) the definition of a new predicate for every conditional jump.

in some cases it's common to define auxiliar functions inside the function that uses them (local functions), sometimes these are anonymous functions (code closures), this is common in Javascript callback functions.

So in short, splitting in several functions is usually a good idea in terms of readability. It may not be really good if the functions are very short and this creates the goto effect or if the names are not really descriptive, in that case reading some code would require jumping among functions, which may be messy. About your concern about scope and usage of these functions, there are several ways to deal with it that are in general language dependent.

In general the best advice is to use common sense. Any strict rule is very likely to be wrong in some case and in the end it depends on the person. I would consider this readable:

Personally I prefer a one liner even if it is slightly complex rather than jumping and searching across several files of code, if it takes me more than three seconds to find some other part of code I may not even know what was I checking anyway. People who do not suffer from ADHD may prefer more explanatory names that they can remember, but in the end what you are always doing is balancing the complexity in the different levels of the code, lines, paragraphs, functions, files, modules, etc.

So the keyword is balance. A function with one thousand lines is a hell for anyone reading it, because there is no encapsulation and the context becomes just too huge. A function split into one thousand functions each one of them with one line in it may be worse:

you have some names (that you could have provided as comments in the lines)

you are (hopefully) eliminating global variables and do not need to worry about the state (having referential transparency)

but you force readers to jump back and forth.

So there are no silver bullets here, but experience and balance. IMHO the best way to learn how to do this is reading a lot of code written by other people an analyzing why is it hard for you to read it and how to make it more readable. That would provide valuable experience.

Personally I prefer a one liner even if it is slightly complex and I prefer it not. It is more important to write readable code that to write it fast. If you don't write this functionality often, it is likely to make a mistake or a stupid type. If, on the other hand, you use it often, the you should already know where this utility function is stored and use that instead.
–
Maurycy ZarzyckiMay 24 '14 at 16:26

Your comment is about writing code fast or often, you are missing the point. I'm not talking about writing but about reading code, it's all about readability. A verbose program isn't readable. A one liner may work as a "summary" and as such it may be more readable, even if slightly complex. Synthesizing is important when writing anything, and specially code.
–
TrylksMay 27 '14 at 9:08

I've never taken issue with other developers splitting larger methods into smaller methods as it's a pattern that I follow myself. The "God" method is a terrible trap to fall into and others who are less experienced or simply don't care tend to get caught more often than not. That being said...

It's incredibly important to use appropriate access identifiers on the smaller methods. It's frustrating to find a class littered with small public methods because then I totally lose confidence finding where/how the method is used throughout the application.

I live in C#-land so we have public, private, protected, internal, and seeing those words shows me beyond a shadow of a doubt the scope of the method and where I must look for calls. If it's private, I know the method is used in only one class and I have full confidence when refactoring.

In the Visual Studio world, having multiple solutions (.sln) exacerbates this anti-pattern because IDE/Resharper "Find Usages" helpers will not find usages outside of the open solution.

If you're just splitting it for the sake of splitting and calling them names like process_url_partN and so on, then NO, please don't. It just makes it harder to follow later when you or someone else needs to figure out what is going on.

If you're pulling out methods with clear purposes that can be tested by themselves and makes sense on their own (even if nobody else are using them) then YES.

For your particular purpose it seems you have two goals.

Parse a URL and return a list of its components.

Do something with those components.

I'd write the first part separate and have it return a fairly general result that could be easily tested and potentially be reused later. Even better, I'd look for a built-in function that already does this in your language/framework and use that instead unless your parsing is super special. If it's super special I'd still write it as a separate method, but probably "bundle" it as a private/protected method in the class that handles the second (if you're writing object oriented code).

The second part I'd write as its own component which uses the first for the URL parsing.

If your programming language supports it, you might be able to define your "helper" functions within the scope of your process_url() function to get the readability benefits of separate functions. e.g.

If your programming language doesn't support this, you might move foo() and bar() out of the scope of process_url() (so that it is visible to other functions/methods)--but consider this a "hack" you've put in place because your programming language doesn't support this feature.

Whether to break a function into sub-functions will probably depend on whether there are meaningful/useful names for the parts and how large each of the functions are, among other considerations.

I used to do this a lot until I realized how unreadable it was later on. i.e. You're very tempted to read the implementations of foo and bar when trying to see what process_url does. And even if you don't, you're forced to skip over them every time you read the implementation of process_url. It's much nicer to put these internal methods below if you can. Then the file gets more concrete towards the bottom and more abstract towards the top.
–
crizCraigAug 10 '13 at 22:19

Much of the code we write doesn’t start out being simple. To make it simple, we must reflect on what isn’t simple about it and continually ask, “How could it be simpler?” We can often simplify code by considering a completely different solution. The refactorings in this chapter present different solutions for simplifying methods, state transitions, and tree structures.

Compose Method (123) is about producing methods that efficiently communicate what they do and how they do what they do. A Composed Method [Beck, SBPP] consists of calls to well-named methods that are all at the same level of detail. If you want to keep your system simple, endeavor to apply Compose Method (123) everywhere...

Refactoring the outer-if to an early-exit test is good, and the inner if block should be replaced by something like elements = (Object[])Array.Resize(elements, elements.length*2);, but even without such changes I'd much rather deal with the first than the second. Replacing elements[size++] = element; with a call to addElement(element) doesn't abstract away the precondition that elements.length > size; it merely hides it.
–
supercatSep 2 '14 at 22:37

Moving stuff out to smaller functions is usually better than having a lot of loops and such inside a function that concists of a few conceptual "big" steps. (Unless you can combine that into relatively small LINQ/foreach/lambda expressions...)

I'm sure this won't be the popular opinion, but it's perfectly ok. Locality of Reference can be a huge aid in making sure you and others understand the function (in this case I'm referring to the code and not to memory specifically).

As with everything, it's a balance. You should be more concerned with anyone who tells you 'always' or 'never'.

When you say "it's perfectly ok", do you mean that it's ok to split the function into smaller ones? If so, why would you think it won't be the popular opinion, since there were at least 8 answers before yours that supported that choice?
–
Avner Shahar-KashtanApr 26 '13 at 9:44

I mean it's ok to leave it as a long function, it's a judgement call based upon the code itself and what it's doing. There's no reason it should automatically be broken down with no thought put into why.
–
FredApr 26 '13 at 19:18

This function has up to 8 possible paths how your code can be executed, depending on how those condition evaluate.

This means that you'll need 8 different tests just to test this part. Moreover, most likely some combinations will not be possible, and then you'll have to carefully analyze what are they (and be sure not to miss some that are possible) - a lot of work.

It is very hard to reason about the code and its correctness. Since each of the if blocks and its condition can depend on some shared local variables, in order to know what's happening after them, everybody working with the code has to analyze all those code blocks and the 8 possible execution paths. It's very easy to make a mistake here and most likely somebody updating the code will miss something and introduce a bug.

Easily test each of the helper functions, each of them has only two possible paths of execution.

When examining myFun it is immediately obvious if result2 depends on result1 (just checking if the call to myHelper2(...) uses it to compute one of the arguments. (Assuming that the helpers don't use some global state.) It is also obvious how they're dependent, something that is much harder to understand in the previous case. Moreover, after the three calls, it's also clear how the state of the computation looks - it's captured just in result1, result2 and result3 - no need to check if/what other local variables have been modified.

This has the appearance of an anti-pattern (that I can't find the name of at the moment) where each task is only partially doing the job, but the two always need to be called in sequence. Each method should do one thing and one thing only, and all of one thing - not "part 1 of one thing" and "part 2 of one thing". If this is something other than you intended, could you modify your answer to indicate this?
–
MichaelTApr 25 '13 at 17:29

6

@MichaelT - it is simply called sequential coupling, and yes, it is considered harmful in most circumstances.
–
CurtainDogApr 26 '13 at 7:32

@CurtainDog It's no more harmful than calling a factory method and following that up with a call to a method on said object. It's required to happen in a specific sequence. The problem isn't things needing to happen in a sequence, it's when they need to happen in a sequence and it isn't obvious that's the case. No one would reasonably attempt to call a method on an object before it's been created, and thus, the sequential nature of it doesn't cause problems.
–
FredApr 26 '13 at 19:25

1

@Fred a factory method gives someone a workable object. If foo_part1() and foo_part2() are incomplete in of themselves, done only to break the code up for the sake of breaking code up, this is bad. The OP hasn't shown which interpretation this is what is intended.
–
MichaelTApr 26 '13 at 19:40