Goals

All these methods basically help to reach a single goal: improve the modularity of the code. Modularity is the only way to reduce complexity.

To improve modularity, one usually tries to:

Separate the general from the specific

This is one of (if not the) most important principles. It is always good to separate degrees of abstraction, since mixing them all together is one of the major sources of complexity.

Split code into parts (functions, modules, etc)

Note that splitting into parts doesn't mean a trivial splitting of code into several parts - this helps only a little. It is a more complex process, where one has to identify a set of general abstractions, a mix of which, parametrized with the specifics, results in the most economical and simple implementation - and then separate those abstractions into separate functions or modules, so that the remaining specific part of the code is as small and simple as possible.

This may be non-trivial, because it may not be apparent from the specific implementation, what these parts are, and one has to first find points of generalization, where the code must first be generalized - then those "joints" will become visible. It takes some practice and a habit of thinking this way, to easily identify these points.

It also greatly depends on what's in your tool set: these "split points" will be different for, say, procedural, functional and OO programming styles, and that will result in different splitting patterns and, in the end, different code with different degrees of modularity.

Decrease coupling between the parts

This basically means decreasing their inter-dependencies, and having well-defined and simple interfaces for their interaction. To reach this goal, one frequently adds levels of indirection and late decision-making.

Increase the cohesion for each part (so that the part is much more than the sum of its sub-parts)

This basically means that the components of each part don't make much sense taken separately, much like parts of the car's engine - you can't take out any one of them, they are all inter-related and necessary.

Make decisions as late as possible

A good example in the context of Mathematica is using Apply: this postpones the decision on which function is called with a given set of arguments, from write-time to run-time.

Late decision-making decreases coupling, because interacting parts of the code need less information about each other ahead of time, and more of that information is supplied at run-time.
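As a minimal sketch of the Apply example (not code from the original post; the names chooseOp and combineWith are hypothetical), the function to be called is an ordinary value that is looked up only when the code runs:

```mathematica
(* Hypothetical names, for illustration only *)
ClearAll[chooseOp, combineWith];

(* The operation is selected from data at run-time *)
chooseOp["sum"] = Plus;
chooseOp["product"] = Times;

(* combineWith does not know which function it will call until it runs *)
combineWith[opName_String, args_List] := Apply[chooseOp[opName], args];

combineWith["sum", {2, 3, 4}]      (* 9 *)
combineWith["product", {2, 3, 4}]  (* 24 *)
```

The caller of combineWith only needs to agree on the operation names, not on the functions behind them.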

General things

Here I will list some general techniques, which are largely language-agnostic, but which work perfectly well in Mathematica.

Embrace functional programming and immutability

A lot of problems with large code bases happen when the code is written in a stateful style, and state gets mixed with behavior. This makes it hard to test and debug separate parts of the code in isolation, since they become dependent on the global state of the system.

Functional programming offers an alternative: program evaluation becomes a series of function applications, where functions transform immutable data structures. The difference in resulting code complexity becomes qualitative and truly dramatic, when this principle is followed down to the smallest pieces of code. The key reason for this is that purely functional code is much more composable, and thus much easier to take apart, change and evolve. To quote John Hughes ("Why functional programming matters"),

"The ways in which one can divide up the original problem depend directly on the ways in which one can glue solutions together. "

I highly recommend reading the entire article.

In Mathematica, the preferred programming paradigms, for which the language is optimized, are rule-based and functional. So, the sooner one stops using imperative procedural programming and moves to functional programming in Mathematica, the better off one will be.

Separate interfaces and implementations

This has many faces. Using packages and contexts is just one, and rather heavy, way to do that. There also exist ways to do that on a smaller scale, such as

Creating stronger types

Using the so-called i-functions

Inserting pre- and post-conditions in functions

Master scoping constructs and enforce encapsulation

Mastering scoping is essential for scaling to larger code bases. Scoping constructs provide a mechanism for information hiding and encapsulation, which is essential for reducing the complexity of the code. In non-trivial cases, achieving the right code structure, even inside a single function, quite often requires three, four or even more levels of nesting of various scoping constructs (Module, Block, With, Function, RuleDelayed) - and to do that correctly, one has to know exactly what the rules of their mutual interaction are, and how to bend those rules if necessary. I can't overemphasize the importance of scoping in this context.
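As a small illustration of why the differences between these constructs matter (a sketch with hypothetical symbols x and f, not code from the original post), the same call gives different results depending on which construct wraps it:

```mathematica
ClearAll[x, f];
x = 10;
f[] := x^2;   (* refers to the global x *)

Block[{x = 3}, f[]]    (* 9: Block temporarily changes the value of the global x *)
Module[{x = 3}, f[]]   (* 100: Module creates a new, renamed local symbol *)
With[{x = 3}, f[]]     (* 100: With substitutes 3 only into the code it literally wraps *)
```

Block scopes values dynamically, while Module and With scope names lexically, so only Block affects code that lives outside the wrapped expression.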

Separate orthogonal components in your code

This is a very important technique. It often requires certain advanced abstractions, such as higher-order functions and closures. It also requires some experience and a certain way of thinking, because frequently code doesn't look like it can be factored, since certain parts of it would first have to be rewritten in a more general way - yet it can be done. I will give one example of this below, in the section on higher-order functions.

Use powerful abstractions

Here I will list a few which are particularly useful

Higher-order functions

Closures

Function composition

Strong types

Macros and other meta-programming devices

Use effective error-reporting in internal code, make your code self-debugging

There are a number of ways to achieve that, such as

Using Assert

Setting pre- and post-conditions

Throwing internal exceptions

All of them combined lead to much simpler error diagnostics and debugging, and also greatly reduce regression bugs.
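The three devices can be combined in one small function. The following is a hedged sketch, not code from the original post: the function safeMean and the error head are hypothetical names chosen for illustration.

```mathematica
ClearAll[safeMean, error];

(* Assertions are checked only after they have been switched on *)
On[Assert];

(* Pre-condition: a non-empty list of numbers, enforced by the pattern *)
safeMean[data : {__?NumericQ}] :=
  With[{result = Total[data]/Length[data]},
    (* Post-condition: the mean lies between the extremes *)
    Assert[Min[data] <= result <= Max[data]];
    result
  ];

(* Catch-all definition: throw an internal exception tagged with the function *)
safeMean[args___] := Throw[$Failed, error[safeMean, {args}]];

safeMean[{1, 2, 3}]                     (* 2 *)
Catch[safeMean["oops"], _error, #2 &]   (* error[safeMean, {"oops"}] *)
```

Because the exception carries the function name and the offending arguments in its tag, the place where bad input entered the system is visible at the point where the exception is caught.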

Use unit tests

There has been enough said about the usefulness of unit tests. I just want to stress a few additional things.

Mathematica meta-programming capabilities make it possible and relatively easy to simplify generation of such tests.

The extremely fast development cycle for the prototyping stage somewhat flies in the face of unit-testing, since code changes so fast that writing unit tests becomes a burden. I would recommend writing them once you move from a prototype to a more stable version of a particular part of your code.

Topics not covered yet (work in progress)

To avoid making this post completely unreadable, I did not cover a number of topics which logically belong here. Here is an incomplete list of those:

More details about packages and contexts

Error reporting and debugging

Using metaprogramming, macros and dynamic environments

Using development tools: Workbench, version control systems

Some advanced tools like parametrized interfaces

Summary

There are a number of techniques which may be used to improve the control over code bases as they grow larger. I tried to list a few of them and give some examples to illustrate their utility. These techniques can be roughly divided into a few (overlapping) groups:

Small-scale techniques

Effective use of core data structures

Code granularity

Function overloading

Small-scale encapsulation, scoping, inner functions

Function composition

Large-scale techniques

Packages and contexts

Factoring orthogonal components

Separation of interfaces and implementations

Using powerful abstractions

Abstract data types, stronger typing

Closures

Higher-order functions

Macros and other metaprogramming techniques

This is surely not an ideal classification. I will try to make this post a work in progress and refine it in the future. Comments and suggestions are more than welcome!

II. Managing the complexity: controlling complexity on the smaller scale

There are a few things you can do to control and reduce the complexity of your code, even on the small scale - long before you move to packages and split code into several files.

Effective use of the core data structures

This is probably the first thing to mention. The most important core data structures are Lists and Associations. Mastering them and using them effectively goes a long way towards writing much better Mathematica code.

Some of the properties which make both Lists and Associations so effective are:

They are very well integrated into the language

They are polymorphic data structures. Lists can hold elements of any type, and Associations can use elements of any type both for keys and for values.

They are very universal. In particular, Lists can be used for arrays, sets, trees, etc., and Associations implement a very general key-value mapping abstraction.

Using them with a functional programming style leads to very compact code doing non-trivial data transformations fast. This both reduces the code bloat and improves the code speed.

They offer a fast and cheap way to do exploratory programming and prototyping, where you don't have to create new types, so you can create and change complex data structures on the fly.

However, in the long term, one has to be aware of certain flaws as well:

Lists:

Adding and removing elements is an O(n) operation, where n is the length of the list

Associations

Are rather memory-hungry

Element-by-element modifications can be rather slow. Even though Associations themselves have roughly O(1) complexity for these operations, one still has to do a top-level iteration to, for example, build an association element by element, and top-level iteration is slow. In other words, there is no analogue of packed arrays for associations. In some cases, one can use functions like AssociationThread, which operate on many keys and values at once.

Common

It is easy to get regression bugs from changes in code, due to weak typing
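To make the point about Associations concrete, here is a minimal sketch (the variable names are illustrative, not from the original post) that builds the same association element by element and in bulk:

```mathematica
keys = Range[5];
values = keys^2;

(* Element-by-element construction: one top-level assignment per key *)
oneByOne = Module[{assoc = <||>},
   Do[assoc[keys[[i]]] = values[[i]], {i, Length[keys]}];
   assoc];

(* Bulk construction: all keys and values are handled in a single call *)
bulk = AssociationThread[keys -> values];

oneByOne === bulk  (* True *)
```

The results are identical; the bulk form simply avoids the top-level loop, which is where the time goes for large inputs.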

Code granularity

In most cases, it is much better to split your code into a number of fairly small functions, each one doing some very specific task. Here are a few suggestions regarding that:

Use small functions (just a few lines of code each)

Avoid side effects as much as possible

In particular, prefer With to Module when possible

Write code in a style that promotes function composition

Use operator forms and currying (available for a number of built-in functions since V10)
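A minimal sketch of these suggestions taken together (the names keepEven, square and sumOfEvenSquares are hypothetical): each step is a tiny side-effect-free function in operator form, and the steps are glued together by composition.

```mathematica
(* Small single-purpose steps, written as operator forms *)
keepEven = Select[EvenQ];
square = Map[#^2 &];

(* RightComposition (/*) applies the steps left to right *)
sumOfEvenSquares = keepEven /* square /* Total;

sumOfEvenSquares[Range[6]]  (* 56 *)
```

Each step can be tested on its own, for example keepEven[Range[6]] gives {2, 4, 6}.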

Example: simplistic DOM viewer

Below is the code of a rudimentary viewer for a DOM structure of an HTML page:

This is what I call granular code: it contains a few really tiny functions, which are very easy to understand and debug.

Example: modeling and visualizing random walks

This one was a real question asked by someone in the Russian-speaking Mathematica online group. It is valuable since this is a real problem, and it was originally formulated in a procedural style.

The problem is to model a 2-dimensional random walk with certain step probabilities, which are constants (they don't depend on the previous steps). The question asked is to find the probability of returning to the point of origin in fewer than a given number of steps. This is done using essentially Monte-Carlo simulation: running the single walk simulation multiple times, and finding how many steps it took to return, for a particular experiment.

Here is the original code. The problem settings (I keep the original code):

(* The choice of the next step; v and p are defined in the problem settings *)
Step[R_] := v[[Select[p, #[[1]] >= R &, 1][[1]][[2]]]];
(* Array initialization. m[[i]] gives the number of successful returns in the i-th run *)
m = Array[#*0 &, {Z}];
(* Running the experiments; k starts at 1 because list indices start at 1 *)
For[k = 1, k <= Z, k++,
  For[j = 0, j < n, j++,
    (* Initial position of a point *)
    X0 = {0, 0};
    (* Making the first step *)
    i = 1;
    X0 += Step[RandomReal[]];
    (* Move until we return to the origin, or run out of steps *)
    While[(X0 != {0, 0}) && (i < q), {X0 += Step[RandomReal[]], i++}];
    (* If the point returned to the origin, increment success counter *)
    If[X0 == {0, 0}, m[[k]]++];
  ];
]; // AbsoluteTiming
(* {5.336, Null} *)

Here is the visualization of the experiment (basically, the unnormalized empirical CDF and PDF):

It ends up 10 times faster, and the above code is also, at least for me, much more readable - and you can easily test all individual functions, since they don't depend on anything that has not been passed to them explicitly.

What I would say is that I strongly prefer the granular version, in all cases but those where the condensed one offers far superior performance, and only if this is critical for the problem. In this particular case, the performance is the same, and in most other cases it also won't be worth it to keep such code, since it is much harder to understand.

In any case, to me this example serves as another good illustration of the advantages and superiority of functional programming done in a granular fashion, and I hope it additionally illustrates my point about the importance of granularity.
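For readers who want a feel for what a granular, functional formulation of this kind of simulation can look like, here is a hedged sketch. It is not the code from the original post: the settings steps and probs, and the function names walk, firstReturn and returnFraction, are all assumptions made for illustration, and no performance claim is attached to it.

```mathematica
(* Hypothetical settings: four unit steps with equal probabilities *)
steps = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
probs = {0.25, 0.25, 0.25, 0.25};

ClearAll[walk, firstReturn, returnFraction];

(* A whole walk: the positions after each of n random steps *)
walk[n_Integer?Positive] := Accumulate[RandomChoice[probs -> steps, n]];

(* Step count of the first return to the origin, or Missing[...] if there is none *)
firstReturn[path_List] :=
  First[FirstPosition[path, {0, 0}, {Missing["NoReturn"]}, {1}]];

(* Fraction of runs that return to the origin within maxSteps steps *)
returnFraction[maxSteps_Integer?Positive, runs_Integer?Positive] :=
  N[Mean[Boole[Table[IntegerQ[firstReturn[walk[maxSteps]]], {runs}]]]];

firstReturn[{{1, 0}, {0, 0}, {0, 1}}]  (* 2 *)
```

Only walk involves randomness; firstReturn is a pure function of its argument and can be checked on hand-written paths, as in the last line.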

Function composition

Writing code in this style is very beneficial for readability, extensibility and ease of debugging. Do it when you can.

Example: inverting many to many relationships

I will borrow this one from this answer. The function below inverts many-to-many relationship encoded in an association:

But here I just want to stress the way the function is written: using Composition and operator forms makes the code much more transparent and much easier to debug and extend. To debug, you basically need to stick something like showIt in between any two transformations in the chain, and to extend, you can simply add transformations.
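A hedged sketch of this style follows. It is not the function from the linked answer: invertRelation is a hypothetical name, showIt is defined here as a simple print-and-return helper, and the chain uses RightComposition (/*) so that it reads left to right. It assumes that every value of the association is a list.

```mathematica
ClearAll[showIt, invertRelation];

(* Hypothetical debugging helper: print the intermediate value and pass it on *)
showIt[x_] := (Print[x]; x);

(* Each step is an operator form; the chain reads left to right *)
invertRelation =
  KeyValueMap[Function[{key, vals}, Map[# -> key &, vals]]] /*
   Flatten /*
   GroupBy[First -> Last];

invertRelation[<|"a" -> {1, 2}, "b" -> {2, 3}|>]
(* <|1 -> {"a"}, 2 -> {"a", "b"}, 3 -> {"b"}|> *)
```

To inspect an intermediate result, one can insert the helper into the chain, for example Flatten /* showIt /* GroupBy[First -> Last], without touching the other steps.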

Function overloading

When you define functions using patterns, you can use function overloading - giving several definitions to a single function, for various numbers / types of arguments. Languages which support overloading have mechanisms for automatic dispatch to the right definition, given specific input arguments. This automation can be used to simplify the programmer's life and write more expressive code. Mathematica fully supports overloading via its core pattern-matching engine, and in fact its pattern-matching capabilities can be thought of as "overloading on steroids" in this context, compared to other languages - even those which support multiple dispatch. You can often design your code in such a way as to maximally utilize this option.

Functions written in such a style are typically (not always though) much more readable and extensible than if you had a single large Switch (or worse yet, nested If) inside the body. The reason has partly to do with the fact that this technique is roughly equivalent to the introduction of ad-hoc mini type systems (since you overload on function arguments, which you check using patterns, thus defining weak types for them), and partly because Mathematica allows multiple dispatch, which is much more powerful than the single-argument dispatch available in many other languages.

I will illustrate this with a single example taken from the RLink module source code: this single function determines the type of all RLink objects, either sent to R from Mathematica, or received from R:

This example illustrates two more quite useful tricks: using local variables shared between the body and the condition of the rule, and using the catch-all pattern to throw a local (internal) exception - but these I will discuss separately.

To summarize advantages of this method:

Easier to read, understand, write, and debug such code

Code written in this way is more extensible

Often you can get rid of intermediate variables, that would be necessary otherwise

Some of the things to watch for are, though:

You have to keep an eye on the relative generality of definitions

In some rare cases, you may need to manually reorder definitions

You can't use Compile on functions which use rules and patterns
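As a minimal sketch of overloading with dispatch on more than one argument (the function combine and the error head are hypothetical, not taken from RLink or the original post):

```mathematica
ClearAll[combine, error];

(* One definition per combination of argument types *)
combine[a_Integer, b_Integer] := a + b;
combine[a_String, b_String] := a <> b;
combine[a_List, b_List] := Join[a, b];

(* Dispatch depends on both arguments: a mixed case gets its own rule *)
combine[a_String, b_Integer] := a <> ToString[b];

(* Catch-all: anything else is an internal error *)
combine[args___] := Throw[$Failed, error[combine, {args}]];

combine[1, 2]          (* 3 *)
combine["ab", "cd"]    (* "abcd" *)
combine[{1}, {2, 3}]   (* {1, 2, 3} *)
combine["n = ", 5]     (* "n = 5" *)
```

Adding support for a new combination of types means adding one more definition, rather than editing a Switch in an existing function body.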

Small scale encapsulation: inner functions

This is a form of encapsulation, where you introduce inner functions, local to the Module, Block, or With scoping constructs that you use to encapsulate your local variables / state. The advantage of this technique is that you can achieve a better level of modularity and readability of your code on a smaller scale, without using such a heavy tool as contexts and packages.

where dirF accepts 2 parameters: subdirName, level, and fileF accepts 1 parameter - file name. You can use this to traverse a directory tree, applying arbitrary functions to files and directories at a specified level, and can set at runtime directories which have to be skipped entirely.

Before we run this code, a few words about it. It is all built on inner functions and closures. Note that all of the clearSkip, setSkip and dtraverse are closed over a local variable skip. Moreover, withLevel and traverse are inner closures, closed over level and skip, fileF and dirF, respectively. What do I buy with closures? Better composition, and better code structuring. Because I don't have to explicitly pass the parameters, I can, for example, pass traverse directly as a parameter to shallowTraverse, making the code easier to read and understand.

The code structure here is very transparent. I view nested directory traversal with functions fileF and dirF as a shallow traversal, where fileF gets applied to files, while to the sub-directories we apply the traverse function. Now, what do I buy with factoring out withLevel? I could've easily wrapped level++;code;level-- in the body of traverse. The answer is, I separate the side effect. Now I could test the inner Function[lev, ...] in isolation, at least in principle.

Let us now see what the run-time skip facility can give us. Here I will run through the entire directory tree for the $InstallationDirectory, but only collect the names of the first-level subdirectories:

And get the same result, only 1000 times faster. I think this is pretty cool given that it only took a dozen lines of code to implement that. And using inner functions and closures made the code clear and modular even on such a small scale, and allowed me to cleanly separate state and behavior.

Example: Peter Norvig's spelling corrector in Mathematica

In the following example this idea is pushed to the extreme. Here is where it comes from. It is hard to beat the clarity and expressiveness of Python, but at least I gave it a try.

Here is the training data (it takes some time to load this):

text = Import["http://norvig.com/big.txt", "Text"];

Here is the code I ended up with (I cheated a bit by abbreviating a number of built-ins using With, because in the original post of Norvig, there was a kind of competition between languages, and I wanted the code as short as possible, without losing readability. But I ended up liking it):

The above code illustrates another thing about inner functions: you may use them also to significantly change the way the code looks inside Module. Of course, we could write it all in a single incomprehensible one-liner, but that would hardly benefit anyone.

Summary

I personally use inner functions all the time, and consider them an important tool for improving small-scale encapsulation, structure, readability and robustness of the code.

One thing to watch out for is that in some cases, inner functions are not garbage-collected automatically. This usually happens when some external objects point to them at the time when they are defined. This may or may not be acceptable, depending on your circumstances. There are also ways to avoid it, such as using pure functions (which, however, can't be easily overloaded and are generally less expressive, since you can't easily do pattern-based argument destructuring and tests for them).
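For reference, the basic shape of the technique is sketched below. This is an illustrative example rather than code from the original post; wordFrequencies and its inner functions normalize and tokenize are hypothetical names.

```mathematica
ClearAll[wordFrequencies];

wordFrequencies[text_String] :=
  Module[{normalize, tokenize},
    (* Inner functions: visible only inside this Module *)
    normalize[s_String] := ToLowerCase[s];
    tokenize[s_String] := StringCases[s, LetterCharacter ..];
    Counts[tokenize[normalize[text]]]
  ];

wordFrequencies["The cat and the hat"]
(* <|"the" -> 2, "cat" -> 1, "and" -> 1, "hat" -> 1|> *)
```

The helper names do not leak into the enclosing context, so another function can define its own normalize without any clash.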

Final example: Huffman encoding

To illustrate many of the points I mentioned above, I will here provide my re-implementation of the Huffman encoding algorithm, based on the code from David Wagner's excellent book. So I refer to his exposition for details on the algorithm and ideas used. I rewrote it to use Associations, and made it purely functional, so that there is no mutable state involved whatsoever, anywhere in the code.

Notes

I think this example illustrates very well the kind of economy and simplicity that it is possible to get from a combination of functional programming, very granular code, function overloading, function composition, operator forms / currying (note that I actually introduced currying / operator form also for the user-defined extract function), and the use of core data structures (Lists and Associations) in Mathematica.

All code contains absolutely no mutable state. Except for inner functions, it uses all of the techniques I described above. The result is a tiny program that solves a non-trivial problem, and while there is no room here for code dissection, it is very easy to take this code apart and understand what goes on at each step. In fact, it is mostly clear how it works just from looking at the code.

Of course, the main credit goes to David Wagner; I just made a few changes to utilize a few recent additions like Associations and completely remove any mutable state.

III. Managing the complexity: using powerful abstractions

In this section I will list a few techniques which allow one to write more modular code and better separate the concerns, by using certain powerful abstractions that are provided by, or can be built in, Mathematica.

Higher-order functions

These are functions which take other functions as arguments. In Mathematica, a number of core built-in functions like Map and Apply are higher-order functions.

The utility of this construct can be seen most clearly within the functional programming paradigm. Higher-order functions can be used to parametrize generic functionality, where custom behavior is injected with functional arguments. This allows one to easily separate generic functionality from the specific.

Trivial example: Select

One trivial example of a built-in higher-order function is Select. We can write a more specific version of Select that would select numbers larger than a threshold by parametrizing Select with an appropriate test function:

Note that the test function #>threshold& is in fact a closure, closed over threshold and created at run-time.
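A sketch of what such a specialized version might look like (selectLarger is a hypothetical name; the original code is not reproduced here):

```mathematica
ClearAll[selectLarger];

(* Hypothetical name; the test function is built from the threshold argument *)
selectLarger[list_List, threshold_?NumericQ] := Select[list, # > threshold &];

selectLarger[{1, 5, 2, 8, 3}, 2]  (* {5, 8, 3} *)
```

Select supplies the generic iteration and filtering, while the only problem-specific part is the small test function created for each call.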

Example: Gram-Schmidt orthogonalization

In this answer, I gave a possible implementation of the Gram-Schmidt orthogonalization procedure, as a higher-order function

GSOrthoNormalizeGen[startvecs_List, dotF_, plusF_, timesF_]

which takes the functions implementing the dot product, addition of vectors and multiplication of a vector by a scalar, as parameters. As a result, it can be used for vectors from any vector space - all one has to do is to implement these specific functions. In the linked post there are examples for the space of 3D vectors and the space of functions.

What this means in practice is that the generic implementation and the specific parts parametrizing it are completely decoupled, they can (and probably should) live in different parts of the project, or even belong to different sub-projects. Therefore, such generalization actually simplifies the code even if I am only interested in a single type of a vector space.
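To show how such a parametrization can work, here is a hedged sketch with the same argument list. It is not the implementation from the linked answer: the name gramSchmidtSketch is hypothetical, the argument order timesF[scalar, vector] is an assumption, and the input vectors are assumed to be linearly independent (otherwise the normalization step divides by zero).

```mathematica
ClearAll[gramSchmidtSketch];

(* dotF[u, v]: inner product; plusF[u, v]: vector sum;
   timesF[c, v]: scalar c times vector v (argument order is an assumption) *)
gramSchmidtSketch[startvecs_List, dotF_, plusF_, timesF_] :=
  Fold[
    Function[{basis, v},
      Module[{w},
        (* subtract from v its projections onto the basis built so far *)
        w = Fold[Function[{acc, b}, plusF[acc, timesF[-dotF[v, b], b]]], v, basis];
        (* normalize the remainder and add it to the basis *)
        Append[basis, timesF[1/Sqrt[dotF[w, w]], w]]
      ]],
    {}, startvecs];

(* Ordinary 3D vectors: the built-in Dot, Plus and Times play the three roles *)
basis = gramSchmidtSketch[{{1, 1, 0}, {1, 0, 1}}, Dot, Plus, Times];
Simplify[basis . Transpose[basis]]  (* {{1, 0}, {0, 1}} *)
```

The generic part never inspects the vectors themselves; everything it knows about the vector space arrives through the three function arguments.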

Closures

Closures are functions that are created at run-time, and have access to the enclosing environment (variables and functions from it). They can then operate on that environment long after the code execution leaves it. Closures are an effective tool to factor and separate functionality. They realize a form of encapsulation, somewhat similar to objects, but more lightweight - they encapsulate behavior rather than state (although they can manipulate the state too).

Example: approximate derivative of a function

This is a classic example. Here, we construct a function that would approximately compute a derivative of another function, numerically:

approxD[f_, dx_] := Function[x, (f[x + dx] - f[x])/dx]

We can now define it for some function:

dsin = approxD[Sin, 0.2];

and use it:

dsin[0]
(* 0.993347 *)

The whole point is that we don't have to know how dsin was constructed, and all the information associated with the process of its construction - we can just use it. It can be stored somewhere, and then used at some later point, perhaps by a different part of the system. It is this kind of separation of construction and execution of certain behavior that makes closures so effective at factoring apart separate components of the system.

Example: iterator for Fibonacci numbers

Here is a very simple example of a closure, implementing an iterator of Fibonacci numbers:

Once again, note that after we construct an iterator, it can be used at some point later on, perhaps by a completely different part of the system. That part may not care what this iterator is made of, or how it was constructed - it only knows that the iterator returns a next element when called.
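A minimal sketch of such an iterator (makeFibIterator is a hypothetical name, and this is not necessarily how the original code was written):

```mathematica
ClearAll[makeFibIterator];

(* Each call creates fresh local state, captured by the returned function *)
makeFibIterator[] :=
  Module[{prev = 0, curr = 1},
    ({prev, curr} = {curr, prev + curr}; prev) &
  ];

iter = makeFibIterator[];
Table[iter[], {5}]  (* {1, 1, 2, 3, 5} *)
```

The variables prev and curr are not accessible from outside; the only way to advance or observe the state is to call the iterator.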

Summary

Closures are a very useful abstraction that allows one to encapsulate behaviors and use them later on, without bothering about the full execution context for those behaviors (which may no longer be available), since closures still have access to that context. They can be thought of as very light-weight objects, but the key difference is that they encapsulate behavior rather than state (but can also manipulate internal state if that's part of the behavior). They usually work together with higher-order functions, being passed to them and called by them.

Closures facilitate information-hiding, because they allow different parts of the system to exchange minimal units of encapsulated behavior, and the less different parts must assume about the other parts, the more they get decoupled, and the more modular the entire code base becomes. At the same time, they protect their internal state much better than the full-fledged objects in the OO paradigm (unless those have no setters, i.e. are read-only), since they don't provide means to change their internal state other than what they do themselves when being executed. The real need of objects arises when one needs more than one closure with a shared mutable state, but in many cases this isn't really necessary and single closures can do just fine.

ADTs and stronger typing

Defining abstract data types and making the code de facto more strongly typed is an important tool to scale to larger code bases. It allows one to automatically exclude large classes of bugs, which otherwise are likely to appear in the course of code evolution. It also often makes the code much more readable, and much easier to reason about.

There are several possibilities for enforcing stronger typing in one's code. Basically, one can:

Use patterns and argument checks to enforce types

This is easier to do and is a less formal way to introduce types, which is used in many practical situations. The typing it introduces is similar to "duck typing", however.

Use dedicated inert heads as data containers / types

This method allows one to create truly strong types. It might be overkill to always do that, but there are cases where this option is great.

Example: using patterns

I will use a function to pick numbers in a specified interval, from this answer

A number of built-in functions were overloaded on the Cache data type, using UpValues. In this way, we can use familiar function names without the danger of affecting other functionality in the system.

The technical part of this implementation is fairly simple. We use the fact that associations are ordered, and when we add a new key-value pair, it is added at the end. The cache is supposed to store the n most recent values. To do that, it works as follows: when a value is requested from the cache, and is present there, it is moved to the end (by adding the same value again - this is an O(1) operation). When the cache grows to its full capacity, it starts removing key-value pairs from the start. The only tricky part was to make such removal a fast operation. As nicely pointed out by Mr.Wizard in comments, Rest is O(1), so we use it. Previously, I missed this observation on Rest and used a user-defined analog of Rest here. (Note that Delete and Drop on an association are O(n) even for the first positions.)
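The eviction logic described above can be sketched in a few lines. This is a simplified, purely functional illustration and not the Cache implementation from the post: the names cachePut and cacheGet are hypothetical, there is no dedicated Cache head or UpValues, keys are assumed not to be lists, and KeyDrop is used for clarity rather than speed.

```mathematica
ClearAll[cachePut, cacheGet];

(* Insert or refresh a key, then evict the oldest entry if over capacity.
   Assumption: keys are not lists (KeyDrop would treat a list as several keys) *)
cachePut[cache_Association, capacity_Integer?Positive, key_, value_] :=
  With[{updated = Append[KeyDrop[cache, key], key -> value]},
    If[Length[updated] > capacity, Rest[updated], updated]
  ];

(* Return {value, newCache}; a hit moves the key to the end (most recent) *)
cacheGet[cache_Association, key_] /; KeyExistsQ[cache, key] :=
  {cache[key], Append[KeyDrop[cache, key], key -> cache[key]]};
cacheGet[cache_Association, key_] := {Missing["NotCached"], cache};

c = cachePut[cachePut[<||>, 2, "a", 1], 2, "b", 2];
{v, c} = cacheGet[c, "a"];     (* "a" becomes the most recent entry *)
Keys[cachePut[c, 2, "c", 3]]   (* {"a", "c"} *)
```

In the last line the cache is full, so adding "c" evicts "b", which is the least recently used key after "a" was requested.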

More resources

How can this confetti code be improved to include shadows and gravity?

Summary

Introducing some level of typing is a very useful technique to improve the robustness of your code.

A simpler way to do that is to introduce patterns which expressions belonging to some type should match, and then insert argument checks based on these patterns, into the definitions of those functions which work with these objects. This has an advantage of being simple and quick to do, but a disadvantage that the types introduced in this way won't be in general fully strong and robust.

A somewhat more formal way is to introduce a special head for the type, and then methods working on that head. This is somewhat harder to implement than the first option, but has several advantages: the code is typically more robust and also usually ends up easier to read and understand.
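The two options can be put side by side in a short sketch. All names here (pickInRange, makeInterval, closedInterval, pickInInterval) are hypothetical and chosen for illustration; this is not the code from the answers referenced above.

```mathematica
ClearAll[pickInRange, closedInterval, makeInterval, pickInInterval];

(* Option 1: patterns and argument checks act as lightweight types *)
pickInRange[nums : {___?NumericQ}, {a_?NumericQ, b_?NumericQ}] /; a <= b :=
  Select[nums, a <= # <= b &];

(* Option 2: a dedicated inert head, created only through a checked constructor *)
makeInterval[a_?NumericQ, b_?NumericQ] /; a <= b := closedInterval[a, b];

(* Overload a built-in on the new type via UpValues; MemberQ itself is untouched *)
closedInterval /: MemberQ[closedInterval[a_, b_], x_?NumericQ] := a <= x <= b;

pickInInterval[nums : {___?NumericQ}, iv_closedInterval] :=
  Select[nums, MemberQ[iv, #] &];

pickInRange[{1, 5, 2, 8, 3}, {2, 5}]                  (* {5, 2, 3} *)
pickInInterval[{1, 5, 2, 8, 3}, makeInterval[2, 5]]   (* {5, 2, 3} *)
pickInInterval[{1, 5, 2, 8, 3}, {2, 5}]               (* stays unevaluated: wrong type *)
```

With the first option any two-element numeric list passes as an interval, while with the second only values that came through makeInterval are accepted, which is what makes the type stronger.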


The text in the post says: "Below is the code of a rudimentary viewer for a DOM structure of an HTML page". So view shows the "skeleton" of the web page DOM structure, which is occasionally handy. It was never claimed in the post that view would allow one to actually view a web page in a way similar to the web browser experience. This would've been a vastly harder problem to solve in WL, and also rather useless - if one needs to view a web page, one should just use the browser.