From my (admittedly limited) exposure to functional programming languages, such as Clojure, it seems that encapsulation of data has a less important role. Usually various native types such as maps or sets are the preferred currency of representing data, over objects. Furthermore, that data is generally immutable.

Fogus: Following that idea—some people are surprised by the fact that Clojure does not engage in data-hiding encapsulation on its types. Why did you decide to forgo data-hiding?

Hickey: Let’s be clear that Clojure strongly emphasizes programming to abstractions. At some point though, someone is going to need to have access to the data. And if you have a notion of “private”, you need corresponding notions of privilege and trust. And that adds a whole ton of complexity and little value, creates rigidity in a system, and often forces things to live in places they shouldn’t. This is in addition to the other losing that occurs when simple information is put into classes. To the extent the data is immutable, there is little harm that can come of providing access, other than that someone could come to depend upon something that might change. Well, okay, people do that all the time in real life, and when things change, they adapt. And if they are rational, they know when they make a decision based upon something that can change that they might in the future need to adapt. So, it’s a risk management decision, one I think programmers should be free to make. If people don’t have the sensibilities to desire to program to abstractions and to be wary of marrying implementation details, then they are never going to be good programmers.

Coming from the OO world, this seems to complicate some of the enshrined principles I've learned over the years. These include Information Hiding, the Law of Demeter and the Uniform Access Principle, to name a few. The common thread is that encapsulation allows us to define an API that tells others what they should and shouldn't touch. In essence, it creates a contract that allows the maintainer of some code to freely make changes and refactorings without worrying about introducing bugs into the consumer's code (Open/Closed principle). It also provides a clean, curated interface that tells other programmers which tools they can use to get at or build upon that data.

When the data is allowed to be directly accessed, that API contract is broken and all those encapsulation benefits seem to go away. Also, strictly immutable data seems to make passing around domain-specific structures (objects, structs, records) much less useful in the sense of representing a state and the set of actions that can be performed on that state.

How do functional codebases address these issues, which seem to come up when a codebase grows so large that APIs need to be defined and lots of developers are working on specific parts of the system? Are there examples available that demonstrate how this is handled in these types of codebases?

You can define a formal interface without the notion of objects. Just create the functions of the interface and document them. Don't provide documentation for implementation details. You have just created an interface.
– Scara95, Dec 14 '15 at 6:34

@Scara95 Doesn't that mean I'm having to do work to both implement the code for an interface and write enough documentation about it to warn the consumer what to do and what not to do? What if the code changes and the documentation becomes stale? I generally prefer self-documenting code for this reason.
– jameslk, Dec 14 '15 at 6:39

You have to document the interface anyway.
– Scara95, Dec 14 '15 at 10:34

"Also, strictly immutable data seems to make passing around domain-specific structures (objects, structs, records) much less useful in the sense of representing a state and the set of actions that can be performed on that state." Not really. The only thing that changes is that the changes end up on a new object. This is a huge win when it comes to reasoning about the code; passing mutable objects around means having to keep track of who might mutate them, a problem which scales up with the size of the code.
– Doval, Dec 14 '15 at 14:12

4 Answers

First of all, I'm going to second Sebastian's comments on what is properly functional and what is a matter of dynamic typing. More generally, Clojure is one flavor of functional language and community, and you shouldn't generalize too much based on it. I'll make some remarks from more of an ML/Haskell perspective.

As Basile mentions, the concept of access control does exist in ML/Haskell, and is often used. The "factoring" is a bit different from conventional OOP languages; in OOP the concept of a class plays simultaneously the role of type and module, whereas functional (and traditional procedural) languages treat these orthogonally.

Another point is that ML/Haskell are very heavy on generics with type erasure, and that this can be used to provide a different flavor of "information hiding" than OOP encapsulation. When a component only knows the type of a data item as a type parameter, that component can be safely handed values of that type, and yet it will be prevented from doing much with them because it doesn't know and cannot know their concrete type (there's no universal instanceof or runtime casting in these languages). This blog entry is one of my favorite introductory examples to these techniques.

Next: in the FP world it's very common to use transparent data structures as interfaces to opaque/encapsulated components. For example, interpreter patterns are very common in FP, where data structures are used as syntax trees that describe logic, and fed to code that "executes" them. State, properly speaking, then exists only ephemerally, while the interpreter that consumes the data structures is running. Also, the interpreter's implementation can change as long as it still communicates with the clients in terms of the same data types.
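To make the interpreter idea concrete, here is a minimal sketch in Python. The expression language (`Num`, `Add`, `Mul`) and the `evaluate` function are hypothetical names invented for illustration, not taken from any particular codebase. The client only ever builds plain, transparent data; how it is executed stays inside the interpreter.

```python
from dataclasses import dataclass
from typing import Union


# A tiny, hypothetical expression language. Every node is plain,
# immutable data that clients can freely construct and inspect.
@dataclass(frozen=True)
class Num:
    value: float


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]


def evaluate(expr: Expr) -> float:
    """Interpret a syntax tree. Clients depend only on the data types,
    so this implementation can change without breaking them."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Add):
        return evaluate(expr.left) + evaluate(expr.right)
    if isinstance(expr, Mul):
        return evaluate(expr.left) * evaluate(expr.right)
    raise TypeError(f"not an expression: {expr!r}")


# The client describes the computation as data...
program = Add(Num(2), Mul(Num(3), Num(4)))
# ...and hands it to the interpreter to run.
result = evaluate(program)
```

Swapping `evaluate` for, say, a version that caches or compiles the tree would not affect any code that merely builds `Expr` values.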

Last and longest: encapsulation/information hiding is a technique, not an end. Let's think a bit about what it provides. Encapsulation is a technique for reconciling the contract and the implementation of a software unit. The typical situation is this: the system's implementation admits of values or states that, according to its contract, should not exist.

Once you look at it this way, we can point out that FP provides, in addition to encapsulation, a number of other tools that can be used to the same end:

Immutability as the pervasive default. You can hand transparent data values to third party code. They cannot modify them and put them into invalid states. (Karl's answer makes this point.)

Sophisticated type systems with algebraic data types that allow you to finely control the structure of your types, without writing lots of code. By judiciously using these facilities you can often design types where "bad states" are just impossible. (Slogan: "Make illegal states unrepresentable.") Instead of using encapsulation to indirectly control the set of admissible states of a class, I'd rather just tell the compiler what those are and have it guarantee them for me!

Interpreter pattern, as mentioned already. One key to designing a good abstract syntax tree type is to try to design the data type so that all values are "valid."
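As a rough illustration of the first two points, here is a sketch in Python, which is only an approximation of what ML/Haskell offer: Python does not check type annotations at runtime, so the "unrepresentable" part relies on a static checker such as mypy, and `frozen=True` is a guard rail rather than a hard guarantee. The `Connection` types below are hypothetical. The idea is that a session ID exists only on the `Connected` case, so "disconnected but with a session" cannot be written down.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Union


@dataclass(frozen=True)
class Disconnected:
    pass


@dataclass(frozen=True)
class Connected:
    session_id: str  # only present when a connection actually exists


# A connection is exactly one of the two cases above; there is no
# combined record with an optional session_id and a boolean flag.
Connection = Union[Disconnected, Connected]


def describe(conn: Connection) -> str:
    if isinstance(conn, Connected):
        return f"connected (session {conn.session_id})"
    return "disconnected"


conn = Connected(session_id="abc123")

# Code that receives the value can read it but not modify it in place.
try:
    conn.session_id = "hijacked"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Here `mutated` ends up `False`, and `conn` is unchanged, so it can be handed to third-party code without defensive copying.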

This F# "Designing with types" series makes for pretty decent reading on some of these topics, particularly #2. (It's where the "make illegal states unrepresentable" link from above comes from.) If you look closely, you'll note that in the second part they demonstrate how to use encapsulation to hide constructors and prevent clients from constructing invalid instances. As I said above, it is part of the toolkit!
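The constructor-hiding technique can be approximated in Python by validating at construction time. This is a sketch under assumptions: `EmailAddress` and its deliberately naive `"@"` check are made up for illustration, and Python cannot truly hide a constructor the way an F# or ML module signature can.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailAddress:
    value: str

    def __post_init__(self) -> None:
        # Reject bad input when the value is built, so that any
        # EmailAddress that exists has passed this check.
        # (The check itself is a placeholder, not real validation.)
        if "@" not in self.value:
            raise ValueError(f"not an email address: {self.value!r}")


ok = EmailAddress("alice@example.com")

try:
    EmailAddress("not-an-address")
    rejected = False
except ValueError:
    rejected = True
```

Functions that accept an `EmailAddress` then no longer need to re-validate the string themselves.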

I really cannot overstate the degree to which mutability causes problems in software. Many of the practices that are drummed into our heads are in compensation for problems that mutability causes. When you take mutability away, you don't need those practices as much.

When you have immutability, you know your data structure won't change out from under you unexpectedly during runtime, so you can make your own derivative data structures for your own use as you add features to your program. The original data structure doesn't need to know anything about these derivative data structures.
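A small Python sketch of what this looks like in practice. The `Order` record and the helper below are hypothetical, invented for illustration. New structures are computed from the original, which neither changes nor needs to know about them.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class Order:
    order_id: int
    customer: str
    total: float


# The base data: an immutable tuple of immutable records.
orders: Tuple[Order, ...] = (
    Order(1, "alice", 30.0),
    Order(2, "bob", 12.5),
    Order(3, "alice", 7.5),
)


def totals_by_customer(items: Tuple[Order, ...]) -> Dict[str, float]:
    """A derived structure built for one feature's needs."""
    totals: Dict[str, float] = {}
    for order in items:
        totals[order.customer] = totals.get(order.customer, 0.0) + order.total
    return totals


summary = totals_by_customer(orders)

# An "update" produces a new record; the original is left untouched.
discounted = replace(orders[0], total=27.0)
```

After this runs, `orders[0].total` is still `30.0`; only `discounted` carries the new value.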

This means your base data structures tend to be extremely stable. New data structures sort of get derived from them around the edges as needed. It's really hard to explain until you've done a significant functional program. You just find yourself caring about privacy less and less, and thinking about creating durable generic public data structures more and more.

One thing I'd like to add is that immutable variables cause programmers to stick to distributed and scattered data structures, if there is a structure at all. All data is structured to create a logical group, for easy discovery and traversal, not for transportation. This is a logical progression you will make once you have done enough functional programming.
– Xephon, Dec 14 '15 at 19:42

Clojure's tendency to just use hashes and primitives is not, in my opinion, part of its functional heritage, but part of its dynamic heritage. I've seen similar tendencies in Python and Ruby (both object-oriented, imperative and dynamic, even though both have pretty good support for higher-order functions), but not in, say, Haskell (which is statically typed, but purely functional, with special constructs needed to escape immutability).

So the question you need to ask is not, how do functional languages handle big APIs, but how do dynamic languages do it. The answer is: good documentation and lots and lots of unit tests. Luckily, modern dynamic languages usually come with very good support for both; for example, both Python and Clojure have a way of embedding documentation in the code itself, not just comments.
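For instance, in Python a docstring is attached to the function object itself, and the standard `doctest` module can run the examples embedded in it, so the documentation doubles as a test. The function below is a made-up example:

```python
def normalize_username(name: str) -> str:
    """Return the canonical form of a username.

    Surrounding whitespace is removed and the result is lower-cased.

    >>> normalize_username("  Alice ")
    'alice'
    >>> normalize_username("BOB")
    'bob'
    """
    return name.strip().lower()


# The documentation travels with the code and is available at runtime,
# e.g. through help(normalize_username) or the __doc__ attribute.
doc = normalize_username.__doc__
```

Saved to a file, the examples can be checked with `python -m doctest <file>`, which goes some way toward keeping the documented interface from going stale.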

As for statically typed, (purely) functional languages: there is no (simple) way to carry a function with a datatype as in OO programming. So documentation matters anyway. The point is you don't need language support to define an interface.
– Scara95, Dec 14 '15 at 11:03

@Scara95 Can you elaborate what you mean by "carry a function with a datatype"?
– Sebastian Redl, Dec 14 '15 at 12:37

For example, OCaml has modules defined by a collection of named abstract types and values (notably functions operating on these abstract types). So in a certain sense, OCaml's modules reify APIs. OCaml also has functors, which transform modules into other modules, thus providing generic programming. So modules are compositional.