1. Two problems with Java

Problem 1. Java lacks containers and maps with heterogeneous values.

As expected, Java objects support fields with heterogeneous types. For example a Person object can have a “name” field of type String and an “age” field of type Integer. Logically the fields form a Map from field names (“name” and “age”) to field values of different types (a String and an Integer). Such a Map would contain heterogeneous value types. Despite this being a perfectly reasonable Map, Java’s type system does not support it.

In this contrived example you could (and should) just use a Person object. In other circumstances, however, heterogeneous maps are more convenient. Falling back to objects also doesn’t work when implementing reusable heterogeneous containers, because the set of keys is open. (Examples include Favorites in Effective Java item #29, and this article’s SymbolMap and SymbolBus.)

A static type system can never represent all correct programs, so we should expect to bump up against this from time to time. However the lack of support for maps with heterogeneous value types is an inconvenient omission. They are a natural computational construct, and many namespaces contain heterogeneous values, including elemental namespaces such as parameters, scopes, object fields, etc.

Problem 2. Java lacks named and optional parameters.

Named parameters (also called keyword arguments or keyword parameters) are a feature of Python, Ruby, and C#, among other languages. When calling a function or method, named parameters let you specify a parameter’s value by name instead of by position. This means you don’t have to remember the positional order of parameters, which frequently makes code easier to write, read, and refactor.

Optional parameters (also called default arguments or parameters with default values) are another language feature. When defining a function you have the ability to provide a default value for a parameter. If a parameter has a default value, then that parameter becomes optional. If a caller doesn’t provide a value for that parameter, the default value will be used.

Named and optional parameters have synergy: if a method’s 3rd and 4th parameters have default values, then named parameters enable you to provide a value for the 4th parameter without providing one for the 3rd.

2. Get and cast: Heterogeneous maps without type safety

The closest Java’s type system can get to a map with heterogeneous value types is Map<String, Object>. Unfortunately Map<String, Object>.get() returns values of type Object, so we have to downcast them to their underlying types. This approach is demonstrated in TestGetAndCast. Instead of manually casting we use CastLib.get(), which does it for us with the convenience of generic method type inference.
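
Here’s a minimal sketch of the idea. CastLib.get() is the name used in this article, but the method body and the demo class below are assumptions for illustration, not necessarily the real implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class CastLib {
    private CastLib() {}

    // Unchecked downcast: the compiler infers T from the call site and
    // cannot verify that the stored value really has that type.
    @SuppressWarnings("unchecked")
    static <T> T get(Map<String, Object> map, String key) {
        return (T) map.get(key);
    }
}

// Hypothetical demo class, not part of the article's code.
class GetAndCastDemo {
    public static void main(String[] args) {
        Map<String, Object> person = new HashMap<>();
        person.put("name", "Alice");
        person.put("age", 30);

        String name = CastLib.get(person, "name"); // T inferred as String
        Integer age = CastLib.get(person, "age");  // T inferred as Integer
        System.out.println(name + " is " + age);

        try {
            // Compiles, but fails at runtime because the value is an Integer.
            String wrong = CastLib.get(person, "age");
            System.out.println(wrong);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException at the call site");
        }
    }
}
```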

This is nice and simple, but as shown in the test, the downcasts are not statically type safe. This isn’t the end of the world: dynamic typing works fine for Python, Ruby, Clojure, etc. However static types are a primary reason why I use Java, so it would be preferable to preserve this benefit. Fortunately we can outsmart the type system in order to implement type safe heterogeneous maps…

TypeMap provides a type safe mapping from type objects to heterogeneous values. The signature of put() ensures that values associated with a type object are instances of that type. This knowledge is reflected in the signature of get(): when we look up by type object, we know that the returned value will be an instance of that type.

TypeMapClass implements TypeMap, which is demonstrated in TestTypeMap. Behind the scenes we wrap a Map<Class<?>, Object> and do a downcast in the get() method, but the signature of put() guarantees that this cast is safe. I’ve modified the interface above by having put() return the map, so maps can be built fluently.
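
As a rough sketch, the interface and implementation could look like the following. TypeMap and TypeMapClass are the names used in this article. The method bodies and the demo class are my assumptions, and follow the typesafe heterogeneous container pattern from Effective Java.

```java
import java.util.HashMap;
import java.util.Map;

interface TypeMap {
    // The value must be an instance of the type used as its key.
    <T> TypeMap put(Class<T> type, T value);

    // Looking up by Class<T> yields a T, or null if the key is absent.
    <T> T get(Class<T> type);
}

final class TypeMapClass implements TypeMap {
    private final Map<Class<?>, Object> values = new HashMap<>();

    @Override
    public <T> TypeMap put(Class<T> type, T value) {
        values.put(type, value);
        return this; // returned so that maps can be built fluently
    }

    @Override
    public <T> T get(Class<T> type) {
        // Class.cast() is a checked downcast; put() only stores a value
        // under a key it is an instance of, so the cast succeeds.
        return type.cast(values.get(type));
    }
}

// Hypothetical demo class, not part of the article's code.
class TypeMapDemo {
    public static void main(String[] args) {
        TypeMap map = new TypeMapClass()
            .put(String.class, "Alice")
            .put(Integer.class, 30);

        String name = map.get(String.class);  // no cast needed
        Integer age = map.get(Integer.class);
        System.out.println(name + " is " + age);
        // map.put(Integer.class, "thirty");  // does not compile
    }
}
```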

TypeMap achieves our goal of heterogeneous values with static type safety, however it has two problems:

TypeMap cannot contain multiple values of the same type. This is an unusual and undesirable restriction for a map.

TypeMap doesn’t quite work with parameterized value types such as List<Integer>. Although List.class is available at runtime, List<Integer>.class is not. This is because Java implements generics with type erasure: type arguments are discarded during compilation, so every List shares the same type object at runtime.
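
A small sketch of the second problem. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the article's code.
class ErasureDemo {
    // True when both lists have the same runtime class.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        List<String> strings = new ArrayList<>();

        // Both lists share one runtime class; the type argument is gone.
        System.out.println(sameRuntimeClass(ints, strings)); // prints "true"

        // Class<List<Integer>> key = List<Integer>.class; // does not compile
        Class<?> key = List.class; // only the raw type has a class literal
        System.out.println(key.getSimpleName()); // prints "List"
    }
}
```

So a TypeMap keyed by List.class cannot tell a List<Integer> from a List<String>.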

Fortunately we can solve these problems by using different keys…

4. Symbol: A key with a value type parameter

TypeMap’s technique for static type safety is based on having keys that are parameterized by value type. This parameter enables us to enforce value type during put() and it tells us the expected value type during get(). Type objects are convenient keys because they are built-in, however they also cause TypeMap’s two problems.

We can solve this by creating our own parameterized keys. A Symbol is a key with a value type parameter. In the next section we solve TypeMap’s two problems by using symbols as keys instead of type objects.

Symbols have name fields to make key errors meaningful. However symbols behave as entity objects, so two symbols with the same name are different keys. This behavior means that while symbol name conflicts would be confusing, they do not create bugs.

I use $ as a prefix for symbol variables, which I’ve found to be readable.
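
A symbol can be very small. The sketch below is an assumption about its shape rather than the real class: the field name, constructor, and toString() format are illustrative.

```java
// A key parameterized by the type of the value it maps to.
final class Symbol<T> {
    final String name; // only used to make error messages meaningful

    Symbol(String name) {
        this.name = name;
    }

    // equals() and hashCode() are deliberately not overridden, so symbols
    // compare by identity: two symbols with the same name are distinct keys.

    @Override
    public String toString() {
        return "Symbol(" + name + ")";
    }
}

// Hypothetical demo class, not part of the article's code.
class SymbolDemo {
    static final Symbol<String> $name = new Symbol<>("name");
    static final Symbol<String> $alsoName = new Symbol<>("name");

    public static void main(String[] args) {
        System.out.println($name.equals($alsoName)); // prints "false"
        System.out.println($name);                   // prints "Symbol(name)"
    }
}
```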

As shown in TestSymbolMap, SymbolMap.Fluid is used as a fluent builder for SymbolMap.Solid. After adding a static import from SymbolLib here’s the syntax you end up with:
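
To give a feel for it, here’s a self-contained sketch. Only the names SymbolMap, Fluid, and Solid come from this article. The method names put(), solid(), and get(), the compact Symbol class, and the demo are assumptions, and the SymbolLib static import is left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Compact stand-in for Symbol: a key with a value type parameter.
final class Symbol<T> {
    final String name;
    Symbol(String name) { this.name = name; }
}

interface SymbolMap {
    <T> T get(Symbol<T> symbol);

    // Immutable snapshot; values only get in through Fluid.put().
    final class Solid implements SymbolMap {
        private final Map<Symbol<?>, Object> values;

        Solid(Map<Symbol<?>, Object> values) {
            this.values = Collections.unmodifiableMap(new HashMap<>(values));
        }

        @Override
        @SuppressWarnings("unchecked")
        public <T> T get(Symbol<T> symbol) {
            if (!values.containsKey(symbol)) {
                throw new IllegalArgumentException("Missing symbol: " + symbol.name);
            }
            return (T) values.get(symbol); // safe: put() enforced the type
        }
    }

    // Mutable fluent builder for Solid.
    final class Fluid {
        private final Map<Symbol<?>, Object> values = new HashMap<>();

        <T> Fluid put(Symbol<T> symbol, T value) {
            values.put(symbol, value);
            return this;
        }

        Solid solid() {
            return new Solid(values);
        }
    }
}

// Hypothetical demo class, not part of the article's code.
class SymbolMapDemo {
    static final Symbol<String> $name = new Symbol<>("name");
    static final Symbol<Integer> $age = new Symbol<>("age");
    static final Symbol<Integer> $height = new Symbol<>("height");
    static final Symbol<List<Integer>> $scores = new Symbol<>("scores");

    public static void main(String[] args) {
        SymbolMap person = new SymbolMap.Fluid()
            .put($name, "Alice")
            .put($age, 30)       // two Integer values in the same map
            .put($height, 170)
            .put($scores, Arrays.asList(1, 2, 3)) // a parameterized value type
            .solid();

        String name = person.get($name);
        int age = person.get($age);
        List<Integer> scores = person.get($scores);
        System.out.println(name + ", " + age + ", " + scores);
    }
}
```

Both of TypeMap’s problems go away here: $age and $height are distinct keys with the same value type, and $scores carries List<Integer> in its own type parameter.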

(A quality alternative to SymbolBus is Guava’s EventBus. Like TypeMap, EventBus is a mapping from type to value, more specifically it’s a mapping from event type to subscriber set. So I believe it suffers from one of TypeMap’s problems: we cannot have multiple event streams containing the same event types.)

7. Using SymbolMap for named and optional parameters

As discussed, one of Java’s problems is the lack of support for named and optional parameters. However there are ways to mimic this behavior:

Use method overloading to provide default values. This is less flexible and more verbose than Python’s optional values. (It also doesn’t help with named parameters.)

Create a fluent interface with named methods for building up the state of an object. Default values can be defined for optional fields that are not set.

Accept a parameter map as an input. Default values can be defined for optional fields that are missing. This is a common practice in JavaScript, and was the standard practice in Ruby before named parameters were added in 2.0. Parameter maps contain heterogeneous values, so this technique is less appealing in Java until you have SymbolMap (or a similar tool).

Here’s a StackOverflow post discussing other creative tricks of varying practicality (which are not addressed in this post).

ArticleWithSymbols is a class whose constructor accepts named parameters in the form of a SymbolMap.

These classes all support a form of default values, and #2-4 mimic named parameters. ArticleWithPositional provides default values using overloading. ArticleWithMutation and ArticleWithBuilder provide default values for optional fields whose setters are not called. ArticleWithSymbols provides default values when pulling values out of the SymbolMap using these convenience methods:

All four of these classes are clumsy compared to the elegance of real named and optional parameters. However these kinds of techniques seem to be the best we can do in Java. Deciding which to use, or which combination to use, is a case-by-case judgement call. Here are some of their tradeoffs:

ArticleWithPositional is the easiest to implement, but it doesn’t provide named parameters. Default values are provided by defining a constructor with fewer inputs. This is awkward because you have to provide a new method for every combination of default values you want to support, which may not even be possible if two optional parameters have the same type. On the positive side, overloading enables the compiler to statically verify that a client is providing a valid combination of parameters. An alternative or complement to defaults via overloading is defaults via null, where the constructor converts null values into default values.
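
A sketch of defaults via overloading combined with defaults via null. ArticleWithPositional is the class name used in this article, but the fields (title, author, wordCount) and the default values are invented for illustration.

```java
import java.util.Objects;

final class ArticleWithPositional {
    final String title;  // required
    final String author; // optional, defaults to "anonymous"
    final int wordCount; // optional, defaults to 0

    // Each overload fills in defaults for the parameters it omits.
    ArticleWithPositional(String title) {
        this(title, null);
    }

    ArticleWithPositional(String title, String author) {
        this(title, author, 0);
    }

    ArticleWithPositional(String title, String author, int wordCount) {
        this.title = Objects.requireNonNull(title, "title");
        // Defaults via null: a null author is converted into the default.
        this.author = (author == null) ? "anonymous" : author;
        this.wordCount = wordCount;
    }
}
```

Note that there is no way to add a (title, wordCount) overload next to a hypothetical (title, pageCount) one, since both would have the signature (String, int).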

ArticleWithMutation mimics named parameters using named setters, so it offers a better client experience. However it requires more implementation effort because we have to create and maintain a setter method for every parameter. We also cannot make the article immutable (mark the fields as final), even if we only intend for the article to be mutated during construction. It’s also possible for the article to be in an invalid state where required fields haven’t been set.

ArticleWithSymbols also allows you to define parameters by name, however it requires a bit less implementation effort than fluent setters: we only need a symbol per field instead of a setter method per field. We also get the benefits of ArticleWithBuilder without requiring a separate builder class: fields can be final and there’s no risk of invalid instances. Another important benefit is that parameter maps give us separation between configuration and creation. This kind of coupling reduction is generally desirable, and such separation can have practical benefits. As an example I’ve included a generic copy-with-mutations method which is demonstrated in TestArticleCreation. (Copy-with-mutations can be particularly useful when working with immutable objects.)
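
Here’s a sketch of the parameter map approach under some loud assumptions: Params is a minimal stand-in for SymbolMap, getRequired() and getOrDefault() are hypothetical convenience methods, the article fields are invented, and with() is a simple per-class take on copy-with-mutations rather than the generic method mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

final class Symbol<T> {
    final String name;
    Symbol(String name) { this.name = name; }
}

// Minimal stand-in for a SymbolMap (hypothetical API).
final class Params {
    private final Map<Symbol<?>, Object> values = new HashMap<>();

    <T> Params put(Symbol<T> symbol, T value) {
        values.put(symbol, value);
        return this;
    }

    @SuppressWarnings("unchecked")
    <T> T getOrDefault(Symbol<T> symbol, T defaultValue) {
        return values.containsKey(symbol) ? (T) values.get(symbol) : defaultValue;
    }

    @SuppressWarnings("unchecked")
    <T> T getRequired(Symbol<T> symbol) {
        if (!values.containsKey(symbol)) {
            throw new IllegalArgumentException("Missing parameter: " + symbol.name);
        }
        return (T) values.get(symbol);
    }

    Params copy() {
        Params result = new Params();
        result.values.putAll(values);
        return result;
    }
}

final class ArticleWithSymbols {
    static final Symbol<String> $title = new Symbol<>("title");
    static final Symbol<String> $author = new Symbol<>("author");
    static final Symbol<Integer> $wordCount = new Symbol<>("wordCount");

    // Fields can be final: everything is known when the constructor runs.
    final String title;
    final String author;
    final int wordCount;
    private final Params params; // kept so that with() can start from it

    ArticleWithSymbols(Params params) {
        this.params = params.copy(); // defensive copy
        this.title = params.getRequired($title);
        this.author = params.getOrDefault($author, "anonymous");
        this.wordCount = params.getOrDefault($wordCount, 0);
    }

    // Copy-with-mutations: same parameters, with one value replaced.
    <T> ArticleWithSymbols with(Symbol<T> symbol, T value) {
        return new ArticleWithSymbols(params.copy().put(symbol, value));
    }
}
```

A client would write something like new ArticleWithSymbols(new Params().put($title, "Symbols").put($wordCount, 1200)), naming each parameter and omitting the ones it doesn’t care about.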

PropLib provides a .properties backed implementation of ConfigFormat, as demonstrated in TestPropLib. Using this ConfigFormat we can easily convert between the file content string and a Map<String, String>. That’s the easy part. The hard part emerges when we realize that we need heterogeneous types for values, not just strings. For example the value for "threadCount" needs to be the integer 4, not the string "4". The value for "thresholds" needs to be ImmutableList.of(10, 20, 30), not the string "10, 20, 30". And so forth.
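
To make the easy part and the hard part concrete, here’s a sketch that uses java.util.Properties directly. PropsSketch and its methods are hypothetical helpers, not the PropLib API, and a plain List stands in for Guava’s ImmutableList.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class PropsSketch {
    private PropsSketch() {}

    // The easy part: file content to Map<String, String>.
    static Map<String, String> parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            result.put(key, props.getProperty(key));
        }
        return result;
    }

    // The hard part: every non-string value needs its own parsing code.
    static List<Integer> parseIntList(String raw) {
        List<Integer> result = new ArrayList<>();
        for (String part : raw.split(",")) {
            result.add(Integer.parseInt(part.trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> config =
            parse("threadCount=4\nthresholds=10, 20, 30\n");

        int threadCount = Integer.parseInt(config.get("threadCount"));
        List<Integer> thresholds = parseIntList(config.get("thresholds"));
        System.out.println(threadCount + " " + thresholds);
    }
}
```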

You could manually parse config values wherever they are used, but this is painful. What we really want is to work with a SymbolMap instead of a Map<String, String>. That’s where the SymbolTranslator interface comes in, converting between what we have and what we want:

As demonstrated in TestFormatLib, FormatLib defines a bunch of common formats (such as intFormat, boolFlagFormat) and shows how easy it is to create your own. Format, FormatParser, and FormatWriter are “type aliases” that bind Syntax to String. Unfortunately Java doesn’t provide real type aliases, so I use subtypes as aliases. (This is theoretically an antipattern because subtyping creates new nominal types, which means that while Format<Model> can be used as a Translator<String, Model>, the reverse is unfortunately not true. However in practice the proliferation of parameters can make code unreadable, so I think this “antipattern” can be worth the tradeoff. I have no idea why Java doesn’t have real type aliases yet.)
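
The subtype-as-alias trick and its limitation can be sketched as follows. Translator, Format, and intFormat are names from this article. The method names parse() and write(), the FormatSketch holder class, and the asFormat() adapter are my assumptions.

```java
// Converts between a syntax representation and a model value.
interface Translator<Syntax, Model> {
    Model parse(Syntax syntax);
    Syntax write(Model model);
}

// "Type alias" via subtyping: binds Syntax to String.
interface Format<Model> extends Translator<String, Model> {}

final class FormatSketch {
    private FormatSketch() {}

    static final Format<Integer> intFormat = new Format<Integer>() {
        @Override
        public Integer parse(String syntax) {
            return Integer.parseInt(syntax.trim());
        }

        @Override
        public String write(Integer model) {
            return model.toString();
        }
    };

    // A Translator<String, M> is not a Format<M>, so going in that
    // direction needs an explicit adapter like this one.
    static <M> Format<M> asFormat(Translator<String, M> translator) {
        return new Format<M>() {
            @Override
            public M parse(String syntax) {
                return translator.parse(syntax);
            }

            @Override
            public String write(M model) {
                return translator.write(model);
            }
        };
    }

    public static void main(String[] args) {
        Translator<String, Integer> t = intFormat; // fine: a Format is a Translator
        // Format<Integer> f = t;                  // does not compile
        Format<Integer> f = asFormat(t);           // the adapter bridges the gap
        System.out.println(f.parse(" 42 ") + 1);   // prints "43"
    }
}
```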

Now that we know about these base interfaces, we can take a look at the interfaces used in TestSymbolFormat:

With the preliminaries out of the way, we can review TestSymbolFormat more carefully. The test demonstrates the steps for using SymbolFormat:

Define the symbols.

Use a builder to create a SymbolTranslator (associating each symbol with a value format).

Get a ConfigFormat. In the test we import one backed by java.util.Properties.

Create a SymbolFormat by chaining together the ConfigFormat and SymbolTranslator.

Use the SymbolFormat to convert between content strings and SymbolMaps.
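
The steps above can be sketched end to end. Everything here is a simplified stand-in with hypothetical names and APIs (SymbolMapSketch, SymbolTranslatorSketch, bind(), load()), and it only covers the parsing direction, from content string to symbol map.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.function.BiConsumer;
import java.util.function.Function;

final class Symbol<T> {
    final String name;
    Symbol(String name) { this.name = name; }
}

final class SymbolMapSketch {
    private final Map<Symbol<?>, Object> values = new HashMap<>();

    <T> void put(Symbol<T> symbol, T value) { values.put(symbol, value); }

    @SuppressWarnings("unchecked")
    <T> T get(Symbol<T> symbol) { return (T) values.get(symbol); }
}

// Step 2: associates each symbol with a parser for its value.
final class SymbolTranslatorSketch {
    private final List<BiConsumer<Map<String, String>, SymbolMapSketch>> bindings =
        new ArrayList<>();

    <T> SymbolTranslatorSketch bind(Symbol<T> symbol, Function<String, T> parser) {
        bindings.add((raw, out) -> {
            String text = raw.get(symbol.name);
            if (text != null) {
                out.put(symbol, parser.apply(text));
            }
        });
        return this;
    }

    SymbolMapSketch parse(Map<String, String> raw) {
        SymbolMapSketch out = new SymbolMapSketch();
        bindings.forEach(binding -> binding.accept(raw, out));
        return out;
    }
}

final class SymbolFormatDemo {
    // Step 1: define the symbols.
    static final Symbol<Integer> $threadCount = new Symbol<>("threadCount");
    static final Symbol<Boolean> $verbose = new Symbol<>("verbose");

    // Step 3: a config format backed by java.util.Properties.
    static Map<String, String> parseProperties(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> result = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            result.put(key, props.getProperty(key));
        }
        return result;
    }

    // Steps 4 and 5: chain the two and convert content into a symbol map.
    static SymbolMapSketch load(String content) {
        SymbolTranslatorSketch translator = new SymbolTranslatorSketch()
            .bind($threadCount, Integer::parseInt)
            .bind($verbose, Boolean::parseBoolean);
        return translator.parse(parseProperties(content));
    }

    public static void main(String[] args) {
        SymbolMapSketch config = load("threadCount=4\nverbose=true\n");
        int threadCount = config.get($threadCount); // already an Integer
        boolean verbose = config.get($verbose);
        System.out.println(threadCount + " " + verbose);
    }
}
```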

Once we’ve got a SymbolMap we can directly use it in our application, or we can use it as a parameter map for creating an object.

SymbolFormat is extensible along two axes. The first is the ability to provide whatever value formats your application needs. The second is the ability to provide whatever ConfigFormat your application needs. We’ve used a .properties file as an example, but as mentioned, Map<String, String> is ubiquitous.

Although SymbolFormat could be used for object serialization, I don’t recommend it. Once you’re working with objects instead of SymbolMaps, Java provides more convenient serialization mechanisms:

The Serializable interface lets you write and parse binary. Once you’ve got a binary representation, you can choose to trade CPU for disk space and IO by zipping it.
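
For example, a round trip through Java serialization and gzip might look like this. SerializationSketch and its method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class SerializationSketch {
    private SerializationSketch() {}

    // Serializes the value and gzips the resulting bytes.
    static byte[] toGzippedBytes(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(value);
        }
        // Closing the stream above finishes the gzip trailer.
        return bytes.toByteArray();
    }

    // Reverses toGzippedBytes(); the caller casts the result.
    static Object fromGzippedBytes(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        }
    }
}
```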

9. SymbolSchema: Validating the symbols in SymbolMaps

A SymbolSchema can validate the symbols in a SymbolMap (its keys). This is occasionally useful as an assertion, especially when time elapses between the creation of a SymbolMap and its usage.
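
As a rough sketch of what such validation could look like: the class below is a hypothetical stand-in, not SymbolSchemaClass. It assumes a schema made of required and optional symbols, and that validation means flagging required symbols that are absent and symbols the schema doesn’t know about.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class Symbol<T> {
    final String name;
    Symbol(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

final class SymbolSchemaSketch {
    private final Set<Symbol<?>> required;
    private final Set<Symbol<?>> optional;

    SymbolSchemaSketch(Set<Symbol<?>> required, Set<Symbol<?>> optional) {
        this.required = new LinkedHashSet<>(required);
        this.optional = new LinkedHashSet<>(optional);
    }

    // Small helper for building symbol sets.
    static Set<Symbol<?>> setOf(Symbol<?>... symbols) {
        return new LinkedHashSet<>(Arrays.asList(symbols));
    }

    // Required symbols that are absent from the given keys.
    Set<Symbol<?>> missing(Set<Symbol<?>> keys) {
        Set<Symbol<?>> result = new LinkedHashSet<>(required);
        result.removeAll(keys);
        return result;
    }

    // Keys that are neither required nor optional.
    Set<Symbol<?>> unexpected(Set<Symbol<?>> keys) {
        Set<Symbol<?>> result = new LinkedHashSet<>(keys);
        result.removeAll(required);
        result.removeAll(optional);
        return result;
    }

    // Usable as an assertion: throws if the keys don't fit the schema.
    void validate(Set<Symbol<?>> keys) {
        Set<Symbol<?>> missing = missing(keys);
        Set<Symbol<?>> unexpected = unexpected(keys);
        if (!missing.isEmpty() || !unexpected.isEmpty()) {
            throw new IllegalArgumentException(
                "Invalid symbols: missing=" + missing + ", unexpected=" + unexpected);
        }
    }
}
```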

SymbolSchemaClass implements SymbolSchema which is demonstrated in TestSymbolSchema. A SymbolSchema contains a set of required symbols and a set of optional symbols. It can validate the keys of a SymbolMap, or any other set of symbols. This validation detects: