Validated Configurations with Ciris

Validated Configurations with Ciris

The need for configuration arises in almost every application, as we want to be able to run in different environments -- for example, local, testing, and production environments. Configurations are also used as a way to keep secrets, like passwords and keys, out of source code and version control. By having configurations as untyped structured data in files, we can change and override settings without having to recompile our software.

In this blog post, we'll take a look at configurations with configuration files, to see how we can make the loading process less error-prone, while overcoming obstacles with boilerplate, testing, and validation. We'll also identify when it's suitable to use Scala as a configuration language for improved compile-time safety, convenience, and flexibility; and more specifically, how Ciris helps out.

Configuration Files

Traditionally, configuration files, and libraries like Typesafe Config, have been used to load configurations. This involves writing your configuration file, declaring values and how they're loaded, and then writing very similar Scala code for loading that configuration. That kind of boilerplate code typically looks something along the lines of the following example.

This is a tedious, error-prone process that rarely sees any testing efforts. PureConfig (and other libraries, like Case Classy) were created to remove that boilerplate. Using macros and conventions, they inspect your configuration model (nested case classes) and generate the necessary configuration loading code. This eliminates a lot of errors typically associated with configuration loading. Following is an example of how you can load that very same configuration with PureConfig.

Encoding Validation

In both previous examples, we do not check whether our configurations are valid to use with our application. In the case of Typesafe Config, we hit a runtime exception if the key is missing or if the type conversion fails, and in PureConfig's case, we will instead get a ConfigReaderFailures. But in neither case do we care what values are being loaded, as long as they can be converted to the appropriate types. For example, we might require a key of certain length and that it only contains certain characters, the timeout needs to be positive, and the port must be a non-system port number (value in the inclusive range between 1024 and 65535).

You could write an additional validation step to ensure the configuration is valid after it has been loaded -- which can be tedious to write and requires testing. One could also argue that the types of the configuration values are too permissive: why use String for the key if you do not accept all String values? And why use an Int for timeout and port, if you only allow a limited subset of values?

We could write these custom types ourselves, including the validation logic, and tell PureConfig how to load them -- which would be tedious to write for many types and would require testing. Another alternative is to use refined, which allows you to do type-level refinements (apply predicates) to types. I found this approach so useful that I wrote a small integration between PureConfig and refined at the end of last year (see blog post), so that PureConfig can now load refined's types.

As you can see in the example above, refined already contains type aliases for many common refinement types, like PosInt (for Int values greater than zero). You can also easily define your own predicates, like the one for the key and port. The W here is a shorthand for shapeless' Witness: a way to encode literal-based singleton types (essentially, values on the type-level). If you're using Typelevel Scala with the -Yliteral-types flag, you can write values directly in the type declaration, without having to use Witness.

In many ways, think of configurations as user input -- would you happily accept any values provided to your application from its users? Probably not: you would validate the input, and sanitize it if possible. Think about configurations in the same way, except that the user here might happen to be a developer of the application. The key here, as discussed in the paper linked above, is to check that your configuration is valid as soon as possible, ideally at compile-time, or as soon as the application starts. We want to avoid situations where we're running the application and suddenly discover that configuration values are invalid or cannot be loaded -- or worse, continue running with an invalid configuration, not to discover issues until much later on.

Improving Compile-time Safety

We've now got a way to encode validation in the types of our configurations, and a boilerplate-free way of loading values of those types from configuration files -- is there still room for improvement? To answer that question, we first need to ask why we are using configuration files in the first place.

Whether you thought about it or not, the main reason for using configuration files is so that we can change settings without having to recompile the software. In my experience, most developers default to using configuration files, and almost always change values by pushing commits to a version control repository. This is followed by a new release of the software, either manually or via a continuous integration system. In scenarios like this, and in general when it's easy to change and release software (particularly when employing continuous deployment practices), configuration files are not used for the benefit of being able to change values without recompile.

In such cases, why are we not writing the configurations directly in source code? Christopher Vogt has written an excellent blog post (and given a presentation) on the subject. The tricky part here is managing values which need to be dynamic in the environment (like the port to bind) and are secret (like passwords and keys). Depending on your requirements and preferences, you more or less have two alternatives.

If you know which environments your application will run in, and what the configuration values will be in those environments, you can just include the configurations in your application code (if it has no secrets), or store, compile, and bundle the configuration separately. If you have a requirement that secrets shouldn't touch persistent storage, this might not be a feasible alternative. You might also appreciate the fact that all code relating to your application is in the same version control repository and gets compiled together, in which case this approach might not be suitable.

Alternatively, you can include the configuration in your application, but load secrets and values which need to be dynamic from the environment during runtime. This is necessary when configuration values cannot be determined beforehand -- because you do not know what environment your application will run in, or if you use a vault (like credstash, for example) or a configuration service (like ZooKeeper, for example) -- or if you prefer having your configuration together with your application code and in the same version control repository.

In this post, we'll only focus on the latter case. While it’s possible to not use any libraries in the latter case, loading values from the environment typically means dealing with: different environments and configuration sources, type conversions, error handling, and validation. This is where Ciris comes in: a small library, dependency-free at its core, helping you to deal with all of that more easily.

Introducing Ciris

Imagine for the moment that no part of your configuration is secret and that your application only ever runs in one environment. You can then just write your configuration in code.

You then realize that it's a bad idea to put the key in the source code, because source code can easily get into the wrong hands. You decide that you'll instead read an environment variable for the key. Since you want to make sure that your configuration is valid, you have used refinement types, so you'll have to make sure to check that the key conforms to the predicate. You would also welcome a helpful error message if the key is missing or invalid. This sounds like more work than it should be, so let's see how Ciris can help us.

Ciris method for loading configurations is loadConfig and it works in two steps: first define what to load, and then how to load the configuration. For reading a key from an environment variable, you can use env[ApiKey]("API_KEY") which reads the environment variable API_KEY as an ApiKey. Ciris has a refined integration in a separate module, so you just need to add an appropriate import. Loading the configuration is then just a function accepting the loaded values as arguments.

Ciris deals with type conversions, error handling, and error accumulation, so you can focus on your configuration. The loadConfig method returns an Either[ConfigErrors, T] instance back, where T is the result of your configuration loading function. You can retrieve the accumulated error messages by using messages on ConfigErrors.

If we decided that the port needs to be dynamic as well, we can simply make that change. In the example below, we are using prop to read the http.port system property for the port to use. As you can see, you are free to mix configuration sources as you please. While we are reading environment variables and system properties in these examples, you could just as well use sources for some configuration services or vaults.

You might recognize the similarities between loadConfig and ValidatedNel with an Apply instance from Cats. That's because it's more or less how loadConfig works behind the scenes, except Ciris has its own custom implementation in order to be dependency-free in the core module.

Multiple Environments

We still have to deal with multiple environments in our configuration, assuming there are differences between configurations, or how they are loaded, in the different environments. There are several ways you can do this with Ciris -- one way is to define an enumeration with enumeratum and load values of that enumeration. Let's say we want to use a default configuration when running the application locally, but want to keep the key and port dynamic in the other environments (testing and production). We start by defining an enumeration of the different environments.

We can use the withValue method to define a requirement on a configuration value in order to be able to load our configuration. It works just like loadConfig, except it wraps your loadConfig statements (think of it as flatMap, while loadConfig is map). If no environment was specified in the environment variable APP_ENV or if it was set to Local, we will use a default configuration. We'll load the configuration just like before for any other valid environments (testing and production).

An alternative to the above is to have multiple entrypoints (main methods) in your application, each running the application with different configuration loading code (or using a default configuration) for the respective environment. Depending on how packaging and running of your application looks like across different environments, this may or may not be a suitable solution. Note that it's very much possible to mix these approaches, and you should strive to find what works best in your case.

Testing Configurations

Writing your configurations in Scala means you have the flexibility to work with them as you want. You're no longer limited to what can be done with configuration files. Sharing configurations between your application and tests is also very straightforward -- simply make the configuration loading function (and the default configuration) available for the tests.

If you really want to unit test the configuration loading as well, you can do so with minor rewrites. Currently, we depend on some fixed configuration sources for environment variables and system properties (technically, system properties are mutable), but if we instead pass sources (ConfigSources) as arguments, we can read values from those sources using the read method.

The read method normally looks for an implicit ConfigSource to read from, which would have been perfect if we only used a single source. But since we have multiple sources here, we instead use read to redefine env and prop to read from the provided sources. ConfigReader[T] captures the ability to convert from String to T, where the String value has been read from a ConfigSource.

We'll then define a couple of helper methods for creating ConfigSources from key-value pairs. The ConfigSource type parameter is the type of keys the source can read, which is String for both environment variables and system properties. The ConfigKeyType is basically the name of the key that can be read, for example environment variable. Below we're using predefined instances in the ConfigKeyType companion object.

We can test our config method using different combinations of environment variables and system properties. Note that envs and props have the same type, so if you want to avoid using them interchangeably, you can define custom wrapper types for them. We'll leave that out here for sake of simplicity. I've found that it's not very common to read values from more than one ConfigSource, but as it's definitely possible, it can be worth making sure you do not mix them up.

Conclusion

In this blog post, we've seen how we can make the configuration loading process, with configuration files, less error-prone, by eliminating the boilerplate code with PureConfig, and encoding validation with refined -- seeing how the two libraries can work together seamlessly.

We've also identified cases where we can use Scala as a configuration language, seeing that it's particularly suitable in cases where it's easy to change and deploy software. We've introduced the challenge of loading configuration values from the environment and seen how Ciris can help you with that, letting you focus on the configuration. We've seen that Scala configurations can provide more compile-time safety and flexibility than traditional configurations with configuration files.