Software Robustness and BIND 10

Introduction

We have been discussing exceptions on the BIND 10 developers mailing list. Exceptions are a technique, available in most modern programming languages, that allows you to alter the normal flow of a program in unusual cases.

My hope is that exceptions can be part of a larger strategy for increasing the robustness of BIND 10. I gave a talk about this at the T-DOSE conference in Eindhoven, the Netherlands, recently — this post is a summary of those ideas.

BIND 9 Software Quality

BIND 9 ensures software quality by using a form of design-by-contract, by running an automated build/test suite, by following coding standards, and by enforcing a software review policy. While the design and implementation of BIND 9 pre-dates my employment at ISC by many years, my understanding is that these practices were adopted, in part, to ensure that there were few or no security problems with BIND 9. One major goal of BIND 9 was to eliminate the kind of security problems that were common in BIND 8.

These measures have for the most part worked. BIND 9 has worked as designed, although sometimes the design itself needed refinement!

BIND 9 Security Advisories

While these measures have mostly worked, there is one class of errors that has caused a lot of problems. These are programmer errors, caught by the design-by-contract mechanism.

The idea is that if a violation of programmer assumptions is discovered, the program is in an invalid state, and any further action is unsafe. This is true! What BIND 9 does in this case is exit the program.

Unfortunately, daemons that exit often make administrators unhappy. This is especially true if external users can cause this to happen — this turns a coder error into a denial of service (DoS) attack.
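To make the failure mode concrete, here is a minimal Python sketch of a contract check that terminates the process. It is not taken from the BIND 9 source. The names require, handle_query and serve are hypothetical, and the "empty name" contract is invented for illustration. The point is only that once untrusted input can reach a violated assumption, one bad query ends service for everyone.

```python
import sys


def require(condition, message):
    """Contract check: if the assumption is violated, stop the whole process."""
    if not condition:
        print(f"contract violated: {message}", file=sys.stderr)
        sys.exit(1)  # a real daemon would more likely abort and dump core


def handle_query(name):
    # Hypothetical programmer assumption: callers never pass an empty name.
    require(len(name) > 0, "query name must not be empty")
    return name.lower()


def serve(queries):
    """Answer queries in order; one contract failure stops all later answers."""
    answers = []
    for name in queries:
        answers.append(handle_query(name))
    return answers
```

Calling serve(["a.example", "", "b.example"]) exits at the second query, so the third is never answered even though nothing is wrong with it.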

I had a look at the BIND security advisories, and discovered there have been 13 for BIND 9 in the past 9 years. Of these, 1 was a problem in a library that BIND 9 did not itself use unsafely. Of the remaining 12, 8 of them were DoS attacks caused when the software caught a programmer error.

2 in 3 security advisories for BIND 9 were caused by Design-by-Contract program termination!

How BIND 10 Can Do Better

Programmer errors are impossible to prevent entirely. The techniques in BIND 9 have served it well, first by minimizing programmer errors, and second by detecting them and not letting the system continue on in an invalid state. Note that some people on the BIND 10 team think that BIND 9 got it wrong, but personally I think that BIND 9’s pseudo-design-by-contract is successful. In any case, we can do better.

Basically BIND 10 needs to minimize the effects of software errors. There are two main approaches to this, one uncontroversial, and one slightly controversial:

Multiple processes
BIND 10 is going to run multiple processes. So, for example, the process handling DNS queries is not the same process which handles zone transfers, which is not the same process which handles dynamic DNS (DDNS) zone updates.

There are a number of benefits to this. It allows administrators to run only the components they need, resulting in a smaller memory footprint and less complexity, which is more secure and makes the system easier to understand. It also allows efficient scaling across multiple CPU cores.

In terms of robustness, it means that a failure of one piece of the system does not affect other pieces of the system. For example, one of the BIND 9 security advisories was because of an error in DDNS processing. In BIND 10, this might have resulted in downtime for the DDNS process, but would normally not have had any impact on the query handling processes.

Note that in addition to this separation, a “boss” process will monitor the state of the system and try to restart failed components. This means that if a component does fail, we will also minimize the time that it is out of service.
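The monitoring idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual BIND 10 boss: the Boss class, its method names, and the "ddns" and "auth" component names are placeholders, and the components here are throwaway Python one-liners rather than real servers.

```python
import subprocess
import sys


class Boss:
    """Starts components as separate processes and restarts any that exit."""

    def __init__(self, commands):
        self.commands = commands   # component name -> argv list
        self.children = {}         # component name -> running Popen handle
        self.restarts = {name: 0 for name in commands}

    def start_all(self):
        for name, argv in self.commands.items():
            self.children[name] = subprocess.Popen(argv)

    def check_once(self):
        """Restart every component that has exited; return their names."""
        restarted = []
        for name, child in self.children.items():
            if child.poll() is not None:  # None means still running
                self.children[name] = subprocess.Popen(self.commands[name])
                self.restarts[name] += 1
                restarted.append(name)
        return restarted

    def stop_all(self):
        for child in self.children.values():
            if child.poll() is None:
                child.terminate()
            child.wait()


def demo():
    py = sys.executable
    boss = Boss({
        "ddns": [py, "-c", "import sys; sys.exit(1)"],      # simulated crash
        "auth": [py, "-c", "import time; time.sleep(30)"],  # keeps running
    })
    boss.start_all()
    boss.children["ddns"].wait()  # wait until the simulated crash has happened
    restarted = boss.check_once()
    boss.stop_all()
    return restarted
```

In demo() only the crashed component is restarted, and the other one is never touched. A real supervisor would call check_once() in a loop (or react to child-exit signals) and would want some limit on how quickly a repeatedly failing component is restarted.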

Exceptions
Exceptions are provided in most modern programming languages. They allow errors to be passed from the part of the program where the error is discovered to the “best” place to handle the problem. Deep inside of a library it is usually not possible to know the context that the rest of the program is running in, so it is very difficult to know what the correct action is when an error occurs.

My hope is that we will be able to use exceptions to handle design-by-contract and other programming errors in as graceful a way as possible. So, for example, if a query triggers a contract failure, it may be possible to drop the query and flush any state that was associated with it, rather than exit the process handling it.

This kind of handling requires a lot more attention to the specific error. We need to be careful to ensure that we really understand the implications of a programmer error at any given point, and we will lean towards safety, meaning that in a lot of cases we will exit the process, the same as in BIND 9.
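As a rough sketch of what "drop the query, flush its state, otherwise exit" could look like, consider the following Python. Everything in it is hypothetical: the ContractViolation class, its recoverable flag, and the "empty packet" contract are invented for illustration and are not a BIND 10 design.

```python
class ContractViolation(Exception):
    """A programmer assumption was found to be false."""

    def __init__(self, message, recoverable):
        super().__init__(message)
        # True only when we are sure the damage is confined to one query.
        self.recoverable = recoverable


def parse_query(raw, state):
    state["scratch"] = raw  # per-query state that must not outlive a failure
    if len(raw) == 0:
        # Hypothetical contract: the network layer never passes an empty packet.
        raise ContractViolation("empty query", recoverable=True)
    return raw.decode("ascii").lower()


def serve(packets):
    """Return (answers, number of dropped queries)."""
    answers = []
    dropped = 0
    for raw in packets:
        state = {}
        try:
            answers.append(parse_query(raw, state))
        except ContractViolation as err:
            if not err.recoverable:
                raise  # lean towards safety: let the process exit
            state.clear()  # flush everything associated with this query
            dropped += 1
    return answers, dropped
```

The low-level code only reports what went wrong; the loop in serve(), which knows that each query has its own state, decides whether it is safe to carry on. Anything not explicitly marked recoverable still propagates upward and ends the process.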

Unlike multiple processes, there is no consensus on the BIND 10 team that exceptions will actually produce more robust software. It is an open question, which we will answer as development continues.

These techniques will reduce the impact of coding errors in BIND 10. In addition, other things we are doing should make BIND 10 even safer (such as using modern languages, or picking and using high-quality libraries rather than writing our own code). With luck, BIND 10 will have far fewer security advisories when it turns 9 years old!