
Thoughts on root-zone key rollover

To: comments-root-zone-consultation-08mar13@xxxxxxxxx

Subject: Thoughts on root-zone key rollover

From: Olaf Kolkman <olaf@xxxxxxxxxxxx>

Date: Sat, 13 Apr 2013 07:11:53 +0200

We would like to share some thoughts on root-zone key rollover. The section
numbers below do not map 1-to-1 to the questions asked in the consultation.
=== 1. Key Signing Key rollover necessity.
There are a number of risks associated with performing a rollover and
a number of risks associated with not performing one.
If we choose not to roll, that decision plays out in one of two scenarios:
scenario 1. Never change the key, ever. Even when the private key is
compromised through theft or factoring, signatures will
continue to be generated and no change will occur.
scenario 2. A 'cold boot' or abrupt change (rather than a graceful roll)
when the private key has been compromised.
The difference between the two scenarios is that in the first case we
allow the system to continue with weakened security, and it is likely to
evolve quickly to a state where DNSSEC provides no added benefit.
In the second case we have to accept that there are a number of
devices for which the public key component cannot be upgraded or
removed, and for which DNSSEC validation cannot be turned off in a timely
fashion. Those devices will be useless from that moment on. For all
other devices the change will be disruptive, and we consider it likely
that DNSSEC validation will be switched off on those devices and left off.
Either of these scenarios will cause severe degradation, if not outright
disruption, of trust.
If we choose to perform rollovers, then we have two choices as well.
scenario 3. Only perform a roll when needed, i.e. when cryptographic or
computing advances are such that a cryptanalytic attack is
likely to succeed, or when the key is known to be
compromised.
scenario 4. Rolling on a regular basis.
The risk of rolling only at the moment a roll is needed (scenario 3) is
that the environment can become ossified. As with an abrupt key change
(scenario 2), there will be entities that have no rollover mechanisms
available, because the need for those mechanisms was not obvious or
only arose after a long time. Hence there was no (economic) motivation to
implement more expedient mechanisms.
That means that this scenario will also be highly disruptive, possibly at a
point where the user population has moved from the innovative
first movers to the laggards on the deployment S-curve.
Scenario 4 also brings risks. A key rollover will be disruptive if it
comes unexpectedly and the methodology to replace keys has not been
implemented. However, by performing regular rollovers it will be clear
to anybody who implements and maintains DNSSEC that key rollover is a
part of the technology whose implementation cannot be postponed.
Apart from all these considerations, it is important to communicate the
key rollover clearly. Fortunately we are still in the early stages of
deployment, and the people who deploy validation are mostly innovators who
are likely to have a deeper understanding of the technology and keep a
closer watch on its status. In other words, it is better
to take the risk of disruption early, since the odds are higher that
the early deployers are able and willing to cope.
=== 2. Regular key rollover
RFC 6781 speaks to the regularity of key rollovers and uses
'operational habit' as one of the motivations for regular rollovers.
We quote Section 3.2.2 in full since these arguments apply directly (and
specifically) to the root zone:
3.2.2. Rolling a KSK That Is a Trust Anchor
The same operational concerns apply to the rollover of KSKs that are
used as trust anchors: If a trust anchor replacement is done
incorrectly, the entire domain that the trust anchor covers will
become Bogus until the trust anchor is corrected.
In a large number of cases, it will be safe to work from the
assumption that one's keys are not in use as trust anchors. If a
zone administrator publishes a DNSSEC signing policy and/or a DNSSEC
practice statement [DNSSEC-DPS], that policy or statement should be
explicit regarding whether or not the existence of trust anchors will
be taken into account. There may be cases where local policies
enforce the configuration of trust anchors on zones that are mission
critical (e.g., in enterprises where the trust anchor for the
enterprise domain is configured in the enterprise's validator). It
is expected that the zone administrators are aware of such
circumstances.
One can argue that because of the difficulty of getting all users of
a trust anchor to replace an old trust anchor with a new one, a KSK
that is a trust anchor should never be rolled unless it is known or
strongly suspected that the key has been compromised. In other
words, the costs of a KSK rollover are prohibitively high because
some users cannot be reached.
However, the "operational habit" argument also applies to trust
anchor reconfiguration at the clients' validators. If a short key
effectivity period is used and the trust anchor configuration has to
be revisited on a regular basis, the odds that the configuration
tends to be forgotten are smaller. In fact, the costs for those
users can be minimized by automating the rollover with RFC 5011
[RFC5011] and by rolling the key regularly (and advertising such) so
that the operators of validating resolvers will put the appropriate
mechanism in place to deal with these stability costs: In other
words, budget for these costs instead of incurring them unexpectedly.
It is therefore preferable to roll KSKs that are expected to be used
as trust anchors on a regular basis if and only if those rollovers
can be tracked using standardized (e.g., RFC 5011 [RFC5011])
mechanisms.
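To illustrate the RFC 5011 automation mentioned in the quote, the following Python sketch computes two of its timers (the active refresh interval and the add hold-down time) and shows how a validator might decide when a newly published KSK becomes trusted. It is a simplified, non-authoritative reading of RFC 5011: the function names are hypothetical, the revocation and remove hold-down logic is ignored, and each observation is assumed to already tell us whether the new key was present in a DNSKEY RRset validly signed by an existing trust anchor.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)
DAY = timedelta(days=1)


def query_interval(orig_ttl, sig_expiration_interval):
    """RFC 5011 active refresh interval:
    MAX(1 hr, MIN(15 days, 1/2 * OrigTTL, 1/2 * RRSigExpirationInterval))."""
    return max(HOUR, min(15 * DAY, orig_ttl / 2, sig_expiration_interval / 2))


def add_hold_down(orig_ttl):
    """Add hold-down time: 30 days or the original DNSKEY TTL, whichever
    is greater (simplified)."""
    return max(30 * DAY, orig_ttl)


def trust_time(observations, hold_down):
    """Return the first observation time at which a new key has been seen
    continuously for at least `hold_down`, or None if that never happens.

    `observations` is a time-ordered list of (datetime, present) pairs; in
    this sketch `present` is assumed to mean "seen in a DNSKEY RRset validly
    signed by an already trusted key".
    """
    start = None
    for when, present in observations:
        if not present:
            start = None  # key disappeared: restart the hold-down
        elif start is None:
            start = when  # first sighting starts the hold-down timer
        elif when - start >= hold_down:
            return when  # hold-down satisfied: accept as trust anchor
    return None


if __name__ == "__main__":
    t0 = datetime(2013, 4, 13)
    hold = add_hold_down(2 * DAY)  # hypothetical 2-day DNSKEY TTL
    daily = [(t0 + i * DAY, True) for i in range(31)]  # one check per day
    print(query_interval(2 * DAY, 21 * DAY), hold, trust_time(daily, hold))
```

The point for the rollover discussion is the hold-down: a validator only accepts the new key after it has been published for at least 30 days, so a rollover that is announced and pre-published well in advance gives RFC 5011 validators time to follow it without manual intervention.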
Operational habit means that the person responsible can enter a date in
a calendar, or can instruct a colleague who succeeds them in their job.
If the time between rollovers becomes too long, the odds are that the
calendar entry gets lost or the job has moved on to yet another
colleague. To us, this type of argument suggests rollovers on the order
of one every 1 or 2 years.
However, since the operational practice of a rollover needs to be
ingrained, it seems wiser to roll more frequently the first few
times.
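To make the "operational habit" argument concrete, here is a small Python sketch that produces the calendar dates such a policy would yield: a few closely spaced rollovers to build the habit, followed by a steady cadence. The function names, the ramp-up count and the intervals are purely illustrative assumptions, not a proposal; the text above only suggests a steady state of one rollover every one to two years.

```python
from datetime import date


def add_months(d, months):
    """Shift `d` by whole months; the day is clamped to 28 so that the
    result is always a valid date."""
    idx = d.month - 1 + months
    return date(d.year + idx // 12, idx % 12 + 1, min(d.day, 28))


def rollover_schedule(first, ramp_up, ramp_months, steady_months, total):
    """Return `total` rollover dates starting at `first`.

    The first `ramp_up` intervals are `ramp_months` long (building the
    habit); all later intervals are `steady_months` long.
    """
    dates = [first]
    while len(dates) < total:
        intervals_so_far = len(dates) - 1
        step = ramp_months if intervals_so_far < ramp_up else steady_months
        dates.append(add_months(dates[-1], step))
    return dates


if __name__ == "__main__":
    # Hypothetical policy: two 6-month intervals, then one roll every 2 years.
    for d in rollover_schedule(date(2014, 1, 1), 2, 6, 24, 5):
        print(d.isoformat())
```

Publishing such a list of dates well ahead of time is exactly what lets operators "enter a date in a calendar" and budget for the work.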
=== 3. How to communicate.
As argued above, it is important to communicate that a
key rollover will happen.
It is likely that some ccTLDs have direct contact with ISPs that
perform validation. By measuring DNS queries (see below) they may be able
to tell which resolvers in their environment perform DNSSEC validation.
ccTLDs could therefore be a useful channel for outreach.
Banners on popular technical websites are a second channel.
Obviously, the well-known technical mailing lists should be a target:
network-related ones like those of OARC, NANOG, RIPE, APRICOT, and CENTR,
but also system-administration-related ones, like the USENIX lists. Other
channels might include CERT advisories and postings to operating system
announcement lists.
Postings to the non-technical popular press would probably raise more
concern than understanding.
Building a network that can be used for future communication (about
any DNSSEC Practice Statement (DPS) issue) should be an explicit goal of
the first rollovers.
=== 4. Measurements
One way to measure the success of a rollover is to monitor the
patterns of DNSKEY queries. If, after a rollover, validation of the
DNSKEY RRset against the new trust anchor fails, resolvers may query for
the DNSKEY RRset with a higher frequency than during regular operations.
For instance, the Unbound implementation has a so-called bad cache
that stores 'bad results' with a much shorter TTL than that of the
DNSKEY RR.
Moreover, the hosts from which DNSKEY RRs are requested are likely to
be validating resolvers. Although one cannot immediately infer the size
of the user population behind a querying source address, it may be
possible to correlate those addresses with other parameters to
identify the most active validating resolvers and reach out to them
as part of the communication effort.
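One simple way to start such a correlation, sketched in Python below, is to aggregate DNSKEY-querying source addresses by network prefix and rank the busiest prefixes as outreach candidates. The function names and the /24 and /48 prefix lengths are assumptions for illustration only; in practice one would more likely map addresses to origin ASes or organizations, which this sketch does not do.

```python
import ipaddress
from collections import Counter


def prefix_of(addr, v4_len=24, v6_len=48):
    """Map a source address to its covering network (hypothetical lengths)."""
    ip = ipaddress.ip_address(addr)
    length = v4_len if ip.version == 4 else v6_len
    return ipaddress.ip_network(f"{ip}/{length}", strict=False)


def top_validating_networks(dnskey_sources, k=10):
    """Rank prefixes by the number of DNSKEY queries seen from them."""
    return Counter(prefix_of(src) for src in dnskey_sources).most_common(k)
```

Query volume per prefix is at best a rough proxy for user population (a single large resolver with a long cache TTL may send few queries), which is why the text above speaks of correlating with other parameters rather than relying on counts alone.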
=== 5. Final remarks.
Others have suggested rollovers that roll a key back into itself, or
partial rollovers whose effect is then measured. These types of ideas
are worth considering but probably need peer review.
We want to acknowledge Warren Kumari as a source of inspiration for
the 4 scenarios.
-- Olaf Kolkman & Jaap Akkerhuis
NLnet Labs
