Thoughts on key rotation.

by Peter Bulmer -
Current Moodle 1.8/1.9 key rotation seems to work like this:
* Our Moodle creates a keypair and sets its expiry time to (default) 28 days from now.
* 28 days later, during the first cron run following the expiry of this keypair, it generates a new keypair and archives the old one. We keep (default) 3 old keypairs.
* If another node encrypts something to us using one of the expired keypairs, reply with ERR_OLD_KEY, and supply the new public key. I presume that this message is returned signed, and it's signed using the old key that other node knows about. Assuming they haven't also rotated their key since we last sent them a message, they'll be able to read the reply, and update their record of our public key before repeating their original request.

IMO, this has the following weaknesses:
1. In the time between the old keypair expiring, and the next cron run, there is no valid, current-dated keypair.
2. At some point between the new key being generated and 84 days later (default 3 archived keys * default 28 days), a node may contact us using an expired key, and we expect that node to trust a signature made with that same (expired) key when recording the new key it should be trusting.
3. If active communication between nodes is quite rare, it's likely that both nodes will have changed their keypairs since they last communicated. At the very least, both sides will need to initiate a message so that the other may respond with ERR_OLD_KEY and specify its new key before communication and trust across the network are restored.

My initial suggestions for unreleased moodles are to:
a) Only archive valid-&-unexpired-keys,
b) 1/3 of the way through the valid period of the current key, or after we have been using the current key for 1/3 of the currently configured key rotation period (whichever comes first), create a new key and archive the old one.
c) Notify all configured network peers of our new key (using a new xmlrpc service). Retry all unsuccessful notifications a number of hours later, using an incremental backoff.
d) increase the default valid period for a keypair to 3 months.
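
To make (b) concrete, here is a minimal Python sketch of the "whichever comes first" test. It is not Moodle code: the function name `rotation_due` and its parameters are hypothetical, and it assumes we know when the current key was created, when it expires, and the currently configured rotation period.

```python
from datetime import datetime, timedelta


def rotation_due(created: datetime, expires: datetime,
                 configured_period: timedelta, now: datetime) -> bool:
    """Return True once the current key should be rotated under suggestion (b)."""
    # One third of the way through this key's own validity window.
    by_validity = created + (expires - created) / 3
    # One third of the currently configured rotation period, counted from
    # when we started using the key (assumed here to be its creation time).
    by_config = created + configured_period / 3
    # "Whichever comes first": rotate at the earlier of the two thresholds.
    return now >= min(by_validity, by_config)
```

With the 3 month validity suggested in (d), this would rotate roughly one month into a key's life, leaving the outgoing key valid and unexpired for about two more months.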

To make the migration easier for released moodles, I'd suggest we make the following concessions:
a2) Archive the greater of (# of valid & unexpired keys, or # currently configured archive number (3)), preferring to keep unexpired keys, then newer keys, when choosing which ones to throw out.
c2) Retain the ability to respond to a message encrypted to one of our archived keys with ERR_OLD_KEY and the new key. When an incrementally backed off reattempt to notify a peer about our new key is about to occur, check to see if the peer has already communicated with us using our new key (retrieved using old or manual methods).
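
The retention rule in a2) could be sketched as follows. This is an illustration only: `prune_archive` is a hypothetical name, and it assumes each archived key is a `(key_id, expires)` tuple where a higher `key_id` means a newer key.

```python
from datetime import datetime
from typing import List, Tuple

ArchivedKey = Tuple[int, datetime]  # (key_id, expires); higher id = newer key


def prune_archive(archive: List[ArchivedKey], now: datetime,
                  configured: int = 3) -> List[ArchivedKey]:
    """Return the archived keys to keep under concession (a2)."""
    unexpired = [key for key in archive if key[1] > now]
    # Keep the greater of: the number of unexpired keys, or the configured number.
    keep_count = max(len(unexpired), configured)
    # Rank unexpired keys ahead of expired ones, then newer ahead of older.
    ranked = sorted(archive, key=lambda key: (key[1] > now, key[0]), reverse=True)
    return ranked[:keep_count]
```

Every unexpired key always survives, and expired keys only fill whatever slots remain up to the configured number.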

I'd also recommend the ability for the administrator to do the following:
i) perform early key rotation
ii) specify a key to distrust, either the current key, or a key in history, causing network nodes using that key to require manual entry of our public key to re-join the network.
iii) manually reset the new key notification for a specific peer, causing the next cron run to try notifying that peer, and starting the backoff from scratch in the event of a failure.

As I see it, this requires keeping the following new information:
A) An id in a sequence for each keypair we generate
B) The latest keypair of ours, which we know that each peer knows about
C) Notification backoff information - when we last tried to tell a peer about our new key, and when we should try again.
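
As a rough illustration of (A) to (C), the per-peer bookkeeping might look like the Python sketch below. All names here (`PeerKeyState`, `needs_notification`, `record_failure`) are hypothetical, and the doubling delay with a one hour base is just one assumed form of incremental backoff, not something Moodle implements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PeerKeyState:
    known_key_id: int                        # (B) newest key of ours the peer is known to hold
    last_attempt: Optional[datetime] = None  # (C) when we last tried to notify the peer
    next_attempt: Optional[datetime] = None  # (C) when we should try again
    failures: int = 0                        # failed attempts for the current key


def needs_notification(peer: PeerKeyState, current_key_id: int, now: datetime) -> bool:
    """True if the peer lacks our current key (A) and a retry is due."""
    if peer.known_key_id >= current_key_id:
        return False
    return peer.next_attempt is None or now >= peer.next_attempt


def record_failure(peer: PeerKeyState, now: datetime,
                   base: timedelta = timedelta(hours=1)) -> None:
    """Note a failed notification and double the wait before the next try."""
    peer.failures += 1
    peer.last_attempt = now
    peer.next_attempt = now + base * (2 ** (peer.failures - 1))
```

The manual reset in (iii) would then amount to clearing `failures` and `next_attempt` for that peer.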

I'm interested in your thoughts on the above, corrections to my understanding of key rotation, improvements to add to the mix, etc.

Disclaimer: Based on non-complete investigation of the codebase I did a while ago, AFAIK, YMMV, May contain traces of nuts.

Pete Bulmer
In reply to Peter Bulmer

Re: Thoughts on key rotation.

by Nigel McNie -
"I presume that this message is returned signed, and it's signed using the old key that other node knows about."

Yes, the patch I made for this a few months ago does that. It might even be encrypted using the old key too, though take that with a grain of salt.

"b) 1/3 of the way through the valid period of the current key, or after we have been using the current key for 1/3 of the currently configured key rotation period (whichever comes first), create a new key and archive the old one."

For simplicity, I'd ditch a configurable key rotation period. It's just noise and added complexity.

Same with incremental backoff - just try re-sending the key once a day or something. That level of traffic is low enough as to be meaningless.

"a2) Archive the greater of (# of valid & unexpired keys, or #currently configured archive number (3)), preferring to keep unexpired keys, then newer keys when choosing which ones to throw out."

Ditch the configured archive number, for the same reason.

Basically, I feel that the more configurable you make it, the more likely it is to break.

I'll be attempting to produce a patch for this in the next few days; we will see if the plan works in practice :)
In reply to Nigel McNie

Re: Thoughts on key rotation.

by Peter Bulmer -
"It might even be encrypted using the old key too, though take that with a grain of salt."

I'm also working under the belief that it's encrypted, but of course if it's encrypted, I'd hope that it's encrypted using the remote host's public key, not ours ;)

"For simplicity, I'd ditch a configurable key rotation period."
I agree that letting admins play with this setting is a bad idea, but creating keys with 6 minute expiry times sure makes testing key rotation easier.

re: Incremental backoff.
Fair enough.

re: archive number: remove configurability? sure.
Remove the interim provision for keeping expired keys? I don't think that is such a good idea.
Without this, the transition for moodle 1.9's gets rather bumpy. Without the old expired keys around for a little while, any time you upgrade one of the nodes in a moodle network to 1.9+patch, it loses contact with all of its peers on older 1.9s.
The last peer on the network will upgrade fine, the first will be a disaster.

"Basically, I feel that the more configurable you make it, the more likely it is to break."
I agree ... mostly.

The more configurable you make it, the greater the opportunity for misconfiguration. I think this is different to the system breaking.

If the knobs are there, but there is no moodle interface for them (eg valid key period setting) then this creates a suitable barrier to entry - if you don't know how to play in the database - you shouldn't be playing.
In reply to Peter Bulmer

Re: Thoughts on key rotation.

by Nigel McNie -
I don't think I ever said to remove the interim provision ;). Yes, that should stick around. I just think you should store the three most recent valid keys, full stop.

If you make a knob for this, someone will twist it, thus you have to handle max/min etc... all just useless noise in the way of the code, when the reality is that with three keys things should always work.

Anyways, I recognise over-thinking something too soon when I see it. Let's get a patch working and re-evaluate at that point.
In reply to Peter Bulmer

Re: Thoughts on key rotation.

by Martín Langhoff -
Good analysis. I think the plan makes sense. Keeping around several current keys means changes in the DB tables to split off the keys and break the 1:1 relationship they have now with hosts.

And I agree with Nigel: reducing config options==good :)
In reply to Martín Langhoff

Re: Thoughts on key rotation.

by Peter Bulmer -
"Keeping around several current keys means changes in the DB tables to split off the keys and break the 1:1 relationship they have now with hosts."

I think you're reading more into what I'm saying than was meant.

Moodle already holds a key history for itself: when a keypair expires, it is put in the history and pushes out the oldest of the three currently in there.

For a remote host to get our new public key, it needs to trust our old (expired) key. In practice, there isn't anything horrendously wrong with this, as the keys we're using are probably strong and expire at an over-zealous rate, but the theory stinks. An expired key should not be trusted.
If you go to your bank's website and your browser says "Expired certificate, otherwise valid", would you punch in your details?

What I'm advocating is a longer key expiry time, rotating keys well before they expire, and dropping support for moodles that would trust ancient, long since rotated-out keys.

In practice (for default settings) this is little different, but it makes the decision much easier - "when do I stop trusting other moodles' expired keys?" After we've had this new practice in place for a while, the answer is easy - 'as soon as they expire'
In reply to Peter Bulmer

Re: Thoughts on key rotation.

by Martín Langhoff -
Ah, ok.

The nasty thing in the current model is that we have 2 stages: 'current' and 'kind-of-expired' -- that kind-of-expired stage has a never-ending half life.

IIUC what you are proposing is that we formalise it into current-shiniest-newest, current-but-not-shiniest and expired; where the expired stage is _really_ expired (and can therefore be deleted).

I like it - the mechanics are similar to the current scheme, but the half-life is strictly controlled.
In reply to Martín Langhoff

Re: Thoughts on key rotation.

by Peter Bulmer -
Exactly, although if I may be pedantic, I'd name them:

"Current": Trustworthy, unexpired and completely valid. All normal communication can occur.

"Old": Still trustworthy, still unexpired, still valid. But the only response you'll get when encrypting content to us using it is "ERR_OLD_KEY, here's my new key".

"Expired": Expired, cryptographically valid, we could decrypt what you sent us if we hadn't deleted the private key. Untrustworthy. The only response you'll get when you encrypt content to us using this key is ERR_DONT_UNDERSTAND.
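
The three states and their responses could be sketched like this in Python. It is illustrative only: `KeyState`, `classify` and `response_for` are hypothetical names, `PROCESS_NORMALLY` is a placeholder for normal request handling, and ERR_DONT_UNDERSTAND is just the name used above rather than an existing error code.

```python
from datetime import datetime
from enum import Enum


class KeyState(Enum):
    CURRENT = "current"
    OLD = "old"
    EXPIRED = "expired"


def classify(key_id: int, expires: datetime,
             current_key_id: int, now: datetime) -> KeyState:
    """Classify one of our own keys under the Current/Old/Expired scheme."""
    if now >= expires:
        return KeyState.EXPIRED   # untrustworthy; the private key would be deleted
    if key_id == current_key_id:
        return KeyState.CURRENT   # all normal communication can occur
    return KeyState.OLD           # still valid, but superseded by a newer key


def response_for(state: KeyState) -> str:
    """Map the state of the key a peer encrypted to onto our reply."""
    return {
        KeyState.CURRENT: "PROCESS_NORMALLY",   # placeholder, not a real mnet code
        KeyState.OLD: "ERR_OLD_KEY",            # reply would also carry the new public key
        KeyState.EXPIRED: "ERR_DONT_UNDERSTAND",
    }[state]
```

Expiry is checked first, so a key that is still the newest but has passed its expiry date is treated as Expired rather than Current.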

After we have had the Current/Old/Expired regime in place for a while, we should distrust peers whose key becomes expired, requiring manual re-seeding if the peer's administrator wants to get it going again. But that's a change for later.

I guess what I'm saying is that
a) it should be part of the mnet protocol that a node must use valid and unexpired keys
and
b) it should be part of a future version of the mnet protocol that a node require its peers to use valid and unexpired keys.
Average of ratings: Useful (1)
In reply to Peter Bulmer

Re: Thoughts on key rotation.

by Martín Langhoff -
Sounds great to me. One configuration that might be very welcome is to make (b) switchable like:

"Interoperate with old mnet sites at the expense of security? [yes/no/icecream]"

So the decision for the Moodle release managers / maintainers becomes one of "should we change the default?". And if the patch is workable for 1.8/1.9, we can merge it into 18_STABLE and 19_STABLE or at least offer it for admins that care.