Recovery strategy for user-controlled keys for self-sovereign identity

By Glenn Willen

Self-sovereign identity requires that users retain ultimate control of their own authenticators -- in a cryptography-based system, this means cryptographic keys. This is problematic, because even highly skilled end users have trouble maintaining the security posture required for safe key storage, as the history of Bitcoin thefts demonstrates -- now that stolen keys are worth money, we’re suddenly seeing a lot more keys get stolen, even from large companies with IT and security departments. Obviously this is not tenable for an individual user.

I think a two-pronged strategy is appropriate here: (1) help the user store and use their own keys safely and securely, and (2) help the user recover their keys in case of loss or theft. I separate these because I am convinced that the best solution to the second is going to be a slow offline recovery procedure, which is incompatible with day-to-day use of key material. In any system for storing cryptographic material, there is always a balance between availability -- ensuring that keys can always be used by authorized users -- and confidentiality -- ensuring that they cannot be used by unauthorized users. If the recovery procedure is allowed to be slow, with plenty of time for notification to the key owner, then we can bias the recovery procedure towards availability (and the day-to-day storage towards confidentiality).

I will focus here on part (2), key recovery. In my view, the most fundamental way of identifying an individual is not via something we know, which can be forgotten; something we have, which can be lost; or even the traditional “something we are”, biometrics, which can be compromised. Instead, I claim the most fundamental way of identifying an individual is via their connections to other people. Facebook has already introduced friend-based account recovery to a huge audience with “Trusted Contacts” -- see https://www.facebook.com/notes/facebook-security/introducing-trusted-contacts/10151362774980766/. I’ve liked this idea for a long time, and the reception of Trusted Contacts suggests that people would accept the idea of their contacts collectively holding ultimate control of their digital identity.

To implement friend-based key recovery, an obvious choice is a K-of-N secret sharing scheme. For arbitrary data, we can use Shamir’s Secret Sharing (https://en.wikipedia.org/wiki/Shamir's_Secret_Sharing). Depending on the specific secrets involved, we may have better options -- with Bitcoin script-based keys, we can use the built-in multisignature system, which also gives us the flexibility to require, for example, a time- or block-based notice period before recovery.
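To make the K-of-N idea concrete, here is a minimal sketch of Shamir’s scheme over a prime field. This is illustrative only -- the function names are mine, and a real implementation would need a cryptographic RNG, constant-time arithmetic, and a field large enough for the actual secret.

```python
# Minimal sketch of Shamir's K-of-N secret sharing over a prime field.
# NOT production code: uses the non-cryptographic `random` module and
# ignores side channels. Names here are illustrative, not a real API.
import random

PRIME = 2**127 - 1  # a Mersenne prime, large enough for small demo secrets

def split_secret(secret, k, n, prime=PRIME):
    """Split `secret` into n shares, any k of which reconstruct it."""
    # Random polynomial of degree k-1 with the secret as constant term.
    coeffs = [secret] + [random.randrange(prime) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, prime) for i, c in enumerate(coeffs)) % prime
    return [(x, f(x)) for x in range(1, n + 1)]

def recover_secret(shares, prime=PRIME):
    """Lagrange interpolation at x=0 recovers the secret from k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % prime
                den = den * (xi - xj) % prime
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret
```

Any k of the n shares suffice -- for example, with a 3-of-5 split, shares 1-3 and shares 3-5 both reconstruct the same secret, while any two shares alone reveal nothing.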

In such a system, we must ensure that the probability of too many people losing their keys at once, preventing recovery, is negligible. A key aspect of ensuring that recovery remains possible is ensuring that failures are detected promptly. Imagine a simple 1-of-N system with N = 3, wherein failure of a ‘friend shard’ is a Poisson process -- a fixed probability per unit time. The probability of losing enough shards to prevent recovery depends heavily on the time it takes to notice a failure. If each shard has a 90% probability of failure per year (about 17.5% per month), then the probability of losing all three shards over a year is 72.9% -- a dismal figure. But if we reliably detect and repair the failure of a shard within a month, then the probability of total failure in any given month is about half a percent, and the total yearly chance of failure is only about 6.2% -- hugely better!
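The arithmetic behind those figures is short enough to check directly; this snippet reproduces the 1-of-3 numbers under the stated assumption of independent, constant-rate shard failures.

```python
# Reproduce the back-of-the-envelope numbers for a 1-of-3 system where
# each shard independently has a 90% chance of failing per year.
p_year = 0.9
p_month = 1 - (1 - p_year) ** (1 / 12)      # ~17.5% failure per month

# No repair: total loss means all three shards fail sometime in the year.
p_loss_no_repair = p_year ** 3               # ~72.9%

# Monthly check-and-repair: total loss requires all three shards to fail
# within the same one-month window, in at least one of twelve windows.
p_loss_one_month = p_month ** 3              # ~0.5%
p_loss_with_repair = 1 - (1 - p_loss_one_month) ** 12   # ~6.2%
```

The repair interval enters the total-loss probability cubed (once per shard), which is why shortening it from a year to a month improves the outcome so dramatically.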

From this back of the envelope computation, I take away two lessons:

  1. On the meta level: A system like this MUST be analyzed quantitatively to determine whether its properties are reasonable. You can’t eyeball the system description above and correctly conclude that monthly checks are safe while yearly checks are disastrous. Arguments about what constitutes a sufficiently safe way to store keys can only be settled with math.

  2. On the object level: It’s hugely important for key backups to be stored in a way that allows fast verification that they’re still alive. This makes a HUGE difference to durability; that in turn means that keys can be sharded with significantly higher thresholds for reassembly, which improves confidentiality (protection from compromise).
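Both lessons generalize beyond the 1-of-3 toy example. As a sketch (my own formulation, assuming independent shard failures with a fixed per-interval probability), the chance that a K-of-N scheme becomes unrecoverable within one repair interval is the binomial tail probability that more than N-K shards fail before anyone notices:

```python
# Probability that a K-of-N scheme becomes unrecoverable within one
# repair interval: more than N-K shards fail before detection/repair.
# Assumes independent failures with per-interval probability p.
from math import comb

def p_unrecoverable(n, k, p):
    """P(fewer than k shards survive one repair interval)."""
    return sum(comb(n, f) * p**f * (1 - p)**(n - f)
               for f in range(n - k + 1, n + 1))

def p_unrecoverable_per_year(n, k, p_interval, intervals_per_year):
    """P(at least one unrecoverable interval over a year of checks)."""
    return 1 - (1 - p_unrecoverable(n, k, p_interval)) ** intervals_per_year
```

Plugging in different (N, K) pairs and check frequencies makes the availability/confidentiality trade-off explicit: raising K improves confidentiality but increases the unrecoverability probability, and more frequent verification buys that risk back.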

To maximize the aggregate probability of maintaining both availability and confidentiality, beyond carefully selecting the parameters of the system -- number of shards, reassembly threshold, frequency of verification -- we also want to maximize the availability and confidentiality of the individual shards. We do not need, and in fact do not want, immediate online access to the shards; we DO require other properties, such as rapid discovery of loss or compromise. The best way to ensure these things would seem to be a hardware device that stores a key shard and enforces the properties we want.[1]

Such a device needs to broadcast frequently to the outside world, to provide positive confirmation that the secret data remains intact, but it should not be able to transmit the secret data itself under any circumstances. If the secret data is a key, this could be done by signing a timestamp with it; stronger proof of availability could be had by signing nonces, but that would require two-way communication with the outside world, which is less safe against compromise than a one-way broadcast. When commanded to perform a recovery, the device must stop broadcasting the all-clear and instead broadcast the fact that the key shard has been released (and the key owner must presume that the absence of any broadcast at all may also mean the key shard is compromised).
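A minimal sketch of that one-way heartbeat, in Python: the device tags the current timestamp with the shard key to prove the shard is intact, without transmitting the shard. HMAC stands in here for the signature purely for brevity -- a real device would use an asymmetric signature precisely so that verifiers never hold the secret -- and all names are hypothetical.

```python
# Sketch of the device's "I'm alive" broadcast and the monitor's check.
# HMAC is a stand-in for a real digital signature: with HMAC the
# verifier would need the secret too, which a real design must avoid
# by signing with an asymmetric key derived from the shard.
import hmac, hashlib, time

SHARD_KEY = b"example shard material"   # hypothetical shard bytes

def heartbeat(key, now=None):
    """Device side: emit (timestamp, tag) proving the shard is intact."""
    ts = str(int(now if now is not None else time.time())).encode()
    return ts, hmac.new(key, b"alive:" + ts, hashlib.sha256).digest()

def shard_intact(key, ts, tag, now, max_age=3600):
    """Monitor side: a valid, recent heartbeat => shard presumed intact.
    A stale or missing heartbeat must be treated as possible compromise."""
    expected = hmac.new(key, b"alive:" + ts, hashlib.sha256).digest()
    fresh = 0 <= now - int(ts) <= max_age
    return fresh and hmac.compare_digest(tag, expected)
```

Note that the monitor rejects on staleness as well as on a bad tag -- this is the "absence of broadcast means presume compromise" rule from above, encoded as a freshness window.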

A huge number of open questions remain, but I see this as a productive way forward in answering the question of how self-sovereign identity might be maintained safely and securely with cryptographic keys, despite the inability of typical users to maintain any reasonable level of computer security.

Footnotes

[1] I am indebted to Greg Maxwell for quite a bit of discussion about how such a device might be designed, but all errors and poor design choices herein are my own.