by Lee Hutchinson
The news this past weekend was all Mega, Mega, Mega. We’ve covered the launch of the new “cyber locker” service (and I swear that’s the last time I’ll ever use the prefix “cyber”) and we’ve talked with Kim Dotcom himself, but the shiniest feature—the encryption methodology—has remained unexplained.
Mega has an entire page dedicated to explaining to developers how its API works, and the page contains some high-level details of what encryption methods the service uses and where they’re used. There are actually several different things going on, and it’s not as simple (or as secure) as it appears at first blush.
Block object storage
Mega’s documentation notes it uses a “hierarchical file/folder paradigm,” which is a fancy way of saying that it organizes your data into files and folders, just like your local file system. Every file or folder has an identifying data structure called a node (sort of analogous to an inode in a Unix-y file system) and every node has a parent node; in this way, the “file/folder” paradigm is maintained even if all the Mega service can see are flat encrypted blocks. There are three parentless root container nodes for each account—one for the root file folder, one for the inbox, and one for the trash.
Each node contains an attribute block and data blocks. The attribute block for the node currently is only used to contain the name of the data object associated with the node—the name of the file or folder, in other words—but the docs note that in the future more data can be squeezed into the attribute block, including user-to-user messages for different users to collaborate on files. The attribute and data blocks are both encrypted, separately, with AES-128.
The “Cryptography” section of the docs starts like this:
All symmetric cryptographic operations are based on AES-128. It operates in cipher block chaining mode for the file and folder attribute blocks and in counter mode for the actual file data. Each file and each folder node uses its own randomly generated 128 bit key. File nodes use the same key for the attribute block and the file data, plus a 64 bit random counter start value and a 64 bit meta MAC to verify the file’s integrity.
Files and folders, therefore, are encrypted with symmetric encryption. Symmetric encryption means the same key is used to encrypt and decrypt your data; this less computationally-intensive and easier to implement than asymmetric encryption, which we’ll get to in a moment.
AES-128 is a well-known and widely used block cipher. It works by applying a transformation to a fixed-length piece of data, with the exact nature of that transformation being determined by an encryption key. To decrypt, the process is applied in reverse, again using the same key. For the data stored in Mega, the encryption key used is generated for you at the time of sign-up and is itself encrypted using your account’s password.
Before we go any further, let’s stop to look at the potential implications. The key used to encrypt your Mega files and folders is stored on Mega’s servers, rather than on your local computer. It is telling that there appears to be no password recovery mechanism anywhere in the Mega or log-on screens, nor any method of changing your password in the user control panel. Because the master AES-128 key is encrypted using your password, remembering the password is vital. Losing it means you don’t just lose the ability to log on to the service—you lose the ability to decrypt your files, period.
There’s not just AES-128 encryption happening, though. Each user account also has a 2048-bit RSA key pair generated during sign-up. This is a separate asymmetric encryption method, and it’s employed to let users of the Mega service send messages and files securely to each other.
Without getting too far off in the weeds discussing asymmetric encryption, here’s the gist: rather than one key that encrypts and decrypts data, everyone gets a pair of keys, called a public key and a private key. Messages encrypted with the public key can only be decrypted with the private key, and vice-versa. The public key can be used by anyone to send a message that can only be read by the person who possesses the private key; on the other hand, the private key can be used to send a message that can be decrypted by anyone with the public key (this is useful when you need to positively identify who you are—assuming you’re the only person with access to your private key, anyone can verify that you’re you because they can read your messages using your public key). Presumably, users on the Mega service encrypt messages and files they send back and forth to each other with each other’s public keys.
The documentation is ambiguous on exactly how the RSA private key is kept secure:
In addition to the symmetric key, each user account has a 2048 bit RSA key pair to securely receive data. Its private component is stored encrypted with the user’s symmetric master key.
It’s annoyingly unclear if this means the RSA private key is encrypted using the user’s symmetric master key, or if the RSA private key is stored, encrypted, alongside the symmetric master key. It’s likely the former, though.
On the surface, things look relatively secure. Objects are encrypted with AES-128, and there’s separate encryption method for sending secure messages between users. Here’s where things start to fall down, though.
The biggest problem with Mega’s methods is the lack of entropy gathered in the generation of the RSA key pair. An encryption key needs to be difficult to guess, and so typically when one is generated it is created randomly by an algorithm. Computers, though, are notoriously non-random, and so the “random” numbers generated by their random number generation routines need to be supplemented by a factor called entropy. Entropy is “true” randomness, gathered by the computer from various sources—keypresses and mouse movements, or sound from a microphone, or any number of other things. Some modern CPUs even contain “true random number generators,” which use random atomic vibrations in the CPU as entropy sources.
“To strengthen the key, we have collected entropy from your mouse movements and keystroke timings.” So, wait—should I wiggle my mouse now, or is it too late?
Without adding entropy, the “random” primes generated by math.random for use as RSA keys are really only pseudo-random and can be guessed. The end result of this is that it is easier (not easy, but easier) to reverse-engineer a Mega user’s private RSA key than it should be. That means it’s easier to spoof the identity of a Mega user when sending messages or files.
Deduplication: Another quandary
There’s another issue besides identity spoofing: Mega’s terms of service contain the following puzzler:
8. Our service may automatically delete a piece of data you upload or give someone else access to where it determines that that data is an exact duplicate of original data already on our service. In that case, you will access that original data.
This sounds a lot like deduplication—only storing each unique chunk of data once to save storage space. The AES-128 encryption used for the node data blocks should ensure that every encrypted block is unique, even encrypted blocks made up of two copies of the same file. If Mega only sees encrypted data, which by definition is all completely unique, how then can they be “deduplicating” it? Is something fishy going on?
There is a lengthy discussion at Hacker News on the subject, which has a number of theories, including that Mega is using convergent encryption to identify non-unique blocks, or that the service’s CBC-MAC-based integrity checking is used as the basis for deduplication (though this doesn’t seem like it would work across accounts, since CBC-MAC uses the user’s encryption key, and the same block processed with two different keys would yield two different MACs).
Whatever the underlying method, the fact that block deduplication exists is a blow against the “see no evil” approach taken by Mega. By itself, a global method of identifying specific data doesn’t necessarily mean anything; however, the implication is that a uniquely identifiable thing can be derived from any given piece of data. This returns some burden to Mega—rather than throw up its hands and say that it has no idea what Mega users Alice or Bob have in their Mega accounts, there apparently is a way of telling whether or not Bob and Alice have the same file or files. If the MPAA gets wind that Bob is hosting a copy of The Hobbit: An Unexpectedly Long Movie in his Mega folder, and Alice also happens to have the same file in her Mega folder, it’s trivial to prove that Alice has the same file—in fact, the nature of deduplication means there’s some record of every deduplicated block, and therefore every other infringing user.
Why is it all like this?
On the other hand, the documentation and implementation have no small number of weaknesses and potential exploits. The RSA key pair generation process needs to be overhauled post-haste, and there needs to be some method of backing up and modifying a user’s encryption key.
The fact that encrypted data is not a total mystery to Mega is the most troubling issue. On one hand, the reason behind implementing a block-based data deduplication scheme is obvious: storage is cheap, but it’s not that cheap, and the distributed infrastructure providers supplying storage to Mega don’t have to waste space storing non-unique data—instead of 10,000 copies of The Hobbit, the service would only store a single copy, freeing up terabytes of space (though the scale and scope of the deduplication isn’t known yet, so this may be optimistic). On the other hand, even if the service doesn’t know those blocks of data happen to be The Hobbit, the service does know which users own those deduplicated blocks, and if one user is implicated, there’s proof against all the others, too.
The CTO of Mega, Mathias Ortman, had this to say during the launch press conference: “The encryption is open source. We expect the security community to take a long and hard look and comment on possible weaknesses.” It no doubt will, with a vengeance.