Yet another dev blog

Using the Mega API, with Python examples!

Introduction

The new Mega has the great advantage of being built as a service that can be queried by any client through its API. That means that the community can build shiny new software on top of Mega’s API and take advantage of its huge capabilities.

Mega’s API is documented here, but since the project is still very young, some information might be missing if you want to develop your own client from scratch. Never mind: Mega had the great idea of opening the source code of its website, so we have all that we need to start coding!

Let’s talk a little bit about the API itself first. It is based on a simple HTTP/JSON request-response scheme, which makes it really easy to use. Requests are made by POSTing the JSON payload to this URL:

https://g.api.mega.co.nz/cs?id=sequence_number[&sid=session_id]

Here, sequence_number is a session-unique number incremented with each request, and session_id is a token identifying the user session.

We will only send one command per request, but we still need to put it in an array. The response is either a numeric error code or an array of per-command return objects (JSON-encoded). Since we only send one command, we will get back an array containing only one return object. Thus, we can write our first two functions.
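A minimal sketch of these two steps, in Python 3 with only the standard library, could look like this. The names build_request() and parse_response() are illustrative and not part of Mega's code; the URL, the single-command array, and the "numeric error code or array" response shape come from the description above.

```python
import json
from urllib.parse import urlencode

API_URL = "https://g.api.mega.co.nz/cs"


def build_request(command, seqno, sid=None):
    """Return the (url, body) pair for one API command.

    The command is wrapped in a one-element array, as the API expects.
    """
    params = {"id": seqno}
    if sid:
        params["sid"] = sid
    url = API_URL + "?" + urlencode(params)
    body = json.dumps([command]).encode("utf-8")
    return url, body


def parse_response(raw):
    """Decode a response: a bare number is an error code, otherwise we
    take the single return object out of the array."""
    data = json.loads(raw)
    if isinstance(data, int):
        raise RuntimeError("Mega API returned error code %d" % data)
    return data[0]
```

The body can then be POSTed with any HTTP client (for instance urllib.request.urlopen(url, body)), and the raw response text handed to parse_response().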

We will use Python in all the following examples, because it’s a very nice language that lets you experiment quickly (and because I wanted to learn Python). These are my first steps, so you may see some ugly and un-pythonic things… please share your suggestions for improvements in the comments! The good news is that if you’re new to Python, you will likely understand all the code in this article without any problem. We will use PyCrypto for all the crypto-related parts.

You will notice that I’m not doing any kind of error checking, to keep the examples as simple as possible. The imports are not included, but you will find them in the complete listing at the end of this article. In the following, we will often need to base64 encode/decode data, and to convert byte strings to arrays of 32-bit integers and vice versa (for encryption and hash calculation). The utility functions that deal with this work are also given in the complete listing.
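To give an idea of what these utilities do, here is a rough Python 3 sketch. The function names are illustrative and may differ from the complete listing. It assumes the URL-safe base64 alphabet without "=" padding, big-endian 32-bit words, and zero-padding of byte strings whose length is not a multiple of 4.

```python
import base64
import struct


def base64_url_encode(data):
    """Encode bytes as URL-safe base64 text without trailing '='."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def base64_url_decode(text):
    """Decode URL-safe base64 text, restoring the missing '=' padding."""
    text += "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text)


def bytes_to_a32(data):
    """Convert bytes to a list of big-endian 32-bit integers.

    The input is zero-padded to a multiple of 4 bytes first.
    """
    data += b"\0" * (-len(data) % 4)
    return list(struct.unpack(">%dI" % (len(data) // 4), data))


def a32_to_bytes(words):
    """Convert a list of 32-bit integers back to bytes."""
    return struct.pack(">%dI" % len(words), *words)
```

For instance, bytes_to_a32(b"abcde") pads the input to 8 bytes and yields two words.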

Now, we are ready to start!

Logging in

First, we need to log in. This will give us a session token to include in all subsequent requests, and the master key used to encrypt all node-specific keys. According to Mega’s developer guide:

Each user account uses a symmetric master key to ECB-encrypt all keys of the nodes it keeps in its own trees. This master key is stored on MEGA’s servers, encrypted with a hash derived from the user’s login password.

Each login starts a new session. For complete accounts, this involves the server generating a random session token and encrypting it to the user’s private key. The user password is considered verified if it successfully decrypts the private key, which then successfully decrypts the session token.

The aes_cbc_encrypt_a32 function is given in the complete listing at the end of this article, as well as the ones dealing with base64 encoding and conversion between strings and integer arrays. Now that we have computed the hash, we can call the us method of the API:

The decryption is done by simply concatenating all the decrypted AES blocks (see decrypt_key() in Mega’s crypto.js). We are calling aes_cbc_decrypt_a32() but CBC doesn’t matter here, since we are processing only one block (4 * 32 = 128 bits) each time.
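The block-by-block idea can be sketched as follows. To keep the snippet runnable without a crypto library, the single-block cipher is passed in as a callable: decrypt_block is a hypothetical stand-in for a one-block AES-128 decryption with the relevant key, and this signature is my assumption, not the one used in crypto.js.

```python
def decrypt_key(encrypted_words, decrypt_block):
    """Decrypt a key given as a list of 32-bit words.

    Each group of 4 words (128 bits) is decrypted on its own and the
    results are concatenated, so no chaining happens between blocks.
    decrypt_block takes 4 words and returns 4 words.
    """
    plain = []
    for i in range(0, len(encrypted_words), 4):
        plain.extend(decrypt_block(encrypted_words[i:i + 4]))
    return plain
```

With a real cipher, decrypt_block would wrap something like aes_cbc_decrypt_a32(block, master_key).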

All the components are multiple precision integers (MPI), encoded as a string where the first two bytes are the length of the number in bits, and the following bytes are the number itself, in big endian order (see mpi2b() and b2mpi() in Mega’s rsa.js).
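As an illustration of this encoding, here is a small Python 3 sketch that reads MPIs from a byte string. The names mpi_to_int() and read_mpis() are mine; Mega's own helpers are the JavaScript functions mentioned above.

```python
def mpi_to_int(data):
    """Decode one MPI from the start of `data`.

    Returns (value, number_of_bytes_consumed). The first two bytes hold
    the length in bits; the number follows in big-endian order.
    """
    bit_length = (data[0] << 8) | data[1]
    byte_length = (bit_length + 7) // 8
    value = int.from_bytes(data[2:2 + byte_length], "big")
    return value, 2 + byte_length


def read_mpis(data, count):
    """Decode `count` consecutive MPIs from `data`."""
    values = []
    offset = 0
    for _ in range(count):
        value, used = mpi_to_int(data[offset:])
        values.append(value)
        offset += used
    return values
```

A 9-bit number such as 511 is therefore stored as the four bytes 00 09 01 ff.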

PyCrypto uses a blinding step that involves e, the public exponent of the RSA key, during the decryption. Since we don’t know e, we simply bypass this step by calling key._decrypt() from PyCrypto’s private API. The final sid is the base64 encoding of the first 43 characters of the decrypted csid (see api_getsid2() in Mega’s crypto.js).

We now have all that we need to query the API… so let’s get the list of our files!

Listing the files

MEGA’s filesystem uses the standard hierarchical file/folder paradigm. Each file and folder node points to a parent folder node, with the exception of three parent-less root folder nodes per user account – one for his personal files, one inbox for secure unauthenticated file delivery, and one rubbish bin.

Each general filesystem node (files/folders) has an encrypted attributes object attached to it, which typically contains just the filename, but will soon be used to transport user-to-user messages to augment MEGA’s secure online collaboration capabilities.

We can retrieve the list of all our nodes by calling the API f method:

files = api_req({'a': 'f', 'c': 1})

The result contains, for each node, the following information:

h: The ID of the node;

p: The ID of the parent node (directory);

u: The owner of the node;

t: The type of the node:

0: File

1: Directory

2: Special node: Root (“Cloud Drive”)

3: Special node: Inbox

4: Special node: Trash Bin

a: The attributes of the node. Currently only contains its name;

k: The key of the node (used to encrypt its content and its attributes);

s: The size of the node;

ts: The time of the last modification of the node.
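To make these fields concrete, here is a small Python 3 sketch that labels nodes by type and finds the children of a folder. The helper names and the sample node IDs are made up for illustration; only the field names (h, p, t) and the type codes come from the list above.

```python
NODE_TYPES = {
    0: "file",
    1: "directory",
    2: "root",
    3: "inbox",
    4: "trash bin",
}


def describe_node(node):
    """Return a short human-readable label for one node dict."""
    kind = NODE_TYPES.get(node["t"], "unknown")
    return "%s %s" % (kind, node["h"])


def children_of(nodes, parent_id):
    """Return the nodes whose parent (p) is `parent_id`."""
    return [n for n in nodes if n.get("p") == parent_id]


# Hypothetical sample data, shaped like the fields described above.
sample_nodes = [
    {"h": "ROOT0001", "p": "", "t": 2},
    {"h": "DIR00001", "p": "ROOT0001", "t": 1},
    {"h": "FILE0001", "p": "DIR00001", "t": 0},
]
```

Walking the tree is then just a matter of calling children_of() starting from the root node.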

Let’s talk a little more about the key. As explained by Mega’s developer guide:

All symmetric cryptographic operations are based on AES-128. It operates in cipher block chaining mode for the file and folder attribute blocks and in counter mode for the actual file data. Each file and each folder node uses its own randomly generated 128 bit key. File nodes use the same key for the attribute block and the file data, plus a 64 bit random counter start value and a 64 bit meta MAC to verify the file’s integrity.

So, for directory nodes, key is just a 128-bit AES key used to encrypt the attributes of the directory (for now, just its name). But for file nodes, key is 256 bits long and actually contains 3 components. If we see key as a list of eight 32-bit integers, then:

(key[0] XOR key[4], key[1] XOR key[5], key[2] XOR key[6], key[3] XOR key[7]) is the 128-bit AES key k of the file.

(key[4], key[5]) is the initialization vector for AES-CTR, that is, the upper 64 bits n of the counter start value used to encrypt the file contents. The lower 64 bits start at 0 and are incremented by 1 for each 16-byte AES block.

(key[6], key[7]) is a 64-bit meta-MAC m for file integrity.
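In code, splitting a file key into its components could look like the Python 3 sketch below. The function names are mine. The XOR that recovers the AES key is inferred from the upload section, where the random key is XORed with the initialization vector and the meta-MAC before being stored.

```python
def split_file_key(key):
    """Split an 8-word file key into (aes_key, iv, meta_mac)."""
    aes_key = [key[0] ^ key[4], key[1] ^ key[5],
               key[2] ^ key[6], key[3] ^ key[7]]
    iv = list(key[4:6])        # upper 64 bits of the CTR start value
    meta_mac = list(key[6:8])  # 64-bit integrity check value
    return aes_key, iv, meta_mac


def initial_counter(iv):
    """Build the 128-bit CTR start value as a Python integer.

    The IV fills the upper 64 bits; the lower 64 bits start at 0.
    """
    return (iv[0] << 96) | (iv[1] << 64)
```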

Now, we have all the keys to list the names of our files! First, let’s write a function to decrypt file attributes. They are JSON-encoded (e.g. {"n": "filename.ext"}), prefixed with the string “MEGA” (MEGA{"n": "filename.ext"}):
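Leaving the AES step aside, the handling of the decrypted bytes can be sketched like this in Python 3. parse_attributes() is an illustrative name, and the snippet assumes the plaintext was padded with null bytes up to the AES block size.

```python
import json


def parse_attributes(plaintext):
    """Turn decrypted attribute bytes into a dict.

    Returns None when the "MEGA" prefix is missing, which usually
    means the wrong key was used for decryption.
    """
    text = plaintext.rstrip(b"\0")  # drop the assumed null padding
    if not text.startswith(b"MEGA"):
        return None
    return json.loads(text[4:].decode("utf-8"))
```

The input would be the output of the AES-CBC decryption of the node's a field with the node key.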

Ta-dah! We are now able to list all our files, and decrypt their names.

Downloading a file

To download a file, we first need to get a temporary download URL for this file from the API. This is done with the g method of the API:

dl_url = api_req({'a': 'g', 'g': 1, 'n': file['h']})['g']

A simple GET request on this URL will give us the encrypted file. We can either download the whole file first, and then decrypt it, or decrypt it on the fly during the download. The latter seems to be the best solution if we want to check the file’s integrity, since the MAC has to be computed chunk by chunk:

File integrity is verified using chunked CBC-MAC. Chunk sizes start at 128 KB and increase to 1 MB, which is a reasonable balance between space required to store the chunk MACs and the average overhead for integrity-checking partial reads.

According to the developer’s guide, chunk boundaries are located at the following positions:

0 / 128K / 384K / 768K / 1280K / 1920K / 2688K / 3584K / 4608K / … (every 1024 KB) / EOF
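The boundary rule can be sketched in a few lines of Python 3. This is my own illustration, not the get_chunks() from the complete listing: it returns a sorted list of (start, length) pairs, and it assumes that chunk sizes grow by 128 KB at each step until they reach 1 MB.

```python
KB = 1024


def chunk_list(size):
    """Return [(start, length), ...] covering a file of `size` bytes."""
    chunks = []
    start = 0
    length = 128 * KB
    while start < size:
        # The last chunk is cut short at the end of the file.
        chunk_len = min(length, size - start)
        chunks.append((start, chunk_len))
        start += chunk_len
        if length < 1024 * KB:
            length += 128 * KB
    return chunks
```

For a 5 MB file this produces nine chunks, the last one being 512 KB long.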

The whole file MAC is obtained by applying the same algorithm to the resulting block MACs, with a start value of 0. The 64 bit meta-MAC is then defined as:

((bits 0-31 XOR bits 32-63) << 64) + (bits 64-95 XOR bits 96-127)
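Working with lists of 32-bit words, these two steps can be sketched as follows in Python 3. encrypt_block is a hypothetical stand-in for a one-block AES-128 encryption with the file key, passed in as a callable so that the snippet runs without a crypto library. The meta-MAC is returned as two 32-bit words rather than one packed integer.

```python
def file_mac_from_chunk_macs(chunk_macs, encrypt_block):
    """CBC-MAC over the chunk MACs, with a start value of 0.

    Each chunk MAC is a list of 4 words; encrypt_block takes 4 words
    and returns 4 words.
    """
    mac = [0, 0, 0, 0]
    for chunk_mac in chunk_macs:
        mixed = [m ^ c for m, c in zip(mac, chunk_mac)]
        mac = encrypt_block(mixed)
    return mac


def meta_mac(file_mac):
    """Fold the 128-bit file MAC into the 64-bit meta-MAC (2 words)."""
    return [file_mac[0] ^ file_mac[1], file_mac[2] ^ file_mac[3]]
```

The result of meta_mac() is what gets compared with (key[6], key[7]) once the download is complete.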

We now have all that we need to download a file, so… let’s go! The get_chunks() function is given in the complete listing. It simply gives the list of chunks for a given size, according to the specification discussed above. Since it actually returns a dict {chunk_start: chunk_length} of all the chunks, we need to iterate over it in sorted order.

Uploading a file

To upload a file, we first need to get an upload URL for it from the API.

We can then generate a random 128 bit AES key for the file, and the upper 64 bits of the counter start value (initialization vector). With these two values, we can encrypt the file and start the upload by simply POSTing the file contents to the upload URL!

The upload is done chunk by chunk, in order to compute on the fly the chunk MACs that we will need later to get the meta-MAC. To upload the chunk starting at offset x, we simply append /x to the upload URL.
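A minimal Python 3 sketch of the per-chunk URLs is shown below. chunk_requests() is an illustrative name, `data` is assumed to hold the already encrypted file contents, and `chunks` is assumed to be a sorted list of (start, length) pairs.

```python
def chunk_requests(upload_url, data, chunks):
    """Yield (url, body) pairs, one per chunk to POST.

    The chunk's start offset is appended to the upload URL.
    """
    for start, length in chunks:
        yield "%s/%d" % (upload_url, start), data[start:start + length]
```

Each pair can then be sent with a plain POST; the response to the final POST is the one to keep.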

Now that the upload is done, we have to actually create the new node on our filesystem. Notice that we saved the response of the POST to the upload URL: it is a completion handle that we will give to the API to create a new node corresponding to the completed upload.

This is done by calling the p method of the API. It requires:

The ID of the target node (the parent directory of our new node) ;

The completion handle discussed above ;

The type of the new node (0 for a file) ;

The attributes of the new node (for now, just its name), encrypted with the node key ;

The key of the node (encrypted with the master key), in the format discussed in the previous section, which means we need to XOR the key randomly generated above with the initialization vector and the meta-MAC.
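The XOR step for the key can be sketched as follows in Python 3. pack_file_key() is an illustrative name; the result still has to be encrypted with the master key before being sent to the API.

```python
def pack_file_key(aes_key, iv, meta_mac):
    """Combine a 4-word AES key, a 2-word IV and a 2-word meta-MAC
    into the 8-word node key format described for file nodes."""
    return [
        aes_key[0] ^ iv[0],
        aes_key[1] ^ iv[1],
        aes_key[2] ^ meta_mac[0],
        aes_key[3] ^ meta_mac[1],
        iv[0],
        iv[1],
        meta_mac[0],
        meta_mac[1],
    ]
```

This is the exact inverse of the splitting done when reading a file key, so a client can check itself by packing a key and splitting it again.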

So we first need two functions: one to encrypt the attributes (analogous to dec_attr() defined before), and the other to encrypt the key (similar to decrypt_key()):
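The part of the attribute encryption that does not depend on AES can be sketched like this in Python 3. serialize_attributes() is an illustrative name, and padding with null bytes up to a multiple of 16 bytes is an assumption that mirrors the block size of AES.

```python
import json


def serialize_attributes(attributes):
    """Build the plaintext to encrypt: "MEGA" + JSON, null-padded
    to a multiple of 16 bytes."""
    raw = b"MEGA" + json.dumps(attributes).encode("utf-8")
    raw += b"\0" * (-len(raw) % 16)
    return raw
```

The returned bytes would then go through AES-CBC with the node key and be base64-encoded for the request.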

The API confirms the creation of the new node by returning all the information described in the “Listing the files” section: ID, parent ID, owner, type, attributes, key, size and last modification time (creation time in our case). The new file now appears in the list of our files. We are all done!

Conclusion

We have seen that with a few lines of code, we can build our own Mega client pretty quickly. I’m currently working on a FUSE filesystem, to mount Mega on Linux, and will share it shortly on GitHub. But in the meantime, here is the complete listing for all the examples of this article. Hope you liked it!

Thanks! My password length is already a multiple of 4, so I didn’t notice that… By the way, we also need it for the email. In fact, we need it for any string we pass to str_to_a32… so I added the padding there.

Files can be shared with non-Mega users via links with an embedded encryption key. Does the API support these non-user accounts? Are these the “ephemeral accounts” specified in the API docs? If so, how would your example be modified, since those users don’t have a user/pass to authenticate with?

It’s just a matter of a parameter name in the API g method. And it doesn’t involve the ephemeral accounts mentioned in the API docs: you don’t have to have an account at all (and no account is created for that).

Sorry for the late answer. I’m actually doing infile = urllib.urlopen(dl_url) and then chunk = infile.read(chunk_size), so I’m reading the file chunk by chunk (I have to do that in order to compute the meta-MAC, because it’s based on the chunk MACs. I could also download the whole file at once and then read it chunk by chunk to compute the meta-MAC, but it’s just easier to directly download it chunk by chunk).

Nice work!
I’m really looking forward to reading your articles about FUSE and Mega. I might even try to find some time to learn Python so I can follow the code and test it properly.

To anyone trying out the code: copying the code gave some easy-to-find HTML formatting errors like &lt;, but also a harder-to-find &amp;sid inside a string, which is valid code but gives an (unhandled ..;o) error from api_req( … ). I got -15 <> returned rather than the expected data.

Hi, you’re a genius, you make everything look really easy. I tried to port “crypto.js” and “rsa.js” to Python and the result was not good.
I was surprised that it is also relatively manageable with PHP. Thanks for posting these articles.