Defining a schema for an app is quite simple. Let’s assume we’re building an app that returns the favourite city of a person:

/* App sample output */
{"city": "Madrid"}

The schema of this output is just a JSON object with city as the key and, as its value, the maximum number of bytes that the output value may take. Different cities have names of different lengths, so the schema sets a cap that prevents, e.g., a JPG image from being serialized and disguised as a city. The objective is to avoid unsolicited data leaks in an app’s output. Thus, the schema will be:

/* App schema */
{"city": 30}

With this, we’ve set a limit of 30 bytes for the value of the city key. As the string "Madrid" has a JSON-stringified length of 8 bytes (six characters plus the two double quotes), we’re within the 30-byte limit, which means the app’s output is valid for this schema.
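The length check described above can be sketched in a few lines of Python. This is only an illustration: the helper name stringified_length is hypothetical, and counting UTF-8 bytes of the compact JSON form is an assumption about how a verifier measures size.

```python
import json


def stringified_length(value) -> int:
    """Return the size in bytes of the compact JSON form of `value`.

    Assumption: the value is measured as UTF-8 bytes of its JSON string,
    without extra whitespace and without escaping non-ASCII characters.
    """
    text = json.dumps(value, ensure_ascii=False, separators=(",", ":"))
    return len(text.encode("utf-8"))


limit = 30  # cap taken from the schema {"city": 30}
length = stringified_length("Madrid")

print(length)           # 8: six letters plus two double quotes
print(length <= limit)  # True: the output fits the schema
```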

Let’s consider a more complex example. Now the app returns more detailed information about the city, including the country, to disambiguate between, e.g., Madrid, Spain, and Madrid, USA.

/* App sample output */
{"city": {"name": "Madrid", "country": "Spain"}}

For this type of output, the same approach would apply. We’ll assume countries will have a maximum JSON-stringified length of 30 bytes. The app schema would be:

/* App schema */
{"city": {"name": 30, "country": 30}}

Finally, let’s assume an app returns a list of up to 10 favourite cities. One sample output would be:

To build the schema, we’ll again replace values with their maximum expected JSON-stringified length. For the array, two elements are used: the first is the expected schema of each item, and the second is the maximum length of the array. As mentioned, we’ll only support a maximum of 10 favourite cities, so the schema is as follows:

/* App schema */
{"cities": [{"name": 30, "country": 30}, 10]}
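To make the three schema forms concrete, here is a minimal sketch of a checker in Python. It is not the actual verifier: the function names are hypothetical, the sample output is invented for illustration, and rejecting keys that the schema doesn’t list is an assumption.

```python
import json


def stringified_length(value) -> int:
    # Assumption: size is the UTF-8 byte count of the compact JSON form.
    text = json.dumps(value, ensure_ascii=False, separators=(",", ":"))
    return len(text.encode("utf-8"))


def conforms(output, schema) -> bool:
    """Check an app output against a schema made of byte caps."""
    if isinstance(schema, int):
        # Leaf: the number is the maximum JSON-stringified length.
        return stringified_length(output) <= schema
    if isinstance(schema, dict):
        # Object: every output key must be declared in the schema (assumption).
        return (
            isinstance(output, dict)
            and set(output) <= set(schema)
            and all(conforms(output[key], schema[key]) for key in output)
        )
    if isinstance(schema, list):
        # Array: [schema of each item, maximum number of items].
        item_schema, max_items = schema
        return (
            isinstance(output, list)
            and len(output) <= max_items
            and all(conforms(item, item_schema) for item in output)
        )
    return False


schema = {"cities": [{"name": 30, "country": 30}, 10]}

# Hypothetical sample output, invented for this sketch.
output = {
    "cities": [
        {"name": "Madrid", "country": "Spain"},
        {"name": "Madrid", "country": "USA"},
    ]
}

print(conforms(output, schema))  # True
```

An output with an eleventh city, or with a name longer than 30 bytes once stringified, would be rejected by this sketch.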

Note that there are no specific type checks. The objective of schema checks is not to verify types, but to prevent data leaks. Therefore, the only thing that matters is the amount of data allowed, as both numbers and strings can be disguised as different data types by using different encodings.

An advanced user can work out that, given the app schema shown above, the app can return at most 10 x (30 + 30) = 600 bytes of data (a figure that includes the overhead of double quotes). This figure helps the user weigh their trust in the app against the risk allowed by the schema when deciding whether to use the app.
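The same arithmetic can be derived mechanically from any schema. The following Python sketch uses a hypothetical helper, max_bytes, and counts only the value caps, as the calculation above does. Keys and structural characters such as braces and commas are not counted.

```python
def max_bytes(schema) -> int:
    """Upper bound on the bytes of value data a schema lets an app return."""
    if isinstance(schema, int):
        # Leaf: the cap itself.
        return schema
    if isinstance(schema, dict):
        # Object: the caps of all its keys add up.
        return sum(max_bytes(value) for value in schema.values())
    # Array: [schema of each item, maximum number of items].
    item_schema, max_items = schema
    return max_items * max_bytes(item_schema)


schema = {"cities": [{"name": 30, "country": 30}, 10]}
print(max_bytes(schema))  # 600, i.e. 10 x (30 + 30)
```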

As you can see, the string "Madrid" becomes "cFwfPaP3E/4tcryywWYEDN7go+pi1uTpA7jy7clI17KKO/nO0YuZ5vS3i7Ea9n/y3LOF4cajYQOAQt/lBwDMsA==". This is an encrypted, Base64-encoded version of the string "Madrid". Trying with a longer string produces a similar result:

The string "SanFrancisco" produces another Base64 string which has the same length as the previous one. As "SanFrancisco" is longer than "Madrid", this means that the encryption algorithm hides the real length of the unencrypted data.

The pkcs1 encryption algorithm being used produces 64 bytes of output for every 22 bytes of input or fraction thereof, so the encrypted length is always a multiple of 64 bytes. Converting it to Base64 makes it even longer.

To estimate the length of the original, unencrypted data, the verifier inverts the relationship above. This lets verifiers estimate the minimum number of bytes actually being sent and detect cases where there is a clear excess of information compared to the schema.

Because verifiers work back to the unencrypted data length, encryption doesn’t change the way schemas should be defined. Nevertheless, it’s important to note that a verifier cannot tell the difference between an unencrypted length of 14 and one of 16, as both produce Base64 strings of the same length. However, a verifier would spot a leak if the unencrypted length being transmitted were, e.g., 80, as it would produce a much longer encrypted string.
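The length reasoning above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the 64-byte encrypted blocks are concatenated before a single Base64 encoding. The 22-byte and 64-byte figures come from the text. The real verifier may compute its estimate differently.

```python
import base64
import math

PLAINTEXT_BLOCK = 22   # bytes of input covered by one encrypted block
CIPHERTEXT_BLOCK = 64  # bytes of output per encrypted block


def encrypted_base64_length(plain_len: int) -> int:
    """Predict the Base64 length produced by `plain_len` unencrypted bytes."""
    blocks = math.ceil(plain_len / PLAINTEXT_BLOCK)
    cipher_len = blocks * CIPHERTEXT_BLOCK
    # Base64 emits 4 characters for every 3 bytes, padded to a multiple of 4.
    return 4 * math.ceil(cipher_len / 3)


def plaintext_length_range(b64_value: str):
    """Estimate the (minimum, maximum) unencrypted length behind a Base64 value."""
    cipher_len = len(base64.b64decode(b64_value))
    blocks = cipher_len // CIPHERTEXT_BLOCK
    return (blocks - 1) * PLAINTEXT_BLOCK + 1, blocks * PLAINTEXT_BLOCK


print(encrypted_base64_length(14))  # 88
print(encrypted_base64_length(16))  # 88, indistinguishable from 14
print(encrypted_base64_length(80))  # 344, clearly longer

# Encrypted value of "Madrid" quoted in the text above.
madrid = "cFwfPaP3E/4tcryywWYEDN7go+pi1uTpA7jy7clI17KKO/nO0YuZ5vS3i7Ea9n/y3LOF4cajYQOAQt/lBwDMsA=="
print(plaintext_length_range(madrid))  # (1, 22)
```

Under these assumptions a verifier comparing against a 30-byte cap would only flag values whose minimum estimate exceeds the cap, which is why a length of 80 stands out while 14 and 16 look the same.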