Helios v4

Contrary to previous ideas, Helios v4 will *not* support mixnets, but its data structures will be ready to eventually support them.

NOT FINAL. THIS DOCUMENT IS IN PROGRESS AS OF August 29th 2012.

Introduction

Helios is a truly verifiable voting system, which means that:

Alice can verify that her vote was correctly captured,

all captured votes are displayed (in encrypted form) for all to see.

anyone can verify that the captured votes were correctly tallied.

This document specifies all data formats and the exact verification
protocols and algorithms. Using this document, it should be possible
for an able programmer to build a complete verification program in any
modern programming language. For the sake of concreteness, instead of
pseudo-code, we use Python (2.3 or above.)

This document covers Helios 4.0, due out for release in Fall 2012. The biggest changes from v3.0 are:

datatypes are more efficiently represented

proofs are more robust

Audit Data

All data for an election is easily accessible using simple HTTP GET
requests. The HTTP interface for accessing all data from a given
election is built so as to enable static storage of this data in a
simple filesystem made available over the web, to simplify long-term
robustness. Consider an election with election id <ELECTION_ID>. Assuming a host and prefix that we denote {HELIOS_HOST}, the election data structure, including the election public key,
questions, etc., can be obtained by requesting the following URL:

{HELIOS_HOST}/elections/<ELECTION_ID>

The list of voters, denoted <VOTER_LIST>, is available at:

{HELIOS_HOST}/elections/<ELECTION_ID>/voters/

Given this list, it is possible to extract individual voter identifiers, denoted <VOTER_ID>

The list of cast ballots is available at, with each ballot including the <VOTER_ID> that it corresponds to:

{HELIOS_HOST}/elections/<ELECTION_ID>/ballots/

This call will return ballots in chronological (oldest to newest) order, and takes optional parameters limit and after.

The list of all ballots cast by a voter is:

{HELIOS_HOST}/elections/<ELECTION_ID>/ballots/<VOTER_ID>/all

For convenience, the last cast ballot is:

{HELIOS_HOST}/elections/<ELECTION_ID>/ballots/<VOTER_ID>/last

The result of an election is available at:

{HELIOS_HOST}/elections/<ELECTION_ID>/result

Every
election has trustees (sometimes just one), and the list of trustees,
including each trustee's public key and PoK, decryption factor and proof
is at:

{HELIOS_HOST}/elections/<ELECTION_ID>/trustees/

NOT YET IMPLEMENTED:

While the trustee's robustness information (e.g. Lagrange coeff) is at:

We begin with a description of the data types and their representations. All data made available by Helios is in JavaScript Object Notation (JSON) format, with keys in alphabetical order and no extraneous whitespace
other than that specified by JSON. These two conditions are
particularly important, as hashing data structures will only yield the
same hash if the conditions are respected. An example of a JSON data
structure with these two conditions is:

Basic Cryptographic Datatypes

All large integers are represented as base64 strings. An El-Gamal public-key is then a dictionary
including the prime p, the primer-order q of its intended subgroup, the generator g, and the public-key value y (with keys in alphabetical order):

In Helios, all hash values are base-64 encoded, and the hashing algorithm is always SHA256:

Hash value example

{"election_hash": "c0D1TVR7vcIvQxuwfLXJHa5EtTHZGHpDKdulKdE1oxw"}

Voter

A single voter in Helios is represented using a few fields that identify the voter. This data structure has changed from prior versions of Helios in order to accommodate multiple types of users, not just users identified by email address.

Other fields may be present in a <VOTER> data structure, e.g. category. These do not affect the cryptographic processing, but if present, they become part of the hash of the voter list.

Protecting Voter Privacy

In
order to protect voter privacy, Helios can obfuscate the voter_id,
especially when that voter_id is an email address. This protection is
not meant to resist a powerful attacker with other knowledge about the
voter, but mostly to prevent activities such as email-address crawlers
for the purpose of spamming. In this case, a voter can be represented
with the field voter_id_hash replacing voter_id. The hash is SHA256 by
default, or specified as a prefix when it is a different hash:

Voter Aliases

In some elections,
it may be preferable to never reveal the identity of the voters. This
is particularly applicable when organizers are worried about votes being
decryptable in 30+ years, when cryptographic advances make today's
algorithms weaker. An election may thus publish only an ALIASED_VOTER:

short_name, name, and description
describe the election. The short name must be a few characters without a
space (almost like a database key), the name can be a long string, and
the description is an even longer description of the election.

cast_url indicates the URL where ballots for this election should be cast.

frozen_at indicates the timestamp at which this election was frozen. It remains null until the election is frozen.

openreg indicates whether voters can be added to the list after the election has started.

use_voter_aliases indicates whether this election aliases its voters.

uuid is a unique identifier for the election, and name is the election's name.

<ELGAMAL_PUBLIC_KEY> is, as detailed earlier, the JSON data structure that represents an El-Gamal public key.

<QUESTION_LIST> is a data structure that represents the list of questions and available answers to those questions.

which, in this case, contains two possible answers (alice and bob),
URLs that describe these answers in greater detail, the text of the
question, and a short name for the question. The parameter max indicates the maximum number of options that a voter can select, most often 1. The parameter min indicates the minimum number of options that a voter must select, most often 0. Note how, given that max and min should be small integers, they are in fact serialized as integers, not as strings. choice_type
indicates the kind of question, for now just approval (possibly with a
maximum number of choices). If max is null, then it's approval voting
for as many candidates as desired. tally_type indicates how the question is tallied, e.g. homomorphic or mixnet.

voters_hash is the hash of the list of voters for the election. The list of voters is a JSON array of <VOTER> data structures. For example, a list of voters might be:

Open Registration

Helios supports "open registration elections", when the election
administrator so desires. In those elections, the voter list is not set
ahead of time. In that case, an election data structure contains a null voters_hash, and sets openreg to true.

Election Fingerprint

Once an election is ready to be used for voting, the administrator freezes
the election, at which point Helios prevents changing any of the
question parameters and voter registration settings: an open election
remains an open election, and a closed election remains closed with a
fixed voter list. The frozen_at field then indicates the timestamp at which the election was frozen.

Such a frozen election can be designated by its Helios Election Fingerprint,
which is the hash of the JSON election data
structure (with properly alphabetized field names, as always).
Note how this fingerprint depends on the list of voters if the election
registration status is closed, but not if it is open. In any case, this
fingerprint does not depend on any cast vote or cast-vote hash.

Vote

A vote contains a list of encrypted answers, and a reference to the
election, both by ID (for convenience) and by hash (for integrity.) The
hash is the election fingerprint just described.

Each "encrypted answer" corresponds to one election question: it
contains a list of ciphertexts (one for each possible choice for that
question), a list of corresponding proofs that the ciphertext is
correctly formed, and an overall proof that all of the ciphertexts for
that election question, taken together, are correctly formed.

The value of max in overall_proof matches the value of max in the election's question definition.

For approval voting questions, the overall_proof is absent.

When a voter generates a ballot, Helios provides the ballot
fingerprint, which is the base64-encoding of the SHA256 hash of the <VOTE> data structure defined above.

Proofs

A zero-knowledge proof, denoted <ZK_PROOF_0..max>, is a transcript of a non-interactive proof that the corresponding ciphertext encodes an integer value between 0 and max. For the overall proof, the ciphertext whose value is being proven to be between 0 and max is the homomorphic sum (element-wise product) of the choices ciphertexts.

In Helios, all 0..max proofs are disjunctive proofs (CDS & CP), meaning that the transcript includes max+1 proofs, one for each possible value of the plaintext, 0 through max. The max+1
individual challenges must sum up to the single actual protocol
challenge, which ensures that one of the proofs is real (while the
others are simulated.)

<ZK_PROOF_0..max>

[<ZK_PROOF(0)>, <ZK_PROOF(1)>, ..., <ZK_PROOF(max)>]

A single ZK proof is then composed of three messages: the
commitment, the challenge, and the response. Since the proof is a
Chaum-Pedersen proof of a DDH tuple, the commitment is composed of two
values, A and B. Thus, a ZK proof is:

In Helios v4, the commitment is optional, since these types of proofs can be checked with just the challenge and response, which cuts down the size of a proof significantly. This is doable because the commitment values A and B should be recoverable as:

A = g^response / alpha^challengeB = y^response / (beta/g^m)^challenge

at which point those values can be used in the proof verification. Effectively, we do more computation in exchange for a much smaller proof, since A and B are in the full group, while challenge and response are in the subgroup.

Proof Robustness

In prior versions of Helios, the proofs were generated using Fiat-Shamir with very little context, which makes an individual proof easily exchangeable with another. This is bad for security proofs (and for actual security, not coincidentally). So, to generate a challenge in a Fiat-Shamir'ized proof, we now include a lot more context.

First, to generate a challenge in general, we now create a JSON data structure, with the same strictness as our other JSON data structures (alphabetized keys, no extra spaces), that contains all the fields we want for context.

In the Proof of Knowledge of a Secret Key, we include: election_uuid, trustee_email.

Ballot Audit Trail

When a voter chooses to audit their ballot, each encrypted answer
contains additional information concerning the actual selected choice
and the randomness used to encrypt each choice's ciphertext.
Specifically, the JSON structure for <VOTE_WITH_PLAINTEXTS> is as follows.

Result

The result of an election is represented using the <RESULT>
data structure. The proofs of the decryption are done at the Trustee
level. The result simply displays the count of votes for each candidate
within each question, in an array of arrays format.

Trustee

Even if there is only one keypair in the case of a simple election,
Helios v3 (in a departure from previous versions), represents every
election as having trustees. If there is only one trustee, that's fine,
but the data structure remains the same:

A Note on the Source Code in this Specification

In the rest of this document, we show how to verify various aspects
of a Helios election using Python code for concreteness and legibility.
We assume that certain data structures have been defined: election, vote, proof, disjunctive_proof,
and a few others, all of which correspond to collections of fields that
directly map to the JSON data structures described above. However, we
note that a verification program need not necessarily parse these JSON
strings into custom Python objects. It is perfectly acceptable to
extract individual fields when necessary.

In particular, in a number of cases, our sample code will call election.toJSON(), or vote.toJSON()
in order to re-convert the data structure to JSON so that it can be
hashed and checked for integrity. A verification program that handles
JSON strings directly without de-serializing them to Python objects
would obviously not need to re-serialize to JSON, either. The original
JSON provided by the Helios server hashes appropriately to the intended
values.

Verifying a Single Ballot

Recall the Chaum-Pedersen proof that a ciphertext (alpha,beta) under public key (y, (g,p,q)) is proven to encode the value m by proving knowledge of r, the randomness used to create the ciphertext, specifically that g, y, alpha, beta/g^m is a DDH tuple, noting that alpha = g^r and beta/g^m = y^r.

In a disjunctive proof that the ciphertext is the encryption of one value between 0 and max, all max+1
proof transcripts are checked, and the sum of the challenges is checked
against the expected challenge value. Since we use this proof in
non-interactive Fiat-Shamir form, we generate the expected challenge
value as SHA1(A0 + "," + B0 + "," + A1 + "," + B1 + ... + "Amax" + "," + Bmax) with A0, B0, A1, B1, ... ,Amax, Bmax in decimal form. (Ai and Bi are the components of the commitment for the i'th proof.)

Thus, to verify a <ZK_PROOF_0..max> on a <ELGAMAL_CIPHERTEXT>, the following steps are taken.

Thus, given <ELECTION> and a <VOTE>, the verification steps are as follows:

verify_vote(election, vote):

# check hash (remove the last character which is a useless '=')computed_hash = base64.b64encode(hash.new(election.toJSON()).digest())[:-1]if computed_hash != vote.election_hash: return False

# go through each encrypted answer by index, because we need the index# into the question array, too for figuring out election informationfor question_num in range(len(vote.answers)): encrypted_answer = vote.answers[question_num] question = election.questions[question_num]

Auditing/Spoiling a Single Ballot

Given a <VOTE_WITH_PLAINTEXTS> and a claimed
vote fingerprint, verification entails checking the fingerprint,
checking all of the proofs to make sure the ballot is well-formed, and
finally ensuring that the ballot actually encodes the claimed choices.

check that each cast ballot is correctly formed by verifying the proofs.

homomorphically compute the encrypted tallies

verify each trustee's partial decryption

combine
the partial decryptions and verify that those decryptions, the
homomorphic encrypted tallies, and the claimed plaintext results are
consistent.

In other words, the complete results of a verified election
includes: the election fingerprint, the list of ballot fingerprints, the
trustee decryption factors and proofs, and the final plaintext counts.
Any party who verifies the election should
re-publish all of these items, as they are meaningless without one
another. This is effectively a "re-tally".

Part of this re-tally requires checking a partial decryption proof, which
is almost the same, but not quite the same, as checking an encryption
proof with given randomness.

Given a ciphertext denoted (alpha,beta), and a trustee's private key x corresponding to his public key y, a partial decryption is:

dec_factor = alpha^x mod p.

The
trustee then provides a proof that (g, y, alpha, dec_factor) is a
proper DDH tuple, which yields a Chaum Pedersen proof of discrete log
equality. Verification proceeds as follows:

# keep track of running tallies, initialize at 0# again, assuming operator overloading for homomorphic additiontallies = [[0 for a in question.answers] for question in election.questions]

# go through each voter, check itfor voter in voters: if not verify_vote(election, voter.vote): return False # compute fingerprint vote_fingerprints.append(b64_sha(voter.vote.toJSON())) # update tallies, looping through questions and answers within them for question_num in range(len(election.questions)): for choice_num in range(len(election.questions[question_num].answers)): tallies[question_num][choice_num] = voter.vote.answers[question_num].choices[choice_num] + tallies[question_num][choice_num]

# now we have tallied everything in ciphertexts, we must verify proofsfor question_num in range(len(election.questions)): for choice_num in range(len(election.questions[question_num].answers)):

decryption_factor_combination = 1

for trustee_num in range(len(election.trustees)):

trustee = election.trustees[trustee_num]

# verify the tally for that choice within that question# check that it decrypts to the claimed result with the claimed proofif not verify_partial_decryption_proof(tallies[question_num][choice_num], trustee.decryption_factors[question_num][choice_num], trustee.decryption_proof[question_num][choice_num], trustee.public_key): return False