Skip to content

Latest commit

 

History

History
2123 lines (1422 loc) · 124 KB

README.md

File metadata and controls

2123 lines (1422 loc) · 124 KB

ZK Bug Tracker

A community-maintained collection of bugs, vulnerabilities, and exploits in apps using ZK crypto.

There are two sections - Bugs in the Wild and Common Vulnerabilities. The Bugs in the Wild section is a list of actual bugs found in zk related codebases. The Common Vulnerabilities section outlines the common categories of zk related bugs that have been found. These lists can be used as a reference for developers, auditors, and security tool makers.

Contributing

If you would like to add a "bug in the wild" or a "common vulnerability", there are two ways to do so:

  1. Create a PR, filling in all of the necessary details yourself
  2. Create an issue with a link or description of the bug or common vulnerability. The repo maintainers will then fill out the relevant details in a PR.

Table of Contents

  1. Dark Forest v0.3: Missing Bit Length Check
  2. BigInt: Missing Bit Length Check
  3. Circom-Pairing: Missing Output Check Constraint
  4. Semaphore: Missing Smart Contract Range Check
  5. Zk-Kit: Missing Smart Contract Range Check
  6. Aztec 2.0: Missing Bit Length Check / Nondeterministic Nullifier
  7. Aztec Plonk Verifier: 0 Bug
  8. 0xPARC StealthDrop: Nondeterministic Nullifier
  9. a16z ZkDrops: Missing Nullifier Range Check
  10. MACI 1.0: Under-constrained Circuit
  11. Bulletproofs Paper: Frozen Heart
  12. PlonK: Frozen Heart
  13. Zcash: Trusted Setup Leak
  14. MiMC Hash: Assigned but not Constrained
  15. PSE & Scroll zkEVM: Missing Overflow Constraint
  16. PSE & Scroll zkEVM: Missing Constraint
  17. Dusk Network: Missing Blinding Factors
  18. EY Nightfall: Missing Nullifier Range Check
  19. Summa: Unconstrained Constants Assignemnt
  20. Polygon zkEVM: Missing Remainder Constraint
  21. Polygon zkEVM: Missing constraint in PIL leading to proving fake inclusion in the SMT
  22. Polygon zkEVM: Incorrect CTX assignation leading to addition of random amount of ether to the sequencer balance
  23. Polygon zkEVM: Missing constraint in PIL leading to execution flow hijak
  24. Zendoo: Missing Polynomial Normalization after Arithmetic Operations
  25. Aleo: Non-Committing Encryption Used in InputID::Private
  26. Light Protocol: Modification of shared state is not an atomic operation
  27. ZK Email: under-constrained circuit leads to email address spoofing
  1. Under-constrained Circuits
  2. Nondeterministic Circuits
  3. Arithmetic Over/Under Flows
  4. Mismatching Bit Lengths
  5. Unused Public Inputs Optimized Out
  6. Frozen Heart: Forging of Zero Knowledge Proofs
  7. Trusted Setup Leak
  8. Assigned but not Constrained

Summary

Related Vulnerabilities: 1. Under-constrained Circuits, 2. Nondeterministic Circuits, 3. Arithmetic Over/Under Flows, 4. Mismatching Bit Lengths

Identified By: Daira Hopwood

A RangeProof circuit was used to prevent overflows. However, it used the CircomLib LessThan circuit to ensure this, which expects a maximum number of bits for each input. The inputs did not have any constraints on the number of bits, so an attacker could use large numbers to achieve a successful RangeProof even though the number was out of range.

Background

Dark Forest, a fully decentralized and real time strategy game, had a missing bit length check from its early circuits. In order to prevent underflows their circuit included a RangeProof template to ensure that an input is within bounds to prevent an overflow.

// From darkforest-v0.3/circuits/range_proof/circuit.circom
template RangeProof(bits, max_abs_value) {
    signal input in;

    component lowerBound = LessThan(bits);
    component upperBound = LessThan(bits);

    lowerBound.in[0] <== max_abs_value + in;
    lowerBound.in[1] <== 0;
    lowerBound.out === 0

    upperBound.in[0] <== 2 * max_abs_value;
    upperBound.in[1] <== max_abs_value + in;
    upperBound.out === 0
}

The LessThan template compares two inputs, and outputs 1 if the first input is less than the second input. In the RangeProof circuit, the LessThan circuit is essentially used as a GreaterEqThan, requiring that the output is 0. The LessThan template takes in the max number of bits for both inputs as a parameter, but does not actually check this constraint.

// From circomlib/circuits/comparators.circom
template LessThan(n) {
    assert(n <= 252);
    signal input in[2];
    signal output out;

    component n2b = Num2Bits(n+1);

    n2b.in <== in[0]+ (1<<n) - in[1];

    out <== 1-n2b.out[n];
}

The Vulnerability

Therefore, in the RangeProof example the LessThan circuit is used with an expected maximum number of bits, but the inputs max_abs_value and in are never constrained to that number. An attacker could input max_abs_value and in values that contain more bits than the expected max. Since LessThan is expecting the two inputs to have a maximum number of bits, it may output an incorrect result. An attacker would be able to satisfy the RangeProof even though the input is out of range.

The Fix

In order to prevent this attack, a check was needed on the number of bits of max_abs_value and in. They must be constrained to *bits *****(the template input parameter) number of bits. The following is the implemented fix in production:

// From darkforest-eth/circuits/range_proof/circuit.circom

// NB: RangeProof is inclusive.
// input: field element, whose abs is claimed to be <= than max_abs_value
// output: none
// also checks that both max and abs(in) are expressible in `bits` bits
template RangeProof(bits) {
    signal input in;
    signal input max_abs_value;

    /* check that both max and abs(in) are expressible in `bits` bits  */
    component n2b1 = Num2Bits(bits+1);
    n2b1.in <== in + (1 << bits);
    component n2b2 = Num2Bits(bits);
    n2b2.in <== max_abs_value;

    /* check that in + max is between 0 and 2*max */
    component lowerBound = LessThan(bits+1);
    component upperBound = LessThan(bits+1);

    lowerBound.in[0] <== max_abs_value + in;
    lowerBound.in[1] <== 0;
    lowerBound.out === 0

    upperBound.in[0] <== 2 * max_abs_value;
    upperBound.in[1] <== max_abs_value + in;
    upperBound.out === 0
}

References:

  1. ZKPs for Engineers: A look at the Dark Forest ZKPs (See the “Bonus 1: Range Proofs” section)
  2. Commit of the Fix

Summary

Related Vulnerabilities: 1. Under-constrained Circuits, 2. Nondeterministic Circuits, 3. Arithmetic Over/Under Flows, 4. Mismatching Bit Lengths

Identified By: Andrew He and Veridise Team independently

The BigMod circuit, used for the modulo operation on big integers, was missing a bit length check on the output remainder. This constraint needs to be added to prevent an attacker from using an unexpectedly large remainder value. This can break a protocol in various ways, depending on how they use this circuit.

Background

The BigInt circuits are used to perform arithmetic on integers larger than the SNARK scalar field order. BigMod is the circuit responsible for performing the modulo operation on these large integers. BigMod takes inputs a and b, and outputs the quotient and remainder of a % b. The circuit uses a helper function, long_division, to calculate the quotient and remainder. However, functions can’t add constraints, so the BigMod circuit must add constraints to ensure that the quotient and remainder are correct.

Additionally, the BigInt circuits store big integers in two formats: proper representation and signed overflow representation. Proper representation does not allow for integers to be negative whereas the signed overflow representation does. Due to the representation style, the negative signed overflow numbers may have more bits than the proper representation style.

The Vulnerability

Two important constraints are ensuring that both the quotient and the remainder are the proper number of bits. There was a bit length check on the quotient, however there was no check for the remainder:

// From circom-ecdsa/circuits/bigint.circom before the fix

// Long division helper function. Outputs the quotient and remainder
var longdiv[2][100] = long_div(n, k, k, a, b);
for (var i = 0; i < k; i++) {
    div[i] <-- longdiv[0][i]; // Quotient
    mod[i] <-- longdiv[1][i]; // Remainder
}
div[k] <-- longdiv[0][k];

// Range check for the quotient
component range_checks[k + 1];
for (var i = 0; i <= k; i++) {
    range_checks[i] = Num2Bits(n);
    range_checks[i].in <== div[i];
}

Without a bit length constraint on the remainder, the output of this circuit was not guaranteed to be in proper representation. Only the quotient, div[], was constrained to n bits per register in the array. The remainder array, mod[], was not constrained to n bits. Therefore any consumers of this circuit are not guaranteed to have the remainder be accurate and as expected.

The Fix

In order to ensure that the remainder doesn’t contain too many bits and proceed to cause unexpected behavior from there, a bit length constraint must be added. The circuit was changed to incorporate this fix:

// From circom-ecdsa/circuits/bigint.circom after the fix

var longdiv[2][100] = long_div(n, k, k, a, b);
for (var i = 0; i < k; i++) {
    div[i] <-- longdiv[0][i];
    mod[i] <-- longdiv[1][i];
}
div[k] <-- longdiv[0][k];

component div_range_checks[k + 1];
for (var i = 0; i <= k; i++) {
    div_range_checks[i] = Num2Bits(n);
    div_range_checks[i].in <== div[i];
}

// The new bit length check on the remainder
component mod_range_checks[k];
for (var i = 0; i < k; i++) {
    mod_range_checks[i] = Num2Bits(n);
    mod_range_checks[i].in <== mod[i];
}

References

  1. Commit of the Fix
  2. More info on bigint representation

Summary

Related Vulnerabilities: 1. Under-constrained Circuits, 2. Nondeterministic Circuits, 4. Mismatching Bit Lengths

Identified By: Veridise Team

The Circom-Pairing circuits, written in circom, are used for the Succinct Labs' bridge that is based on cryptographic protocols. However, the circuits were missing a constraint to ensure proper range checks.

Background

The Circom-Pairing circuit needs to use integers larger than the prime field (254 bits), so it uses the circom big-int library. Therefore, numbers are represented as k-length arrays of n-bit numbers to represent a much larger number. Even though Circom-Pairing uses very large numbers, there is still a max range of expected numbers to be used. To ensure that numbers are constrained to the expected max range, the following circuit is often used:

template BigLessThan(n, k){
  signal input a[k];
  signal input b[k];
  signal output out;
  ...
}

The output of this circuit will be 1 if a < b, and 0 otherwise.

The Vulnerability

The vulnerability arose in the CoreVerifyPubkeyG1 circuit:

template CoreVerifyPubkeyG1(n, k){
  ...
  var q[50] = get_BLS12_381_prime(n, k);

  component lt[10];
  // check all len k input arrays are correctly formatted bigints < q (BigLessThan calls Num2Bits)
  for(var i=0; i<10; i++){
    lt[i] = BigLessThan(n, k);
    for(var idx=0; idx<k; idx++)
      lt[i].b[idx] <== q[idx];
  }
  for(var idx=0; idx<k; idx++){
    lt[0].a[idx] <== pubkey[0][idx];
    lt[1].a[idx] <== pubkey[1][idx];
    ... // Initializing parameters for rest of the inputs
}

The BigLessThan circuit is used to constrain pubkey < q to ensure that the pubkey values are correctly formatted bigints. However, the rest of the circuit never actually checks the output of these BigLessThan circuits. So, even if a proof has pubkey >= q and BigLessThan outputs 0, the proof will successfully be verified. This could cause unexpected behavior as the cryptographic protocol depends on these numbers being within the expected range.

The Fix

The fix required a constraint on all of the outputs of the BigLessThan circuits to ensure that each one had an output of 1. The following snippet was added to fix this:

var r = 0;
for(var i=0; i<10; i++){
    r += lt[i].out;
}
r === 10;

Once this was added, each BigLessThan circuit was then constrained to equal 1. Now, the pubkey inputs can be trusted to be in the expected range.

References

  1. Veridise Explainer Article
  2. Commit of the Fix

Summary

Related Vulnerabilities: 3. Arithmetic Over/Under Flows

Identified By: PSE Security Team

The Semaphore smart contracts performed range checks in some places but not others. The range checks were to ensure that all public inputs were less than the snark scalar field order. However, these checks weren’t enforced in all the necessary places. This could cause new Semaphore group owners to unknowingly create a group that will always fail verification.

Background

Semaphore is a dapp built on Ethereum that allows users to prove their membership of a group and send signals such as votes or endorsements without revealing their original identity. Essentially, trusted coordinators create a group (with a group Id) and add members via the smart contracts. Members can then submit zero knowledge proofs to the coordinator that prove their membership of the group and optionally contain a signal with it.

The Vulnerability

Since the Solidity uint256 type can hold numbers larger than the snark scalar field order, it is important to be wary of overflows. In order to prevent unwanted overflows, the Semaphore verifier smart contract automatically fails if a public input is greater than the snark scalar field order:

// From Semaphore/contracts/base/Verifier.sol (outdated)
require(input[i] < snark_scalar_field, "verifier-gte-snark-scalar-field");

When a coordinator creates a new group, they can input any valid uint256 value as the id. This is a problem since the id is a public input for the zk proof. If the id is greater than the snark scalar field order, the verifier will always revert and the group isn’t operable.

The Fix

To prevent an inoperable group from being created, a check on the group Id is needed. This check needs to ensure that the new group id will be less than the snark scalar field order:

// From Semaphore/contracts/base/SemaphoreGroups.sol
function _createGroup(
    uint256 groupId,
    uint8 depth,
    uint256 zeroValue
  ) internal virtual {
		// The Fix is the following required statement:
    require(groupId < SNARK_SCALAR_FIELD, "SemaphoreGroups: group id must be < SNARK_SCALAR_FIELD");
    require(getDepth(groupId) == 0, "SemaphoreGroups: group already exists");

    groups[groupId].init(depth, zeroValue);
    emit GroupCreated(groupId, depth, zeroValue);
  }

References

  1. Reported Github Issue
  2. Commit of the Fix

Summary

Related Vulnerabilities: 3. Arithmetic Over/Under Flows

Identified By: PSE Security Team

The Zk-Kit smart contracts implement an incremental merkle tree. The intent is for this merkle tree to be involved with zk proofs, so all values must be less than the snark scalar field order in order to prevent overflows.

Background

Semaphore is the first consumer of the Zk-Kit merkle tree implementation. When members sign up via the Semaphore smart contracts, they use an identityCommitment that is stored in the on-chain Zk-Kit merkle tree. The zero knowledge proof will then prove that they have a valid commitment in the tree.

The Vulnerability

When initializing the merkle tree, you must specify a value for the zero leaf:

// From zk-kit/incremental-binary-tree.sol/contracts/IncrementalBinaryTree.sol
// before the fix
function init(
    IncrementalTreeData storage self,
    uint8 depth,
    uint256 zero
  ) public {
    require(depth > 0 && depth <= MAX_DEPTH, "IncrementalBinaryTree: tree depth must be between 1 and 32");

    self.depth = depth;

    for (uint8 i = 0; i < depth; i++) {
      self.zeroes[i] = zero;
      zero = PoseidonT3.poseidon([zero, zero]);
    }

    self.root = zero;
  }

Since the Solidity uint256 allows for numbers greater than the snark scalar field order, a user could unknowingly initialize a merkle tree with the zero leaf value greater than the snark scalar field order. This will also directly cause overflows if the zero leaf is part of a merkle tree inclusion proof needed for a zk proof.

The Fix

During initialization, it must be enforced that the given zero leaf value is less than the snark scalar field order. To enforce this, the following require statement was added to the init function:

// From zk-kit/incremental-binary-tree.sol/contracts/IncrementalBinaryTree.sol
// after the fix
require(zero < SNARK_SCALAR_FIELD, "IncrementalBinaryTree: leaf must be < SNARK_SCALAR_FIELD");

References

  1. Reported Github Issue
  2. Commit of the Fix

Summary

Related Vulnerabilities: 1. Under-constrained Circuits, 2. Nondeterministic Circuits, 4. Mismatching Bit Lengths

Identified By: Aztec Team

Funds in the Aztec protocol are held in what are called “note commitments”. Once a note commitment is spent, it should not be possible to spend it again. However, due to a missing bit length check, an attacker could spend a single note commitment multiple times.

Background

Whenever a new note commitment is created, it is stored in a merkle tree on-chain. In order to prevent double spending of a single note commitment, a nullifier is posted on-chain after the note is spent. If the nullifier was already present on-chain, then the note cannot be spent.

The Vulnerability

The nullifier generation process should be deterministic so that the same nullifier is generated for the same note commitment every time. However, due to a missing bit length check, the process was not deterministic. The nullifier was generated based on the note commitment index in the merkle tree. The code assumed the index to be a 32 bit number, but there was no constraint enforcing this check.

An attacker could use a number larger than 32 bits for the note index, as long as the first 32 bits matched the correct index. Since they can generate many unique numbers that have the same first 32 bits, a different nullifier will be created for each number. This allows them to spend the same note commitment multiple times.

The Fix

A bit length check was needed on the given note commitment index to enforce that it was at max 32 bits.

References

  1. Aztec Bug Disclosure

Summary

Related Vulnerabilities: Missing Curve Point Checks

Identified By: Nguyen Thoi Minh Quan

The Aztec Plonk verifier, written in C++, accepts proofs containing multiple elements as per the Plonk protocol. However, by manually setting two of the elements to 0, the verifier will automatically accept that proof regardless of the other elements. This allows an attacker to successfully forge a proof.

Background

The full description of this bug is quite math heavy and dives deep into the Plonk protocol. The finder of this bug, Nguyen Thoi Minh Quan, has a great detailed description of the bug here.

Elliptic curves have what is known as a point at infinity. Let O = point at infinity and P be any point on the curve. Then O + P = P. When implementing a cryptographic protocol in code, there are different ways to express the point at infinity. For example, sometimes the number 0 is considered the point at infinity, but other times 0 is considered as the point (0, 0), which is not the point at infinity. This will be important later.

Plonk proofs require a group of elements and curve points, and then will check whether these elements and points satisfy certain equations. One of the main equations to check is an elliptic curve pairing. The curve points that are of importance for this bug are [Wz]1 and [Wzw]1.

The Vulnerability

When [Wz]1 and [Wzw]1 are checked in the verifier code, a value of 0 is recognized as not on the elliptic curve, but the code does not fail immediately. The verifier continues on and later recognizes the 0 value as the point at infinity. This causes the pairing equation to be satisfied, and therefore the proof is successfully verified.

The Fix

The verifier was missing checks at a few different spots in the code. Just one of these checks would stop the 0 bug from working. These checks are explained in more detail in the finder's description. A simple to understand fix would be to agree on a consistent representation of the point at infinity. If 0 was consistently decided as not the point at infinity, then this bug would not work.

This bug is a good example of how implementing a secure cryptographic protocol can become insecure very easily. If one follows the Plonk paper exactly, this bug is not possible. This is a good reminder to test a protocol with inputs that theoretically would never work, as this finder did.

References

  1. Nguyen Thoi Minh Quan's Description

Summary

Related Vulnerabilities: 1. Under-constrained Circuits, 2. Nondeterministic Circuits

Identified By: botdad

StealthDrop requires users post a nullifier on-chain when they claim an airdrop. If they try to claim the airdrop twice, the second claim will fail. However, ECDSA signatures were used as the nullifier and these signatures are nondeterministic. Therefore different signatures were valid nullifiers for a single claim and users could claim an airdrop multiple times by sending the different valid signatures.

Background

In order to claim an airdrop, users must post a nullifier on-chain. If the nullifier is already present on-chain, the airdrop will fail. The nullifier is supposed to be computed in a deterministic way such that given the same input parameters (the user’s claim in this case), the output nullifier will always be the same. The initial nullifier generation process required users to sign a particular message with their ECDSA private key. The resultant signature is the nullifier that they need to post on-chain when claiming an airdrop.

The Vulnerability

ECDSA signature validation is nondeterministic - a user can use a private key to sign a message in multiple ways that will all pass signature verification. When users create the SNARK to claim an airdrop, they use the nullifier as a public input. The SNARK circuit enforces that the nullifier is a valid signature. Since the users can create multiple valid signatures with the same key and message, they can create multiple proofs with different nullifiers. This allows them to submit these separate proofs and claim an airdrop multiple times.

The Fix

Instead of only constraining signature validation in the SNARK circuit, constraints must also be added on the signature creation process so that the signatures are deterministic. This was originally left out because in order to constrain the signature creation process, the private key is needed as a private input. The StealthDrop team wanted to avoid involving the private key directly. However, due to the vulnerability described, the private key is needed in the circuit to make the signature creation process deterministic.

References

  1. 0xPARC Twitter Thread Explanation

Summary

Related Vulnerabilities: 1. Under-constrained Circuits, 2. Nondeterministic Circuits, 3. Arithmetic Over/Under Flows

Identified By: Kobi Gurkan

ZkDrops is very similar to the 0xPARC StealthDrop (related bug just above). ZkDrops requires that users post a nullifier on-chain when they claim an airdrop. If they try to claim the airdrop twice, the second claim will fail because the nullifier has already been seen by the smart contract. However, since the EVM allows numbers (256 bits) larger than the snark scalar field order, arithmetic overflows allowed users to submit different nullifiers for the same airdrop claim. This made it possible for a user to claim a single airdrop multiple times.

Background

In order to claim an airdrop, users must post a nullifier on-chain. If the nullifier is already present on-chain, the airdrop will fail. The nullifier is supposed to be computed in a deterministic way such that given the same input parameters (the user’s claim in this case), the output nullifier will always be the same. The nullifier is stored on-chain as a 256 bit unsigned integer.

Since the SNARK scalar field is 254 bits, a nullifier that is > 254 bits will be reduced modulo the SNARK field during the proof generation process. For example, let p = SNARK scalar field order. Then any number x in the proof generation process will be reduced to x % p. So p + 1 will be reduced to 1.

The Vulnerability

The smart contract that checked whether a nullifier has been seen before or not, did not verify whether the nullifier was within the SNARK scalar field. So, if a user has a nullifier x >= p, then they could use both x and x % p as separate nullifiers. These both will be evaluated to x % p within the circuit, so both would generate a successful proof. When the user first claims an airdrop with the x nullifier, x hasn't been seen before so it is successful. Then when the user claims the same airdrop with x % p, that value hasn't been seen by the contract before either, so it is successful as well. The user has now claimed the airdrop twice.

The Fix

The fix to this issue is to add a range check in the smart contract. This range check should ensure that all nullifiers are within the SNARK scalar field so that no duplicate nullifiers satisfy the circuit. The following function to claim an airdrop:

/// @notice verifies the proof, collects the airdrop if valid, and prevents this proof from working again.
function collectAirdrop(bytes calldata proof, bytes32 nullifierHash) public {
	require(!nullifierSpent[nullifierHash], "Airdrop already redeemed");

	uint[] memory pubSignals = new uint[](3);
	pubSignals[0] = uint256(root);
	pubSignals[1] = uint256(nullifierHash);
	pubSignals[2] = uint256(uint160(msg.sender));
	require(verifier.verifyProof(proof, pubSignals), "Proof verification failed");
	nullifierSpent[nullifierHash] = true;
	airdropToken.transfer(msg.sender, amountPerRedemption);
}

Was fixed by adding this range check:

require(uint256(nullifierHash) < SNARK_FIELD ,"Nullifier is not within the field");

References

  1. Github PR

Summary

Related Vulnerabilities: 1. Under-constrained Circuits, 2. Nondeterministic Circuits

Identified By: PSE Security Team

MACI is a dapp for collusion resistant voting on-chain. Encrypted votes are sent on-chain and a trusted coordinator decrypts the votes off-chain, creates a SNARK proving the results, and then verifies the proof on-chain. The SNARK circuit logic aims to prevent the coordinator from censoring any valid vote - the proof should fail verification if the coordinator attempts to do so. However, a missing logic constraint in the circuit allows the coordinator to shuffle some votes and render targeted votes as invalid. This effectively allows the coordinator to censor any vote they choose.

Background

In order to be able to vote, a user must sign up to the MACI smart contract with a public key. These public keys are stored on-chain in a merkle tree. Users need to know the private key of these public keys in order to vote. When users cast a vote, they encrypt their vote and post it on-chain. Users can override their previous vote by sending a new one. MACI has certain rules for a vote to be considered valid, and the newest valid vote will take precedence over all other votes. The vote includes a stateIndex which is the position of the associated public key in the merkle tree. If the voter doesn’t know the private key of the public key at stateIndex, the vote will be considered invalid. When the coordinator processes all of the votes, they must only count the newest valid vote for each user. The SNARK circuit logic is designed to constrain the coordinator to doing exactly that. Therefore, if the circuit works as intended, the coordinator is not able to create a proof of voting results that include invalid votes or exclude valid votes. It is censorship resistant.

The circuit ProcessMessages.circom takes as input an array of user public keys -currentStateLeaves[] - along with the votes to process - msgs[]. The vote actions in the msgs array are processed on the public keys in the currentStateLeaves array:

// The state leaves upon which messages are applied.
// Pseudocode
transform(currentStateLeaf[4], msgs[4]) ==> newStateLeaf4
transform(currentStateLeaf[3], msgs[3]) ==> newStateLeaf3
transform(currentStateLeaf[2], msgs[2]) ==> newStateLeaf2
transform(currentStateLeaf[1], msgs[1]) ==> newStateLeaf1
transform(currentStateLeaf[0], msgs[0]) ==> newStateLeaf0

In MACI, a valid vote message can be used to change a user’s public key. From that point on, user’s can only use the new public key to update/create votes. The transform function will output the updated state leaf with the new public key if it was changed.

The Vulnerability

The currentStateLeaf array is ordered by the coordinator. Therefore, the coordinator effectively has control over which public key a new vote message is applied to. Therefore, they can choose to apply a vote message from one user, to another user’s public key. This will cause that vote message to be considered invalid since it doesn’t match the public key.

There is a constraint ensuring that a message’s stateIndex matches the public key it was applied on, but this constraint only marks a vote message as invalid if so. Before the stateIndex is checked, the circuit checks other cases where the message may be invalid, such as if the public key matches this new vote message. If it gets marked as invalid, the stateIndex check won’t make any changes. This circuit is under-constrained.

If a malicious coordinator wants to censor msgs[3] (the 4th voting message), then they will set currentStateLeaf[3] to the 0 leaf. Then, msgs[3] will be marked as invalid since the public key doesn’t match. The check that currentStateLeaf[3].index === msgs[3].stateIndex is avoided since the message is already invalid. The coordinator has effectively censored msgs[3].

The Fix

The main issue that needs to be fixed is constraining voting messages to the intended public keys. The circuit is quite complicated so other changes needed to be made, but the most significant change was adding the following check before marking a vote message as invalid:

// Pseudo code
currentStateLeaf[i].index <== msgs[i].stateIndex

This check ensures that the vote message is applied to the intended public key (state leaf) before marking any message as invalid. Therefore, a coordinator can no longer mismatch vote messages to unintended public keys and purposefully mark them as invalid.

References

  1. Issue on Github

Summary

Related Vulnerabilities: 6. Frozen Heart

Identified By: TrailOfBits Team

The bulletproof paper, which outlines the bulletproof zero knowledge proof protocol, outlines how to use the Fiat-Shamir transformation to make the proof non-interactive. However, their recommended implementation of the Fiat-Shamir transformation left out a crucial component. This missing component in the non-interactive version of the protocol allowed malicious provers to forge proofs.

Background

Many zero knowledge proof protocols are first designed in an interactive way where the prover and verifier must communicate with each other for multiple rounds in order for the proof to be created and subsequently verified. This often takes the form of:

  1. The prover creates a random value known as the commitment
  2. The verifier replies with a random value known as the challenge
  3. The prover uses the commitment, challenge, and their secret data to create a proof

For the proof to be secure, the verifier’s challenge must be entirely unpredictable and uncontrollable by the prover.

The Fiat-Shamir transformation allows the zero-knowledge proof protocol to become non-interactive by having the prover compute the challenge instead of the verifier. The prover should have no control in the challenge’s value, so the prover must use a hash of all public values, including the commitments. This way the prover cannot easily manipulate the proof to be accepted for invalid inputs.

Bulletproofs use Pedersen commitments, which are of the form:

commitment = (g^v)(h^gamma)

Here g and h are elliptic curve points and v is a secret number. The bulletproof is meant to prove that v falls within a certain range. Since this commitment is public, it should be included in the Fiat-Shamir transformation used in the protocol.

The Vulnerability

The bulletproof paper provided insecure details on how to implement the Fiat-Shamir transformation for the protocol. Their implementation did not include the Pedersen commitment in the Fiat-Shamir transformation. This means that the challenge value is independent of the Pedersen commitment and so the prover can keep trying random values for the commitment until they get a proof that succeeds for v outside of the desired range. For more details on how exactly this allows a malicious prover to forge a proof, please see the TrailOfBits’ explanation in the references section.

The Fix

In order to prevent this Frozen Heart vulnerability, the Pedersen commitment should be added to the Fiat-Shamir transformation hash. This will make the challenge directly dependent on the commitment and restrict the prover’s freedom when making the proof. The new restriction is enough to prevent the prover from forging proofs.

References

  1. TrailOfBits Explanation

Summary

Related Vulnerabilities: 6. Frozen Heart

Identified By: TrailOfBits Team

Please see 8. Bulletproofs: Frozen Heart for a more in-depth background.

Due to some ambiguity in the PlonK paper regarding Fiat-Shamir transformations, a few production implementations did not handle it correctly. The Fiat-Shamir transformation requires all public inputs to be included in the hash. However, some implementations did not properly include all of these public inputs. This allowed malicious provers to forge proofs to different extents, depending on the rest of the implementation details.

Background

The PlonK protocol is originally described as an interactive proof protocol between the prover and verifier. However, the paper provides some details on how to make it a non-interactive proof protocol. The paper outlines details on implementing the Fiat-Shamir transformation to make the protocol non-interactive, but it still lacks clarity on what inputs exactly are needed to make it secure.

The Vulnerability

Multiple implementations had a frozen heart vulnerability due to their implementation of the Fiat-Shamir transformation. Each implementation had missing public inputs to the hash for the Fiat-Shamir transformation, and thus gave more freedom to the prover when constructing proofs. With this increased freedom, a malicious prover could then forge a proof. The extent of the forgery depends on the rest of the implementation details and so it varied for the different implementations affected by this vulnerability. For more exact details about this vulnerability, please see the TrailOfBits’ explanation in the references section.

The Fix

The fix for these vulnerabilities differs for each implementation affected, but it generally includes a fix to the Fiat-Shamir transformation. The fix involves ensuring that all public inputs are included in the hash so that the prover does not have the freedom needed to forge a proof.

References

  1. TrailOfBits Explanation

Summary

Related Vulnerabilities: 7. Trusted Setup Leak

Identified By: Zcash Team

The Zcash zero-knowledge proofs are based on a modified version of the Pinocchio protocol. This protocol relies on a trusted setup of parameters based on the Zcash circuit. However, some of the parameters generated as part of this trusted setup allow a malicious prover to forge a proof that creates new Zcash tokens.

Background

For zero-knowledge protocols such as Pinocchio and Groth16, a trusted setup is required. The trusted setup is a way to generate the proper parameters needed so that the prover can generate a proof, and the verifier can properly verify it. The trusted setup process usually involves parameters that must be kept secret, otherwise malicious provers can forge proofs. These parameters are known as “toxic waste” and kept private through the use of multi-party computation. Therefore, it’s very important the trusted set-up process generates the correct parameters and keeps the toxic waste hidden.

The Vulnerability

The trusted setup for Zcash’s implementation of Pinocchio generated extra parameters that leaked information about the toxic waste. These extra parameters allowed malicious provers to forge proofs that would create Zcash tokens out of nothing. The use of the extra parameters essentially allowed users to counterfeit tokens. However, this vulnerability was never exploited.

The Fix

Since the toxic parameters were visible on the trusted setup ceremony document, it was impossible to ensure no one had seen them. The issue was fixed by the Zcash Sapling upgrade which involved a new trusted setup. The new trusted setup was ensured to not release any toxic parameters. Please see the Zcash explanation in the references section for more details on the timeline of events.

References

  1. Zcash Explanation
  2. Pinocchio Protocol
  3. Zcash’s Modified Pinocchio Protocol

Summary

Related Vulnerabilities: 1. Under-constrained Circuits, 2. Nondeterministic Circuits, 8. Assigned but not Constrained

Identified By: Kobi Gurkan

The MiMC hash circuit from the circomlib package had a missing constraint during its computation logic. The circuit was under-constrained and nondeterministic because it didn't properly constrain MiMC inputs to the correct hash output.

Background

In circom constraints are created by the following three operators: <==, ===, and ==>. If other operators are used such as <--, =, or -->, then a constraint will not be added to the R1CS file. These other operators assign numbers or expressions to variables, but do not constrain them. Proper constraints are what is needed for the circuit to be sound.

The Vulnerability

During a computation step for this circuit, the = operator was used instead of the <== operator that was needed to create a constraint. Here is the code before the fix:

outs[0] = S[nInputs - 1].xL_out;

The = operator assigned S[nInputs - 1].xL_out to outs[0], but did not actually constrain it. An attacker could then manipulate outs[0] when creating their own proof to manipulate the final output MiMC hash. Essentially, an attacker can change the MiMC hash output for a given set of inputs.

Since this hash function was used in the TornadoCash circuits, this would allow the attacker to fake a merkle root and withdraw someone else's ETH from the contract.

The Fix

The fix was simply to change = to a constraint operator <==.

outs[0] <== S[nInputs - 1].xL_out;

References

  1. TornadoCash Explanation
  2. Actual Github Fix

Summary

Related Vulnerabilities: 1. Under-constrained Circuits, 2. Nondeterministic Circuits, 8. Assigned but not Constrained

Identified By: PSE Security Team

The PSE & Scroll zkEVM modulo circuit was missing a constraint, which would allow a malicious prover to create a valid proof of a false modulo operation. Since the modulo operation is a basic building block for the zkEVM, the prover could convince the verifier of a wrong state update.

Background

The PSE & Scroll zkEVM circuits are programmed using their own fork of Zcash's Halo2. Small components of the large zkEVM circuit can be broken down into what are called gadgets. In this case, the modulo gadget was missing a constraint.

The Modulo gadget is intended to constrain:

a mod n = r, if n!=0
r = 0,       if n==0

The prover must supply a, n, r, and k, where they are all less than 2^256. The Modulo gadget uses the MulAddWords gadget which constrains:

a * b + c == d (modulo 2**256)

And the prover must supply a, b, c, and d. So the Modulo gadget inputs k, n, r, a for a, b, c, d. This creates the following constraint:

k * n + r == a (modulo 2**256)

This constraint is intended to prove that r = a mod n. The assignment in the assign function calculates this correctly, but the constraints do not enforce it properly.

The Vulnerability

The vulnerability arises from the fact that the MulAddWords gadget is done modulo 2^256 and that k * n + r can also be greater than 2^256. This is because even though k, n, r are all less than 2^256, their multiplication and sum can be greater than that. Since the prover can manipulate k freely for a given n, r and a, the prover can use k to overflow the sum and get a successful modulo operation.

For example, let:

n = 3
k = 2^255
r = 0
a = 2^255

The statement 0 = 2^255mod3 is false. But this statement will prove successfully in the circuit. This is because this is the actual constraint that is checked (which is true in this case):

3 * 2^255 + 0 = 2^255 (mod 2^256).

Since the prover can prove these false modulo operations, they can convince the verifier of incorrect state updates that rely on these operations. The modulo operation is a basic building block of the zkEVM, so there are many possible incorrect state updates that a prover can make that will successfully be verified.

The Fix

The fix for this issue is to add another constraint forcing k * n + r to be less than 2^256 so that no overflows happen. This is enough to avoid the overflow and accurately prove that r = a mod n for any r, a, and n less than 2^256.

References

  1. Github Issue
  2. Commit of the Fix

Summary

Related Vulnerabilities: 1. Under-constrained Circuits, 2. Nondeterministic Circuits, 8. Assigned but not Constrained

Identified By: PSE Security Team

The PSE & Scroll zkEVM SHL/SHR opcode circuit was missing a constraint, which would allow a malicious prover to create a valid proof of a false shift operation. Since the SHL/SHR opcode is a basic building block for the zkEVM, the prover could convince the verifier of a wrong state update.

Background

The SHL/SHR opcode (bit shift left and bit shift right) takes in two inputs from the stack: x and shift. For SHL it should output x << shift and for SHR it should output x >> shift. Since x and shift are on the stack, they each can be any 256 bit value. The calculation of a shift operation involves calculating 2^shift. Since shift can be a very large number, this calculation in a circuit could become very expensive. A to avoid this is recognizing that whenever shift > 255, the output to the stack should be 0 both for SHL or SHR. Then make the circuit compute 2^shift only when shift <= 255. This is what the zkEVM SHL/SHR opcode circuit does. Also note that this circuit is shared between both opcodes.

The Vulnerability

The opcode circuits take in shf0 and shift as two separate variables, where shf0 is meant to be the first byte of the shift variable. Then, if shift <= 255, the circuit calculates 2^shf0. The assign_exec_step function properly assigns these two variables:

let shf0 = pop1.to_le_bytes()[0];
...
self.shift.assign(region, offset, Some(pop1.to_le_bytes()))?;
self.shf0.assign(region, offset, Value::known(u64::from(shf0).into()))?;

However, the configure function, where constraints are created for this opcode, does not properly constrain shf0 to be the first byte of shift. This allows a malicious prover to fork this code and change the assign_exec_step function to assign whatever they want to shf0. This would allow them to successfully prove 2 << 1 outputs 8 if they assign shf0 = 2 when it should actually be constrained to output 4.

The Fix

The fix was to add the constraint forcing shf0 to be the first byte of shift. This was done with the following code addition:

instruction.constrain_zero(shf0 - FQ(shift.le_bytes[0]))

References

  1. Github Issue
  2. The Fix

Summary

Related Vulnerabilities: Incomplete Protocol Implementation

Identified By: Dusk Network Team

The Dusk Network is a privacy-oriented blockchain that relies on zk proofs. In order to achieve certain privacy features, the zk proofs need blinding factors for each proof created. The original Dusk implementation of Plonk was missing some of these blinding factors.

Background

ZK SNARKs are useful for both their succinctness and their zero knowledge. The main pieces of the Plonk protocol allows the proofs to be succinct, and it only takes a few small steps to make the protocol zero knowledge as well. Making the protocol zero knowledge means that an attacker cannot look at a proof and then derive the witness used to generate that proof.

In Plonk one of the few steps that makes the protocol zero knowledge is adding blinding factors to the prover polynomials. Essentially, the prover shifts the polynomials by a secret amount while still keeping the proof verficiation successful. These secret shifts prevent others from extracting the witness from the proof.

The Vulnerability

Dusk's original Plonk implementation was missing some of these blinding factors. Since Dusk is a privacy-oriented blockchain, many of the inputs to the zk proof need to remain private. However, without blinding factors anyone could potentially extract these "private inputs" from the proof data.

The Fix

The fix was to simply add blinding factors to the prover polynomials so that the proof keeps the witness private. The Plonk paper doesn't include much writing on these blinding factors, but still includes them in the final protocol at the end. This is likely because it's quite simple (compared to the rest of the protocol) to include them.

References

  1. Github Issue
  2. Github Fix
  3. Plonk Paper - Section 8, first bullet point explains the blinding factors
  4. zkSNARKs in a Nutshell - Section 4.3 explains blinding factors but for R1CS snarks

Summary

Related Vulnerabilities: 3. Arithmetic Over/Under flows

Identified By: BlockHeader

EY Nightfall is a set of smart contracts and ZK circuits that allow users to transact ERC20 and ERC-721 tokens privately. The protocol requires that a nullifier be posted on-chain in order to spend private tokens. However, the protocol did not limit the range of the nullifier to the SNARK scalar field size. This allowed users to double spend tokens.

Background

In order to prevent double spending of private tokens, a nullifier is posted on-chain after the tokens are spent. If the nullifier was already present on-chain, then the tokens cannot be spent. The nullifier is computed in a deterministic way such that given the same input parameters (specific to the user’s private tokens in this case), the output nullifier will always be the same. The nullifier is stored on-chain as a 256 bit unsigned integer.

The EVM allows numbers up to 256 bits long, whereas the SNARK circuits used for Nightfall only allowed numbers up to around 254 bits long. Since the SNARK scalar field is 254 bits, a nullifier that is > 254 bits will be reduced modulo the SNARK field during the proof generation process. For example, let p = SNARK scalar field order. Then any number x in the proof generation process will be reduced to x % p. So p + 1 will be reduced to 1.

The Vulnerability

The smart contract code that stores past used nullifiers did not check to ensure that the nullifier posted was within the SNARK scalar field (< ~254 bits). Since the circuit code is responsible for checking whether a given nullifier is correct or not for the tokens being spent, it will only check the reduced 254 bit version of the input nullifier.

For example, let's say a user wants to spend their tokens and the correct nullifier to do so is n. Since the correct nullifier is computed in the circuit code, n will be < ~254 bits. So the user can successfully spend the tokens by posting n on-chain as the nullifier. However, they can again post n + p on-chain, where p = snark scalar field size. Inside the circuit that checks whether n + p is correct, it will convert n + p to n + p % p = n. n + p essentially overflows to just n. So the circuit checks n and is therefore verified as the correct nullifier. On-chain, n + p and n are treated as two different nullifiers and don't overflow (unless n + p > 256 bits), so the nullifiers are stored separately and the tokens are spent a second time by the same user.

The Fix

The fix was to include a range check to ensure that any nullifiers posted on-chain were less than the SNARK scalar field size. This would prevent any overflows inside the circuit. Each token spend now only has one unique nullifier that can be posted on-chain successfully. Here is a snippet of the actual fix, where they ensure the nullifier is correctly range limited:

//checks to prevent a ZoKrates overflow attack
require(_inputs[3]<zokratesPrime, "Input too large - possible overflow attack");
require(_inputs[4]<zokratesPrime, "Input too large - possible overflow attack");

References

  1. Github Issue
  2. Github Fix

Summary

Related Vulnerabilities: 1. Under-constrained Circuits, 8. Assigned but not Constrained

Identified By: Summa

The circuit, written in Halo2, makes use of an external LessThan gadget that returns 1 if the lhs is less than the rhs. The logic of the circuit constrains the output to be equal to 1 using an auxiliary value check. However, the check value is assigned during witness generation. This allows a malicious prover to set check to any value, including 0, which would cause the circuit to generate a valid proof for a lhs that is greater than the rhs.

Background

Summa is a zk proof of solvency protocol. As part of the (simplified) logic of the protocol, the prover needs to prove that their assets_sum is greater than the liabilities_sum. The circuit would use a gadget from zk-evm to perform such less_than verification. The gadget would return 1 if the lhs is less than the rhs, and 0 otherwise.

The Vulnerability

The circuit would have a custom gate to enforce that check is equal to lt which is the output of the comparison performed by the gadget

    meta.create_gate(
        "verifies that `check` from current config equal to is_lt from LtChip",
        |meta| {
            let q_enable = meta.query_selector(lt_selector);

            let check = meta.query_advice(col_c, Rotation::cur());

            vec![q_enable * (config.lt_config.is_lt(meta, None) - check)]
        },
    );

Later on the circuit would have an assignment function to be called during witness generation to assign the value 1 to check

    // set check to be equal to 1
    region.assign_advice(
        || "check",
        self.config.advice[2],
        0,
        || Value::known(Fp::from(1)),
    )?;

However, this design erroneusly suppose that any prover would be using the assignment function provided by the library. A malicious prover can simply take the function and modify it to assign a different Value::known to check, even 0. This would cause the circuit to generate a valid proof for a lhs that is greater than the rhs.

The Fix

To fix the issue, the custom gate has been modified to take a constant expression (and set it to 1!) rather than a value obtained from the advice column (which is the one that can be modified by the prover during witness generation).

    meta.create_gate("is_lt is 1", |meta| {
        let q_enable = meta.query_selector(lt_selector);

        vec![
            q_enable * (config.lt_config.is_lt(meta, None) - Expression::Constant(Fp::from(1))),
        ]
    });

References:

  1. Issue
  2. PR

Summary

Related Vulnerabilities: 1. Under-constrained Circuits, 2. Nondeterministic Circuits

Identified By: Spearbit

The Polygon zkEVM division circuit was missing a constraint, which would allow a malicious prover to create a valid proof of a false division or modulo operation. Since the division and modulo operation are basic building blocks for the zkEVM, the prover could convince the verifier of a faulty state update.

Background

The Polygon zkEVM circuits are programmed using their own zk assembly language known as zkASM. Small components of the large zkEVM circuit can be broken down into different subroutines. In this case, the divARITH subroutine was missing a constraint.

The division subroutine is intended to constrain:

A * B + C = D + E

where:

  • A = divisor
  • E = dividend
  • B = quotient
  • C = remainder
  • D = set to 0 by subroutine

The inputs B and C are free inputs chosen by the prover. B is supposed to be constrained to E / A and C is supposed to be constrained to C = E % A. This subroutine provides a way to prove the division or modulo operation.

The Vulnerability

The issue with this subroutine is that there is no constraint that the remainder is less than the divisor. So in this case, without the constraint, it's possible for C >= A. A malicious prover could set B = E / A - 1 and C = (E % A) + A. This would still satisfy the equation A * B + C = D*2**256 + E.

A good example (taken from the twitter explanation) is to let:

  • A (divisor) = 10
  • E (dividend) = 101

The expected remainder (C) is 1 and the expected quotient (B) is 10. However, a malicious prover could choose B = 9 and C = 11. This would satisfy 10*9 + 11 = 101. The code never constrains C < A or 11 < 10 so the proof is successful.

This kind of forgery allows a malicious prover to tweak division and modulo operations in their favor. Since division and modulo operations are basic building blocks of the EVM, this could result in many different types of hacks favorable to the prover.

The Fix

The original authors had a constraint that C < E but instead needed C < A. So the original code:

C => A ; remainder
E => B ; divisor

was updated to:

A => B ; divisor
C => A ; remainder

References

  1. Spearbit Twitter Explanation Thread

Summary

Related Vulnerabilities: 1. Under-constrained Circuits

Identified By: Hexens

Background

The Storage state machine uses SMT (Sparse Merkle Tree) and implements provable CRUD operations in conjunction with Storage ROM.

In order to prove the (Key, Value) inclusion in the SMT, the state machine represents the key as bit string. It traverses the tree from the root down to the leaf using LSBs (least significant bits) of the key: 0/1 values of the bit correspond to the left/right edge traversal.

As the tree is sparse, the leaf level is not necessarily equal to the key bits length, that means that as soon as the leaf is inserted into the tree, the remaining part of the key (rkey) is being encoded into the leaf value.

The inclusion check algorithm consists of two parts (reference link: https://wiki.polygon.technology/docs/zkEVM/zkProver/basic-smt-ops):

  1. Checking The Root - is done as a generic Merkel Tree root check by climbing from the leaf to the root using sibling hashes.

  2. Checking the Key: I. In order to check the key, the state machine prepends every next path bit to the remaining key (rkey), e.g. rkey||b1||b0 (for the leaf level = 2). II. By the end of the operation, which can be distinguished in Storage ROM by the LATCH_GET command, the iLatchGet is set to 1. III. A Permutation constraint is used in the Main SM to ensure that the key passed from the zkEVM ROM matches the one constructed in the step (I):

    sRD {
        SR0 + 2^32*SR1, SR2 + 2^32*SR3, SR4 + 2^32*SR5, SR6 + 2^32*SR7,
        sKey[0], sKey[1], sKey[2], sKey[3],
        op0, op1, op2, op3,
        op4, op5, op6, op7,
        incCounter
    } is
    Storage.iLatchGet {
        Storage.oldRoot0, Storage.oldRoot1, Storage.oldRoot2, Storage.oldRoot3,
        Storage.rkey0, Storage.rkey1, Storage.rkey2, Storage.rkey3,
        Storage.valueLow0, Storage.valueLow1, Storage.valueLow2, Storage.valueLow3,
        Storage.valueHigh0, Storage.valueHigh1, Storage.valueHigh2, Storage.valueHigh3,
        Storage.incCounter + 2
    };

Hence, the part (1) is used to prove that the Value exists in the SMT, and the part (2) is used to prove that the Value is actually binded with the correct Key.

The Vulnerability

The issue arise as the next-bit polynomial rkeyBit is missing a binary constraint in the Storage SM, as well as the fact that the Storage ROM, pretty naturally, does not assert that the next bit, that comes from a free input, cannot be of any other value than 0 or 1

storage_sm_get.zkasm:

; If next key bit is zero, then the sibling hash must be at the right (sibling's key bit is 1)
${GetNextKeyBit()} => RKEY_BIT
RKEY_BIT                        :JMPZ(Get_SiblingIsRight)

Nonetheless, it can not be abused straightforwardly; a number of limitations must be overcome. Due to the specifics of the key reconstruction algorithm and the fact that the value inclusion check in the part (1) needs to hold true simultaneously.

Limitations overview:

In the Storage SM, The key is broken down into four registers: rkey0,..,rkey3 and the path is constructed by cycling the consecutive bits of that registers: path = [rKey0_0, rKey1_0, rKey2_0, rKey3_0, rKey0_1, ... ] (reference link: https://wiki.polygon.technology/docs/zkEVM/zkProver/construct-key-path)

Thus, in order to reconstruct the key from the bits, the corresponding rkey polynomial needs to be prepended with that bit:

rkey[level % 4] ||= rkeyBit

It is important to mention that the key is actually the POSEIDON hash of the account address, storage slot and the query key.

In order to avoid modulo 4 operation, the Storage SM introduces LEVEL register, which also consists of 4 parts: level0,..level3 and the ROTATE_LEVEL opcode in the Storage ROM.

The LEVEL is firstly set to leaf_level % 4, and then ROTATE_LEVEL is used every time the prover needs to climb the tree:

storage-sm-get.zkasm:

; Update remaining key
                                :ROTATE_LEVEL
                                :CLIMB_RKEY

                                :JMP(Get_ClimbTree)

Level rotation(storage.pil):

pol rotatedLevel0 = iRotateLevel*(level1-level0) + level0;
pol rotatedLevel1 = iRotateLevel*(level2-level1) + level1;
pol rotatedLevel2 = iRotateLevel*(level3-level2) + level2;
pol rotatedLevel3 = iRotateLevel*(level0-level3) + level3;

Finally, the rkey will be modified when using the CLIMB_RKEY opcode

storage.pil:

pol climbedKey0 = (level0*(rkey0*2 + rkeyBit - rkey0) + rkey0);
pol climbedKey1 = (level1*(rkey1*2 + rkeyBit - rkey1) + rkey1);
pol climbedKey2 = (level2*(rkey2*2 + rkeyBit - rkey2) + rkey2);
pol climbedKey3 = (level3*(rkey3*2 + rkeyBit - rkey3) + rkey3);

In order to hold the above-mentioned permutation constraint all of the rkey0-3 registers must be modified by the end of the operation (when the LATCH_GET opcode is used). As well as the Storage ROM will be rearranging the left/right node hashes by matching the next-bit. Given the fact that one can only abuse the non-zero values of the next-bit, the limitations can be overcome by inserting arbitrary Value into the SMT with a Key that has 1111 as its LSBs.

Key = ***1111, this is needed to have the opportunity to change all 4 rkey registers. This means that the POSEIDON hash that is derived from the account address and the storage slot number (in addition to the storage query key), needs to have the least significant bits of its result registers hash0,..hash3 set as 1:

hash0 = ***1
hash1 = ***1
hash2 = ***1
hash3 = ***1

As only 1 bit of every POSEIDON hash register is fixed, it is a trivial task to overcome the 4-bit entropy and find a storage slot (for any given account address) to meet the attack prerequisites. Another limitation is that the leaf we are inserting must have a level greater than 4, in the real-world scenario this is guaranteed to be the case (with a negligible negative outcome probability) as there will be millions of leafs inserted into the tree. Even if it's not the case the attacker will only need to precompute two storage slots and insert them both to guarantee the minimal level.

After inserting (KeyPrecomputed, ValueArbitrary) into the SMT using the opSSTORE procedure, and thus fulfilling the prerequisites the attacker can fake the binding of any key KeyToFake with the value ValueArbitrary, by setting the last 4 next-bit values from the free input to:

rkeyBit[0] =  rkeyToFake[0] - rkey0*2
rkeyBit[1] =  rkeyToFake[1] - rkey1*2
rkeyBit[2] =  rkeyToFake[2] - rkey2*2
rkeyBit[3] =  rkeyToFake[3] - rkey3*2

As the Storage ROM uses JMPZ to distinguish the climb path, despite being greater than 1 the rkeyBit will be treated the same way as if it was set to 1, and the root check (Value inclusion) will successfully be bypassed.

The main impact that will favor the attacker will be to fake the inclusion of (KeyAttackerBalance, ArbitraryAmount) in the SMT.

The Fix

The Fix of this issue is to add the missing binary constraint.

References

  1. Hexens Audit Report
  2. Fix Commit

Summary

Identified By: Hexens

Background

The zkEVM ROM architecture uses Contexts (CTX) to divide and emulate virtual address to physical address translation between call contexts inside one transaction. One CTX address space is used to determine the dynamic memory space that changes between call contexts, as well as the stack and CTX variables (such as msg.sender, msg.value, active storage account and etc.). The context switch is done using auxiliary variables such as originCTX, which refers to the origin CTX that created the current context as well as currentCTX. There is a special CTX(0) that is used for storing the GLOBAL variables such as tx.origin or old state root, the first context a batch transaction starts with is CTX(1), and it increments as new calls, context switches, or transactions are being processed.

The Vulnerability

The vulnerability lies in the "identity" (0x4) precompiled contract implementation. In case there is no originCTX set, as that effectively means that the EOA is directly calling the precompiled contract and not within an inner contract call, the precompiled contracts should consume intrinsic gas and end transaction execution. Although the context switching is done correctly in the ecrecover (0x1) precompile, the identity precompile is erroneous in its context switching. To check that the transaction is calling the contract directly, is utilizes originCTX variable and checks whether it is equal to 0:

https://github.com/0xPolygonHermez/zkevm-rom/blob/develop/main/precompiled/identity.zkasm#L21

$ => CTX        :MLOAD(originCTX), JMPZ(handleGas)

Although it immediately loads the originCTX into the CTX register, all of the memory operations will be done for the CTX(0).

As the context switch between GLOBAL and CTX contexts is done via useCTX: https://github.com/0xPolygonHermez/zkevm-proverjs/blob/develop/pil/main.pil#LL203-L204C85

pol addrRel = ind*E0 + indRR*RR + offset;
pol addr = useCTX*CTX*2^18 + isStack*2^16 + isStack*SP + isMem*2^17+ addrRel;

GLOBAL -> useCTX = 0, CTX -> useCTX = 1

Effectively the final address will be the same if the CTX register is set to 0. Given that the variables are addressed by their offset, the ROM's global variables will be double-referenced by their appropriate CTX variables with the same offset.

Example:

OFFSET(0): VAR GLOBAL oldStateRoot  <--> VAR CTX txGasLimit
OFFSET(1): VAR GLOBAL oldAccInputHash <--> VAR CTX txDestAddr
...
OFFSET(17): VAR GLOBAL nextHashPId <--> VAR CTX gasRefund

Thus colliding the GLOBAL and CTX variable offsets.

The attack breakdown:

  • Any user (EOA) creates a transaction with a destination address set to identity precompiled contract (0x4)
  • When the execution reaches $ => CTX :MLOAD(originCTX), JMPZ(handleGas), the CTX will be set to 0, and a jump will be done to the handleGas label
  • handleGas will check the refund (an important detail is that in the current VAR configuration, the gasRefund variable collides with nextHashPId, which will be 0 in this case, although if it were to collide with another VAR that has bigger absolute values in it than the caller will "print money out of thin air" for himself as well), after refunding the sender it continues to a point where it needs to account the gas consumed to the sequencer address
;; Send gas spent to sequencer
sendGasSeq:
        $ => A          :MLOAD(txGasLimit)
        A - GAS => A

        $ => B          :MLOAD(txGasPrice)
        ; Mul operation with Arith
        A               :MSTORE(arithA)
        B               :MSTORE(arithB), CALL(mulARITH)
        $ => D          :MLOAD(arithRes1) ; value to pay the sequencer in D

As the txGasLimit references the oldStateRoot, which is the hash of the state tree and has a very big absolute value, the MLOAD(txGasLimit) will return the oldStateRoot value instead. By setting a gasPrice to 1 (or an arbitrarily small value not to overflow the multiplication), the sequencer will be credited with an enormously big balance

The attack requirements and probability: For any user to be able to credit himself an almost infinite ether balance, he needs to be the one sequencing it. The most convenient way to do so is to force a batch in L1 PolygonZkEVM contract. As in the current configuration, the trusted sequencer ignores the forced batches; it stores them in separate state.forced_batch table in the DB: https://github.com/0xPolygonHermez/zkevm-node/blob/develop/state/pgstatestorage.go#L316-L320

And when the sequencer will query for the pending batches to be sequenced in the getSequencesToSend() function:

https://github.com/0xPolygonHermez/zkevm-node/blob/develop/sequencer/sequencesender.go#L114

It will only query for the batches from state.batch table: https://github.com/0xPolygonHermez/zkevm-node/blob/develop/state/pgstatestorage.go#L535-L539

Thus the attacker will need to force a batch and then to wait for the timeout period to pass and sequence it, setting the sequencer to arbitrary address. In the current configuration the attack gives opportunity to anyone to force such batch and after the timeout period to be credited with unlimited ether balance, if combined with obfuscating the transaction with other "dummy" transactions and adding a bridgeAsset() call somewhere after in the same batch, the attacker will gain a deposit leaf of arbitrary ether amount as soon as the batch is verified and can claim all of the ether held by the bridge.

The Fix

The Fix of this issue is to change the identity.zkasm code to save the originCTX in a register before jumping to the handleGas label.

References

  1. Hexens Audit Report
  2. Fix Commit

Summary

Related Vulnerabilities: 1. Under-constrained Circuits

Identified By: Hexens

Background

The combination of the free input checking in zkEVM ROM and a missing constraint in main.pil leads to execution hijack with a possibility to jump to an arbitrary address in ROM.

The Vulnerability

One of the impacts is the arbitrary increase of balance for any caller.

In the file utils.zkasm some of the procedures use free input calls to make small calculations, for example,

computeSendGasCall:

; C = [c7, c6, ..., c0]
; JMPN instruction assures c0 is within the range [0, 2^32 - 1]
${GAS >> 6} => C        :JMPN(failAssert)
${GAS & 0x3f} => D

; since D is assured to be less than 0x40
; it is enforced that [c7, c6, ..., c1] are 0 since there is no value multiplied by 64
; that equals the field
; Since e0 is assured to be less than 32 bits, c0 * 64 + d0 could not overflow the field
C * 64 + D              :ASSERT

In such cases, to ensure the validness of the free input JMPN is used. JMPN will effectively check whether the free input set in register C is in the range [0,2^32-1]. This is a security assumption that ensures that the register is not overflowing in the assertion stage:

C * 64 + D              :ASSERT

The JMPN constraints: https://github.com/0xPolygonHermez/zkevm-proverjs/blob/develop/pil/main.pil#L209-L228

pol jmpnCondValue = JMPN*(isNeg*2^32 + op0);

By checking that the jmpnCondValue is a 32-bit number we assure that the op0 is in the [-2^32,2^32) range, thus preventing the overflow. The jump destination, as well as the zkPC constraints, are consequently based on the inNeg: https://github.com/0xPolygonHermez/zkevm-proverjs/blob/develop/pil/main.pil#L322-L336

zkPC' = doJMP * (finalJmpAddr - nextNoJmpZkPC) + elseJMP * (finalElseAddr - nextNoJmpZkPC) + nextNoJmpZkPC;

Nonetheless, a constraint is missing to ensure that isNeg evaluates only to 1 or 0. In the case of utils.zkasm procedures, there is no elseAddr specified, and having:

finalElseAddr = nextNoJmpZkPC
doJMP = isNeg
elseJMP = (1-isNeg)

The zkPC constraint can be reduced to:

zkPC' = isNeg * (finalJmpAddr - nextNoJmpZkPC) + nextNoJmpZkPC

Where both finalJmpAddr and nextNoJmpZkPC are known values in the ROM program compilation phase.

In order to be able to jump to arbitrary zkPC, the attacker needs to calculate corresponding values for isNeg and op0; this can be done using derived formulas:

isNeg = (zkPC_arbitrary - nextNoJmpZkPC) * (finalJmpAddr - nextNoJmpZkPC)-1 mod P
op0 = - isNeg * 232 mod P

The attack breakdown:

At this point attacker has a primitive to jump to an arbitrary address; the next step will be to find a suitable gadget to jump to, the main requirements for the target are to:

  • Not to corrupt/revert zkEVM execution
  • Impact favourably for the attacker

One of the jump chains found is to use one of the *CALL opcodes as the start of the attack chain to call the computeSendGasCall and subsequently craft a jump into the refundGas label's code: https://github.com/0xPolygonHermez/zkevm-rom/blob/develop/main/process-tx.zkasm#L496-L498

$ => A                          :MLOAD(txSrcOriginAddr)
0 => B,C                        ; balance key smt
$ => SR                         :SSTORE

This will set txSrcOriginAddr balance to the value contained in register D and finish the transaction execution. To abuse the value set by the SSTORE instruction, attacker needs to set huge value in the register D, for this the DELEGATECALL opcode can be used, as in the implementation it sets the register D just before the computeSendGasCall call:

$ => D          :MLOAD(storageAddr)
...
E               :MSTORE(retCallLength), CALL(computeGasSendCall); in: [gasCall: gas sent to call] out: [A: min( requested_gas , all_but_one_64th(63/64))]

So the D register will be set with storageAddr which has very big absolute value.

Additional setup for the attack:

  • A contract should be deployed with a function (or fallback) that initiates a delegatecall() call to any address.
  • The transaction should be initiated with gasPrice set to 0 not to overflow in gas when sending it to the sequencer, as well as this will favor the fact of attacker getting to prove the batch initiated
  • The gasLimit should be precalculated to end up with 0 gas at the end of the execution, this is done for the same reason mentioned above.

The Fix

To fix this issue a constraint must be added for the inNeg polynomial to ensure that it evaluates only to 0 or 1. e.g.

isNeg * (1-isNeg) = 0

References

  1. Hexens Audit Report
  2. Fix Commit

Summary

Related Vulnerabilities: Bad Polynomial Implementation

Identified By: NCC Group

Incorrect polynomial representation resulting from arithmetic operations may break assumptions and lead to erroneous computations or may result in denial of service attacks via Rust panics.

Background

The file fft/polynomial/dense.rs provides an implementation of dense polynomials to be used for FFTs. These polynomials are represented by vectors in which each entry corresponds to a coefficient. These coefficients are elements of a finite field, and as such, the sum of two coefficients may take any value in the range 0, . . . , p − 1, where p is the order of the prime field.

The Vulnerability

When adding two polynomials of the same degree using the function add(), trailing coefficients that sum to zero are not trimmed. This contradicts an underlying assumption on the shape of polynomial representations, namely that the coefficient of the leading term is non-zero.

As an example, summing the polynomials $3 + 2x + x^2$ and $1 + (p − 1)x^2$ (using the function add() provided below for reference) represented by the vectors [3, 2, 1] and [1, 0, p - 1] will result in the vector [4, 2, 0], namely the trailing position is equal to zero.

add():

fn add(self, other: &'a DensePolynomial<F>) -> DensePolynomial<F> {
    if self.is_zero() {
        other.clone()
    } else if other.is_zero() {
        self.clone()
    } else {
        if self.degree() >= other.degree() {
            let mut result = self.clone();
            for (a, b) in result.coeffs.iter_mut().zip(&other.coeffs) {
                *a += b
            }
            result
        } else {
            let mut result = other.clone();
            for (a, b) in result.coeffs.iter_mut().zip(&self.coeffs) {
                *a += b
            }
            // If the leading coefficient ends up being zero, pop it off.
            while result.coeffs.last().unwrap().is_zero() {
                result.coeffs.pop();
            }
            result
        }
    }
}

Interestingly, note that the else-clause in the add() function above does perform this trimming.

While this failure to trim leading zero coefficients is technically not inconsistent with the current polynomial representation (and should not lead to incorrect results), the implementation assumes that all trailing zeros have been trimmed from polynomials.

As a result, functions like degree() (provided below) will panic on unexpected inputs.

degree():

/// Returns the degree of the polynomial.
pub fn degree(&self) -> usize {
    if self.is_zero() {
        0
    } else {
        assert!(self.coeffs.last().map_or(false, |coeff| !coeff.is_zero()));
        self.coeffs.len() - 1
    }
}

This oversight with regards to the trimming of zero coefficients applies to function add_assign(), sub() and sub_assign().

The Fix

Consider performing the “trimming” step of removing trailing zero coefficients from polynomials in all cases after arithmetic operations. Additionally, consider writing unit tests to catch such potential edge cases.

Zendoo Developers introduced a function named truncate_leading_zeros() which removes the leading zero coefficients of a polynomial. This function is now called prior to returning the result of the arithmetic operations add(), add_assign(), sub(), and sub_assign(). As such, this finding has been marked as “Fixed”.

References

  1. NCC Group Audit Report
  2. Fix Commit

Summary

Related Vulnerabilities: Data Validation

Identified By: zkSecurity

input_hash is not a binding commitment to input, which leads to an attacker being able to construct hash collisions via the transaction view key and input.

Background

Aleo employs to "transfer across" inputs from a caller's circuit to a callee's circuit. This transfer mechanism, crucial for the interaction between different circuits, is implemented using the commit-and-prove technique. Specifically, the technique involves:

  1. Both the caller and callee commit to their arguments and expose these commitments as public inputs.
  2. The network (verifier) checks both proofs and ensures that the exposed commitments match.

The core of the implementation in snarkVM is: the caller witnesses and exposes input_ids of the request as public input, which are also exposed by the callee. The network enforces equality of these input_ids.

The Vulnerability

The vulnerability specifically arises in the last step when handling InputID::Private type arguments. In the case of InputID::Private, the input_id is derived by:

InputID::Private:

// A private input is encrypted (using `tvk`) and hashed to a field element.
InputID::Private(input_hash) => {
	// Prepare the index as a constant field element.
	let input_index = Field::constant(console::Field::from_u16(index as u16));
	// Compute the input view key as `Hash(function ID || tvk || index)`.
	let input_view_key = A::hash_psd4(&[function_id.clone(), tvk.clone(), input_index]);
	// Compute the ciphertext.
	let ciphertext = match &input {
			Value::Plaintext(plaintext) => plaintext.encrypt_symmetric(input_view_key),
			// Ensure the input is a plaintext.
			Value::Record(..) => A::halt("Expected a private plaintext input, found a record
input"),
	};
	// Ensure the expected hash matches the computed hash.
	input_hash.is_equal(&A::hash_psd8(&ciphertext.to_fields()))
}
  1. Generating ivk (input_view_key) from tvk (transaction view key).
  2. Encrypting input using input_view_key.
  3. Hashing the resulting ciphertext with the Poseidon hash function.
  4. Constraining the resulting digest to equal input_hash (exposed as a public input).

The crux of the vulnerability is that, for InputID::Private parameters, the input_hash does not form a binding commitment to the input. Given that a malicious prover can choose a different ivk on the caller side, this means that the input on the caller's side can differ from the input on the callee's side, despite generating the same input_id.

For example, the attacker constructs a different ivk' in the caller circuit from the callee, and then provides an input' of his own based on the input in the callee, so that the input' can get the same ciphertext as the callee circuit after encrypting it with the ivk'. From this, the attacker with a different input in the caller circuit can also get the input_hash in the callee circuit.

This manipulation allows a malicious prover to ensure that both the caller and callee circuits produce the same input_id for different inputs, thereby breaking the binding between arguments/inputs across call boundaries in snarkVM.

The Fix

To mitigate this vulnerability, it is recommended to use committing encryption: the ciphertext must form a binding commitment to the plaintext. This can be achieved by enforcing tcm = hash(tvk) and exposing tcm (the transaction commitment) as a public input on the caller's side because (commit(key) enc(key, pt)) is naturally binding.

References

  1. zkSecurity Audit Report
  2. Fix Commit

Summary

Identified By: HashCloak Inc

Operations on shared state within merkle tree do not possess atomicity. This lack of atomicity, especially in a concurrent execution environment, can lead to inconsistencies or security vulnerabilities as multiple operations may attempt to modify the same state simultaneously.

Background

Light Protocol is a privacy protocol on the Solana blockchain that enables users to transfer tokens anonymously. It's built with Groth16 zk-SNARK verifier and Poseidon Merkle tree to ensure the correctness and privacy of token transfers. Users manage encrypted UTXOs on-chain for privacy. The protocol addresses the challenge of executing complex cryptographic computations within Solana's resource-constrained environment by breaking down the execution logic into smaller instructions and safeguarding against potential attacks.

The Vulnerability

There are two types of accounts associated with merkle tree operations:

  1. LightProtocol designs user accounts for users to store encrypted UTXOs onchain,giving them an overview of their private holdings.
  2. tmp_account created by signer.

In LightProtocol, locks are implemented by binding them to the public key of the originator signer. The problem is that in the current implementation, it is possible to execute the Merkle tree command with different temporary storage accounts at the same time, as long as they are initiated by the same signer.

	pubkey_check(
		*signer.key,
		solana_program::pubkey::Pubkey::new(&merkle_tree_pda_data.pubkey_locked),
		String::from("Merkle tree locked by another account."),
	)?;
	merkle_tree_pda_data.time_locked = <Clock as Sysvar>::get()?.slot;
	merkle_tree_pda_data.pubkey_locked = signer.key.to_bytes().to_vec();
	msg!("Locked at slot: {}", merkle_tree_pda_data.time_locked);

It only prevents different signers from modifying the state of the Merkle tree at the same time, but it does not prevent the same signer from using different temporary storage accounts (_tmp_storage_pda in the code from commits 870c07...8d4260 and b85065 ...9eee19) for concurrent operations. Therefore an attacker can utilize different temporary storage accounts with the same initiator identity to perform Merkle tree update operations simultaneously. This means that in a concurrent environment, the state of the Merkle tree (e.g., the filled_subtrees[] array) may be updated without completing all the necessary validation steps, resulting in data consistency and integrity not being guaranteed.

Attack method: an interrupt is inserted at some point during the Merkle tree update process, causing some instructions to be executed while the rest of the instructions are reserved to continue in subsequent operations. The attacker does this by concurrently executing two transactions (each missing the last instruction). Since the lock is only on the initiator's public key and not the temporary storage account, the second transaction can start executing without the first one being completed. This causes the filled_subtrees[] field to be updated with the results of the second transaction, not the first.

The attacker sends the last instruction of the first transaction after the data from the second transaction has been merged into the Merkle tree state. Due to the previous concurrent execution, the filled_subtrees[] field already contains the data from the second transaction, but the system records that the first transaction was paid. Ultimately, an attacker can effectively merge the data of the second transaction into the Merkle tree without paying for it and exploit the UTXO of this transaction.

The Fix

The correct approach should be to bind the lock to the temporary storage account used during execution, ensuring that no concurrent modifications to the Merkle tree state can be made at the same time, even by the same initiator of the operation.

In src/poseidon_merkle_tree/processor.rs, assign to the field pubkey_locked the key of _tmp_storage_pda instead of the key of signer.

	pubkey_check(
		// *signer.key,
		*_tmp_storage_pda.key,
		solana_program::pubkey::Pubkey::new(&merkle_tree_pda_data.pubkey_locked),
		String::from("Merkle tree locked by another account."),
	)?;
	merkle_tree_pda_data.time_locked = <Clock as Sysvar>::get()?.slot;
	// merkle_tree_pda_data.pubkey_locked = signer.key.to_bytes().to_vec();
	merkle_tree_pda_data.pubkey_locked = _tmp_storage_pda.key.to_bytes().to_vec();
	msg!("Locked at slot: {}", merkle_tree_pda_data.time_locked);

Modify the corresponding argument of pubkey_check() as well. This solves the concurrency issue.

	assert_eq!(
		Pubkey::new(&merkle_tree_pda_account_data.pubkey_locked[..]),
		// signer_keypair.pubkey()
		*tmp_storage_pda_pubkey
	);

Also, remove the assignment in lines 65–66 of src/poseidon_merkle_tree/instructions.rs from the function insert_1_inner_loop() to the function insert_last_double(). This ensures atomicity of state change.

References

  1. HashCloak Audit Report
  2. Github Fix
  3. Fix Commit1
  4. Fix Commit2

Summary

Related Vulnerabilities: Under-constrained Circuits

Identified By: Matter Labs Audits

Background

The ZK Email technology allows to initiate on-chain transactions via emails. It achieves this by relying on DKIM signatures and zero-knowledge proofs. On the whole, the ZK Email system comprises five main components:

  • The ZK regex compiler generates a Cicrom circuit from a regular expression.
  • Circom circuits are used to verify the email’s DKIM signature, process the email’s data, and extract public information while keeping the email content confidential to protect any private data it may contain.
  • The DKIM oracle is responsible for fetching DKIM public keys for email providers and producing signed data to be submitted to the DKIM registry smart contract.
  • Solidity smart contracts process ZK proofs alongside the email’s public data. The DKIM registry smart contract among them incorporates public key hashes for whitelisted email providers.
  • The Relayer is responsible for coordinating off-chain components by receiving emails, producing ZK-proofs, and posting them on-chain. The protocol allows for the relayer to be self-hosted which has stronger privacy guarantees.

The Vulnerability

The cornerstone part of the ZK Email system is the ZK regex compiler. It empowers a ZK circuit to support the processing of input signals with beloved and cherished regular expressions. The ZK regexes allow not only to constrain the input to satisfy the regex but additionally to produce a revealing array, exposing the part of the input matching specific sub-regexes.

Let’s look at the concrete example of a regex fed to the ZK regex compiler. This one comprises several sub-regexes and should match the whole dkim-signature header in an email but reveal only the timestamp value for the ;t=ts_value tag in the header.

{
  "parts": [
    {
      "is_public": false,
      "regex_def": "(\r\n|^)dkim-signature:"
    },
    {
      "is_public": false,
      "regex_def": "([a-z]+=[^;]+; )+t="
    },
    {
      "is_public": true,
      "regex_def": "[0-9]+"
    },
    {
      "is_public": false,
      "regex_def": ";"
    }
  ]
}

The ZK Email relies on regular expressions prefixed with (\r\n|^)header_name: for extracting various headers from an email, such as the from, subject, dkim-signature headers.

When we compile a regular expression with the compiler, it generates a Circom circuit that embodies a Deterministic Finite Automaton (DFA) satisfying the regular expression. Any regular expression can be converted into an equivalent DFA. An input string matches the regular expression if it transitions the DFA to the accept state.

Let’s have the ^a regular expression as a toy example. When generating a circuit for the regular expression, the compiler makes a DFA with three states, with state 2 being the accepted state.

What’s more essential for understanding the vulnerability identified is that the compiler injects 255 decimal as the first value to the in array before the user’s input. This one is called an “invalid” decimal, as conceived by the ZK regex compiler’s code, and shouldn’t interfere with the user’s input, indicating the beginning of the string.

var num_bytes = msg_bytes+1;
signal in[num_bytes];
// -->
in[0]<==255;
// <--
for (var i = 0; i < msg_bytes; i++) {
	in[i+1] <== msg[i];
}

Further, in the main loop, the circuit handles the transition between states of the DFA based on the input in[i] and the current state. Initially, the DFA is in state 0 and transitions to state 1 when it receives the 255-decimal input, which denotes the beginning of the string or ^.

for (var i = 0; i < num_bytes; i++) {
	state_changed[i] = MultiOR(2);
	eq[0][i] = IsEqual();
	// -->
	eq[0][i].in[0] <== in[i];
	eq[0][i].in[1] <== 255;
	// <--
	and[0][i] = AND();
	and[0][i].a <== states[i][0];
	and[0][i].b <== eq[0][i].out;
        // -->
	states[i+1][1] <== and[0][i].out;
        // <--
	state_changed[i].in[0] <== states[i+1][1];
	eq[1][i] = IsEqual();
	eq[1][i].in[0] <== in[i];
	eq[1][i].in[1] <== 97;
	and[1][i] = AND();
	and[1][i].a <== states[i][1];
	and[1][i].b <== eq[1][i].out;
	states[i+1][2] <== and[1][i].out;
	state_changed[i].in[1] <== states[i+1][2];
	states[i+1][0] <== 1 - state_changed[i].out;
}

If the subsequent input (the 1st character in the user’s input) isn’t a 97-decimal value (a character), it transitions back to state 0 from state 1.

for (var i = 0; i < num_bytes; i++) {
	state_changed[i] = MultiOR(2);
	eq[0][i] = IsEqual();
	eq[0][i].in[0] <== in[i];
	eq[0][i].in[1] <== 255;
	and[0][i] = AND();
	and[0][i].a <== states[i][0];
	and[0][i].b <== eq[0][i].out;
	states[i+1][1] <== and[0][i].out;
	state_changed[i].in[0] <== states[i+1][1];
	eq[1][i] = IsEqual();
	// -->
	eq[1][i].in[0] <== in[i];
	eq[1][i].in[1] <== 97;
	// <--
	and[1][i] = AND();
	and[1][i].a <== states[i][1];
	and[1][i].b <== eq[1][i].out;
	states[i+1][2] <== and[1][i].out;
	state_changed[i].in[1] <== states[i+1][2];
	// -->
	states[i+1][0] <== 1 - state_changed[i].out;
 	// <--
}

Otherwise, having a character as an input, it transitions from state 1 to state 2, acting as the accepted state.

This implies that one can bypass the regexp by feeding [x, y, z, \xff, a] as an input array. Initially, the DFA transitions from state 0 to state 1 while processing the start of the string. Later, as the x, y, and z input characters are processed, the DFA remains in state 0 since the x character transitioned the DFA back to state 0 from state 1. However, when it receives \xff character as the 4th character, the DFA transitions to state 1, and the subsequent a character input entails the DFA’s transition to the accepted state.

For a further deep dive into the ZK regex compiler, we highly recommend the ZK Regexp technical explainer. Inquisitive readers can visit online ZK Regex Tools to build DFAs and state matrices or obtain Circom circuits for regexes of their choice.

Now, coming back to the vulnerability. We’ve discovered that most email providers blissfully send invalid UTF-8 characters in the subject header of an email, including the \xff character.

subject: \xfffrom: victim@anydomain

With the following curl command line, we’ve managed to send a DKIM-signed email with the \xff character in the subject header through Gmail.

curl -vvv --ssl-reqd \
  --url 'smtp://smtp.gmail.com:587' \
  --user 'attacker@gmail.com:{password}' \
  --mail-from 'attacker@gmail.com' \
  --mail-rcpt 'relayer@domain'\
  --upload-file mail-255.txt

The EmailAuth circuit is in charge of verifying DKIM-signature of the email and extracting the sender’s email address by parsing the from header using the sub-circuits FromAddrRegex, FromAllRegex, and EmailAddrRegex. The FromAllRegex circuit matches the from header in the email by using the following regular expression:

(\r\n|^)from:[^\r\n]+\r\n

Given that the attacker can inject \xff character followed by the victim’s email address, the EmailAuth circuit can be tricked into extracting victim@anydomain email address from the subject header, thinking it comes from the from header.

This ultimately results in an email spoofing attack, allowing the attacker to impersonate any email address by sending emails from attacker@gmail.com with the victim’s email address injected after the \xff character in the subject.

The Fix

The ZK Email team mitigated the vulnerability by introducing a range check for the user’s input in this commit. Currently, all Circom circuits generated by the ZK regex compiler include the following constraints in the initialization part:

signal in_range_checks[msg_bytes];
in[0]<==255;
for (var i = 0; i < msg_bytes; i++) {
	in_range_checks[i] <== LessThan(8)([msg[i], 255]);
	in_range_checks[i] === 1;
	in[i+1] <== msg[i];
}

References

  1. ZK Email: Unveiling Classic Attacks and Why Zero-Knowledge Proofs Alone Are Not a Panacea
  2. ZK Email Security Review Report

Under-constrained circuits do not have all of the required constraints necessary to force proof makers to follow the intended rules of the circuit. Many of the other vulnerabilities listed can make a circuit under-constrained, or it can be due to a missing application logic constraint.

A very basic example of an under-constrained circuit is the following NonIdentityFactors circuit:

template NonIdentityFactors() {
	signal factorOne;
	signal factorTwo;
	signal val;

	val === factorOne * factorTwo;
}

component main{public [val]} = Factors();

This circuit is meant to prove that a prover knows two non identity factors (factors that don’t equal 1) of the public input val. The circuit is under-constrained because there is no constraint forcing the prover to use values other than 1 for factorOne or factorTwo. There is another missing constraint to ensure that factorOne * factorTwo is less than the scalar field order (See 3. Arithmetic Over/Under Flows for more details).

Under-constrained circuits attacks can vary widely depending on the constraints that are missing. In some cases it can lead to significant consequences like allowing a user to repeatedly drain funds (see Bugs in the Wild 5. and 6.). In order to prevent these bugs, it is important to test the circuit with edge cases and manually review the logic in depth. Formal verification tools are almost production ready and will be able to catch a lot of common under-constrained bugs, such as ECNE by 0xPARC and Veridise's circom-coq. Additional tooling to catch these vulnerabilities is in the works as well.

Nondeterministic circuits are a subset of under-constrained circuits, usually because the missing constraints make the circuit nondeterministic. Nondeterminism, in this case, means that there are multiple ways to create a valid proof for a certain outcome. A common example of this is a nondeterministic nullifier generation process.

Nullifiers are often used in with zk applications to prevent double actions. For example, TornadoCash requires a nullifier to be generated and posted on-chain when a note commitment is spent. That way, if the user tries to spend the same note commitment a second time, they will have to post the same nullifier on-chain. Since the nullifier already exists, the smart contract will revert this second spend. The smart contract relies on a valid zk proof to ensure that the nullifier was generated correctly.

Note: Not all nondeterminstic circuits are vulnerable to attacks. In some cases, nondeterminism can allow for an optimized circuit while remaining secure. For example, the circom-pairing library represents field elements as integers A such that 0 <= A < p, but only constrains 0 <= A. So the A < p constraint is left out except for circuits that require it. In the cases where that constraint is not required, overflows will not break the assumptions of the circuit. However, it is still important to be aware of this possibility.

Attack Scenario

In a nondeterministic circuit for proving correct nullifier generation, there are multiple ways to generate a nullifier for the same note commitment. Each nullifier will be unique and posting the nullifier on-chain won’t prevent a double spend. The nondeterministic nature of the nullifier generation process allows users to spend a single note commitment multiple times.

Preventative Techniques

In order to prevent nondeterministic circuits, in-depth manual review of the circuit logic is needed. Constraints need to be added to ensure that the logic is deterministic where necessary. Often times, nondeterministic vulnerabilities can be fixed by adding additional constraints to make the logic deterministic.

Zk cryptography often involves modular arithmetic over a scalar field. This means that all operations are done modulo the order of the field. Circom circuits are built over a scalar field with the following order:

p = 21888242871839275222246405745257275088548364400416034343698204186575808495617

It’s easy to forget this fact, and not account for over/under flows. This means that the following arithmetic statements are true in Circom:

(0 - 1) === 21888242871839275222246405745257275088548364400416034343698204186575808495616;

(21888242871839275222246405745257275088548364400416034343698204186575808495616 + 1) === 0;

This can cause unintended consequences if there are no checks preventing these patterns.

Attack Scenario

For example, consider the following circuit that computes a user’s new balance:

template getNewBalance() {
   signal input currentBalance;
   signal input withdrawAmount;
   signal output newBalance;

   newBalance <== currentBalance - withdrawAmount;
}

If a user inputs a withdrawAmount that is greater than their currentBalance, the newBalance output will underflow and will be an extremely large number close to p. This is clearly not what was intended by the circuit writer.

The fix

We can use the LessThan and Num2Bits templates defined by Circomlib to ensure the values we are working with are within the correct bounds, and will not cause overflows or underflows:

// Ensure that both values are positive.
component n2b1 = Num2Bits(64);
n2b1.in <== withdrawAmount;

component n2b2 = Num2Bits(64);
n2b2.in <== currentBalance;

// Ensure that withdrawAmount <= currentBalance.
component lt = LessThan(64);
lt.in[0] = withdrawAmount;
lt.in[1] = currentBalance + 1;
lt.out === 1;

Here, Num2Bits(n) is used as a range check to ensure that the input lies in the interval [0, 2^n). In particular, if n is small enough this ensures that the input is positive. If we forgot these range checks a malicious user could input a withdrawAmount equal to p - 1. This would satisfy the constraints defined by LessThan as long as the current balance is non-negative since p - 1 = -1 < currentBalance.

Many of CircomLib’s circuits take in an expected number of bits. In order for their constraints and output to be accurate, the input parameters need to be constrained to that maximum number of bits outside of the CircomLib circuit. For example, the LessThan circuit expects n number of bits.

template LessThan(n) {
    assert(n <= 252);
    signal input in[2];
    signal output out;

    component n2b = Num2Bits(n+1);

    n2b.in <== in[0]+ (1<<n) - in[1];

    out <== 1-n2b.out[n];
}

Attack Scenario

The LessThan circuit outputs 1 if in[0] < in[1], and 0 otherwise. An attacker could use in[0] as a small number and in[1] as a number with more than n bits. This would cause the Num2Bits input to underflow, and so the output could be engineered to 0, even though in[0] < in[1].

Preventative Techniques

In order to prevent this, bit length checks should be done on the inputs as well. This can be done by using CircomLib’s Num2Bits circuit. This circuit is already used in the LessThan circuit, but it is used on in[0] + (1 << n) - in[1] instead of on the inputs themselves. So the following Num2Bits constraints should be added as well:

signal input1;
signal input2;
signal maxBits;

// Constrain input1 to maxBits
component bitCheck1 = Num2Bits(maxBits);
bitCheck1.in <== input1;

// Constrain input2 to maxBits
component bitCheck2 = Num2Bits(maxBits);
bitCheck2.in <== input2;

// Compare input1 to input2
component lt = LessThan(maxBits);
lt.in[0] <== input1;
lt.in[1] <== input2;

// Ensure input1 < input2
lt.out === 1;

Many circuits will have a variable as a public input, but won’t write any constraints on that variable. These public inputs without any constraints can act as key information when verifying the proof. However, as of Circom 2.0, the default r1cs compilation uses an optimizer. The optimizer will optimize these public inputs out of the circuit if they are not involved in any constraints.

component UnusedPubInput() {
	signal inOne;
	signal inTwo;
	signal inThree;

	signal output out;

	out <== inOne + inTwo;
}

component main{public [inThree]} = UnusedPubInput();

In the example above, inThree will be optimized out. When submitting a proof to a verifier contract, any value for inThree will succeed on an existing proof.

Attack Scenario

Semaphore is a zero-knowledge application that allows users to prove membership of a group and send signals without revealing their identity. In this case, the signal that a user sends is hashed and included as a public input to the proof. If the Semaphore devs had not included any constraints on this variable, an attacker could take a valid proof of a user, modify the signal hash (public input) and replay the proof with the modified signal hash. This is essentially forging any signal they like.

Preventative Techniques

To prevent this over optimization, one can add a non-linear constraint that involves the public input. TornadoCash and Semaphore do this. TornadoCash used multiple non-linear constraints to prevent its public variables from being optimized out. Semaphore’s public “signalHash” input is intentionally added into a non-linear constraint (”signalHashSquared”) to prevent it from being optimized out. tornado-core/circuits/withdraw.circom

// Add hidden signals to make sure that tampering with recipient or fee will invalidate the snark proof
// Most likely it is not required, but it's better to stay on the safe side and it only takes 2 constraints
// Squares are used to prevent optimizer from removing those constraints
signal recipientSquare;
signal feeSquare;
signal relayerSquare;
signal refundSquare;
recipientSquare <== recipient * recipient;
feeSquare <== fee * fee;
relayerSquare <== relayer * relayer;
refundSquare <== refund * refund;

Semaphore/semaphore.circom

// nLevels must be < 32.
template Semaphore(nLevels) {
  signal input identityNullifier;
  signal input identityTrapdoor;
  signal input treePathIndices[nLevels];
  signal input treeSiblings[nLevels];

  signal input signalHash;
  signal input externalNullifier;

  signal output root;
  signal output nullifierHash;

  component calculateSecret = CalculateSecret();
  calculateSecret.identityNullifier <== identityNullifier;
  calculateSecret.identityTrapdoor <== identityTrapdoor;

  signal secret;
  secret <== calculateSecret.out;

  component calculateIdentityCommitment = CalculateIdentityCommitment();
  calculateIdentityCommitment.secret <== secret;

  component calculateNullifierHash = CalculateNullifierHash();
  calculateNullifierHash.externalNullifier <== externalNullifier;
  calculateNullifierHash.identityNullifier <== identityNullifier;

  component inclusionProof = MerkleTreeInclusionProof(nLevels);
  inclusionProof.leaf <== calculateIdentityCommitment.out;

  for (var i = 0; i < nLevels; i++) {
      inclusionProof.siblings[i] <== treeSiblings[i];
      inclusionProof.pathIndices[i] <== treePathIndices[i];
  }

  root <== inclusionProof.root;

	// Dummy square to prevent tampering signalHash.
	signal signalHashSquared;
	signalHashSquared <== signalHash * signalHash;

	nullifierHash <== calculateNullifierHash.out;
}

component main {public [signalHash, externalNullifier]} = Semaphore(20);

If a zero-knowledge proof protocol is insecure, a malicious prover can forge a zk proof that will succeed verification. Depending on the details of the protocol, the forged proof can potentially be used to “prove” anything the prover wants. Additionally, many zk protocols use what is known as a Fiat-Shamir transformation. Insecure implementations of the Fiat-Shamir transformation can allow attackers to successfully forge proofs.

These types of vulnerabilities have been named “Frozen Heart” vulnerabilities by the TrailOfBits team. From their article (linked in references below):

We’ve dubbed this class of vulnerabilities Frozen Heart. The word frozen is an acronym for FoRging Of ZEro kNowledge proofs, and the Fiat-Shamir transformation is at the heart of most proof systems: it’s vital for their practical use, and it’s generally located centrally in protocols. We hope that a catchy moniker will help raise awareness of these issues in the cryptography and wider technology communities. - Jim Miller, TrailOfBits

Many zk protocols are first developed as interactive protocols, where the prover and verifier need to send messages in multiple rounds of communication. The verifier must send random numbers to the prover, that the prover is unable to predict. These are known as "challenges". The prover must then compute their components of the proof based on these challenges. This requirement for multiple rounds of communication would greatly limit these protocol's practicality. The most common way around this is to use the Fiat-Shamir transformation that supports what is known as the "random oracle model". Essentially, the prover can automatically get the challenges by hashing certain inputs of the proof instead of waiting on the verifier for the challenges. The hashes of the proof inputs acts as a random oracle. This allows the communication to happen within 1 round - the prover generates the proof and sends it to the verifier. The verifier then outputs accept or reject.

As explained in the TrailOfBits’ explanation (linked in the references below), these bugs are widespread due to the general lack of guidance around implementing the Fiat-Shamir transformation for different protocols. Often times the protocols are implemented directly from the academic paper that first introduced the protocol. However, these papers do not include all of the necessary details to be weary of when writing it in code. TrailOfBits’ suggested solution is to produce better implementation guidance for developers, which is why they created ZKDocs.

References

  1. TrailOfBits explanation on the Frozen Heart bugs they discovered

Many popular zk protocols require what is known as a trusted setup. The trusted setup is used to generate the parameters necessary for a prover to create sound zk proofs. However, the setup also involves parameters that need to be hidden to everyone. These are known as "toxic waste". If the toxic waste is revealed to a party, then they would be able to forge zk proofs for that system. The toxic waste is usually kept private in practice through the use of multi-party computation.

Older zk protocols such as Pinocchio and Groth16 require a new trusted setup for each unique circuit. This creates problems whenever a project needs to update their circuits because then they would need to redo the trusted setup. Newer protocols such as MARLIN and PLONK still require a trusted setup, but only once. These protocols' setup is known as "universal" since it can be used for multiple programs, making upgrades much easier. Other protocols such as Bulletproofs and STARKs do not require trusted setups at all.

See the references below for more details on toxic waste and how trusted setups work.

References

  1. Zcash Parameter Generation
  2. How do trusted setups work? - Vitalik
  3. Setup Ceremonies
  4. Zcash Soundness Bug

A very common bug and misunderstanding is the difference between assignments and constraints. For zk circuits, constraints are the mathematical equations that must be satisfied by any given inputs for a proof to be valid. If any of the inputs result in one of the constraint equations to be incorrect, a valid proof is not possible (well, extremely unlikely).

Assignments, on the other hand, simply assign a value to a variable during proof generation. Assignments do not have to be followed for a valid proof to be created. Often times, an assignment can be used, in combination with other constraints, to reduce the total number of constraints used.

Constraints actually add equations to the R1CS file whereas assignments do not.

Circom Case and Example

A great example that highlights the difference between assignment and constraint is the IsZero circom circuit:

pragma circom 2.0.0;

template IsZero() {
    signal input in;
    signal output out;
    signal inv;
    inv <-- in!=0 ? 1/in : 0;
    out <== -in*inv +1;
    in*out === 0;
}

component main {public [in]}= IsZero();

In circom, the <--, --> and = operators are assignments, whereas the <==, ==>, and === operators are constraints. So, inv is assigned 1/in if in is not 0 and is assigned 0 otherwise. This is not a constraint - a prover can assign whatever they want to inv as long as it satisfies the constraints. We will see, however, that even if the prover inputs values other than the expected assignment for inv, IsZero will still be constrained to output 1 if in == 0 and 0 otherwise.

There are 2 cases to consider, in is zero and in is nonzero. If in is zero, then the first constraint out <== -in*inv + 1 forces out == 1, and the second constraint in*out === 0 is satisfied. So the circuit acts as expected, no matter what the prover inputs for inv. For the case that in != 0, then by the second constraint, out must be 0. Therefore the first constraint must force that -in*inv + 1 == 0, and so in*inv == 1. This effectively constrains inv to equal the inverse of in as expected. Even though inv is assigned at will by the prover, the added constraints force it to actually be the inverse of in when in != 0. The circuit acts as expected for both cases of in == 0 and when in != 0.

A good question is why don't we remove the first constraint and simplify the code to the following:

template IsZero() {
    signal in;
    signal out;
    signal temp;
    temp <-- in!= 0 ? 0 : 1;
    out === temp;
}

The problem here is that temp is assigned to in != 0 ? 0 : 1 and not actually constrained. So a trustworthy prover will always run this code and generate successful proofs, but a malicious prover can alter the code to assign anything to temp. A malicious prover could copy this circuit and change the temp assignment to temp <-- 1; so that the circuit always outputs 1. Their proofs generated from their slightly changed code would successfully pass a verifier built from this example circuit. We need the out <== -in*temp +1; constraint to ensure that temp is actually the inverse of in.

Another good question is why don't we constrain instead of assign: inv <== in!=0 ? 1/in : 0;. This is because circom does not support the automatic constraint of a ternary operator. It adds more than one constraint behind the scenes which affects performance. For example, the Mux1 circuit is often used in place of the ternary operator, but it adds more than one constraint. So the authors of the IsZero circuit have found a creative way to reduce the constraint count.

Halo2 Case

This type of bug can easily happen with Halo2 circuits as well - maybe even more likely because assigning is done separately from constraints. In Halo2, often times the constraints are created in a configure function, while the assignments are done in an assign function. It is easy for developers to see that the circuit is satisfied because the assign function correctly gives the desired inputs. However, if the configure function is missing a constraint, a malicious prover can fork the code and change the assign function to target the missing constraint.

A good example of this bug is from the SHL and SHR opcode circuit for the PSE zkEVM. The opcode circuit takes as input a shift variable and a shf0 variable:

let shift = cb.query_word_rlc();
let shf0 = cb.query_cell();

Shf0 is assumed to be the first byte of the shift variable. There are later constraints added on shf0 instead of shift because it is cheaper constraint-wise to only consider the first byte. However, there was a missing constraint forcing shf0 to be the first byte of shift. It is easy to miss this because shf0 and shift are properly assigned in the assign function:

let shf0 = pop1.to_le_bytes()[0];
self.shift.assign(region, offset, Some(pop1.to_le_bytes()))?;

The fix was to add a constraint forcing shf0 to equal the first byte of shift.

Preventative Techniques

To prevent these types of bugs, it is important to understand the ins and outs of whatever coding language was used to create the circuits. That way one can quickly identify what's constrained and what's assigned. Additionally, it is extremely important to go over each constraint in detail to ensure that the system is constrained to the exact expected behavior. Do not consider assignments when doing this, only the constraints. A great way to ensure your constraints are sufficient is through formal verification. Some tools are currently in the process that will help enable quick formal verification for circom and halo2 circuits.

References

  1. Circom Documentation
  2. CircomLib IsZero Circuit
  3. PSE zkEVM Shf0 Bug

This repo was inspired by a few other github repos that also document common vulnerabilities (but for solidity) -

  1. "Security of ZKP projects: same but different" by JP Aumasson @ Taurus. Great slides outlining the different types of zk security vulnerabilities along with examples.
  2. Security Reviews of ZK Protocols by nullity. Consists of Security Reports of 20+ ZK Protocols.
  3. Circomspect by TrailOfBits. A static analyzer for circom code to help detect vulnerabilities. The TrailOfBits introduction post for this tool is a great read.
  4. Picus by Veridise. A static analysis tool for ZKP circuits implemented in Circom (or anything that compiles to R1CS). Lecture on Picus & formal verification by Yu Feng from MOOC 2023.
  5. Ecne by Franklyn Wang. Automated Verification of ZK Circuit, good for finding uniqueness/under-constraint bugs in circuit to QAP/R1CS conversion. A good explanation of Ecne & under constrained circuits is given in this 0xPARC's blog.
  6. Coda by Veridise. As per the CODA paper, Coda makes it possible to formally specify and statically check properties of a ZK application through a rich refinement type system.
  7. Korrekt by Quantstamp is a proof of concept for checking correctness of Halo2 circuits as described in this paper. Talks on linting Halo2 circuits at ZKSummit9 & ETHDenver.
  8. ZK-EVM Audit education sessions by Scroll & Polygon.