DLT Interoperability and More ⛓️#30— zk-Bench: A Toolset for Comparative Evaluation and Performance Benchmarking of SNARKs⛓️

6 min readOct 23, 2023

In this series, we analyze papers on blockchain and interoperability.

This edition covers a framework comparing tools that generate SNARKs. Yet another great paper by Jens Ernstberger and colleagues.

➡️ Title: zk-Bench: A Toolset for Comparative Evaluation and Performance Benchmarking of SNARKs

➡️ Authors: Jens Ernstberger, Stefanos Chaliasos, George Kadianakis, Sebastian Steinhorst, Philipp Jovanovic, Arthur Gervais, Benjamin Livshits, Michele Orrù

➡️ Paper source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4563364

➡️ Motivation:

Zero-knowledge (ZK) technology is here to stay. Recently, we have been exploring SNARKs in the context of blockchain interoperability with our DendrETH project. In particular, we implemented the Altair Light Client protocol as a Circom circuit. This circuit verifies that all the rules of the light client protocol are correctly applied (input: previous and current block headers). We used the Groth16 cryptographic system, “a specific type of SNARK”. Provided the constraints are respected, the output is a SNARK that can be verified on-chain.

The idea is then to use these on-chain SNARK verifiers in different chains to verify the block headers from the Ethereum network — this allows proving facts on the Ethereum network (balances, transaction inclusion, even validators balance) on third-party chains. We developed a prototype supporting both EVM and non-EVM chains, and provided a comprehensive framework to build secure cross-chain applications. Paper about to come out!

This paper is extremely timely and relevant for us because it allows us to assess the performance of different cryptographic systems we can leverage to implement future, more efficient versions of DendrETH that we use as the core of our paper Harmonia — stay tuned for its publication.

From the introduction of zkBench: “Tools and libraries for ZKPs are typically constructed in multiple layers, encompassing implementations of finite fields, elliptic curves, specific models of computation (such as Rank-One Constraint Systems (R1CS) [9], Quadratic Arithmetic Programs (QAP) [55], Plonkish, and more), polynomial and vector algebra, public-key cryptography, and the actual proof system. Together, these components provide diverse functionality and performance trade-offs, and may further be customized or optimized.”. This gives us a glimpse of what is going to be analyzed.

➡️ Contributions:

zk-Bench, the first benchmarking framework designed for performance evaluation of general-purpose zero-knowledge proof (ZKP) low-level operations across different libraries and curves. Furthermore, zkBench also evaluates high-level ZK circuits (including setup, proving, and verification).

➡️ Analysis:

To generate a SNARK, we have a few steps that need to be followed. The following figure, from the paper, exemplifies the architecture and the different steps:

Source: https://eprint.iacr.org/2023/1503

First, a circuit is written in (typically) a high-level programming language. Those circuits are represented in a lower-level format, typically a R1CS (ranked one constraint system). A R1CS is a system of quadratic equations used to represent arithmetic circuits. Circuits represent relations (R). From the paper, “An arithmetic circuit supports addition and multiplication gates, with unrestricted fan-in and unrestricted fan-out […] . Additional tooling allows developers to specify the circuit through a domain-specific programming language or a SNARK library.”
A setup algorithm is run to pre-process “R”, and reduce the computational load of the verifier. R is called the relation, which defines a relationship between instances and witnesses such that a specific instance-witness pair can be verified in polynomial time. It effectively characterizes the problem being solved or the statement being proven. The setup outputs a proving key pk and a verification key vk.
The inputs to the circuit (implementing “R”) can be public or private; generating a proof (SNARK) can be done by running a Prove algorithm that takes an instance “x” (the inputs to the circuit) and a witness “w” (solution when applied to the circuit along with the instance, produce a desired output, demonstrating the truth of a particular statement), and returns a proof. The key point for the zero-knowledge property is that the instance “x” does not show the private inputs.
Now, the verifier can run a verify algorithm that takes vk, x, and the proof and outputs true if the proof is valid and false otherwise.

The authors also offer a way to estimate the runtime of complex proof systems” to “offer accurate estimation results even if we lack benchmarks for the requested size, or when the cryptographic protocol is in the research stage and not yet implemented”. This is done in a simple way. For arithmetic estimation:

“If the user queries for a size that lies between two benchmarked intervals, we perform linear interpolation of the two estimates and return the value at the given size.”
“If the requested size is outside the tested intervals, we apply linear regression using the least squares method on the nearest four samples, and return the value of the resulting line at the given size.”

For the circuits, the estimation builds on top of the previous estimation: “ Given the size of the circuit being proven (in the case of R1CS for instance, the number of addition gates, multiplication gates, instance, and witness sizes) we can provide estimates for the running time of a proof system by enumerating the arithmetic operations being performed, estimating the run-time for each of them, and summing them up”. The estimator’s code is available here.

💪 Strong points:

First large-scale empirical study on ZK tooling. I predict this framework to have great adoption.
The authors have deployed an interactive website that shows the time predictions for a specific operation, using a specific curve and library: https://zka.lc/
https://github.com/zkCollective/zk-Harness contains the source code for benchmarking circuits

🤞 Suggestions for improvement:

While the circuit benchmarks are already very interesting by themselves, we do not have a clear idea of the resources a real-world application circuits take. It would be interesting to benchmark for example, our circuits at https://github.com/metacraft-labs/DendrETH/tree/main/beacon-light-client/circom/circuits.
As pointed out by the authors, increasing the “Scope of Evaluated Circuits” would be very interesting. What are common optimizations that different libraries do? How do those translate into performance improvements? In particular, it would be interesting for us to have “optimizations” benchmarks.

🔥 Points of interest:

“we show that the arithmetic benchmarks can be used to (i) accurately extrapolate the performances of complex arithmetic operations and (ii) approximate the runtime of a ZKP circuit implementation — without actually implementing it.” — right in the introduction, the authors provide an interesting insight. One could predict the runtime of a circuit based on the arithmetic library benchmarks. This would allow to fine-tune the hardware setup.
“We find that for BN254, gnark-crypto is on average 30% faster than any other implementation for addition and multiplication in the scalar field”
“The choice of hardware significantly impacts the execution time. ”
“More concretely, for a MSM instance of size 2²⁰,[ …] over BN254”, , gnark-crypto is the fastest. “This is expected, as gnark-crypto implements the state-of-the-art MSM, a variant of the Pippenger algorithm”.
“The low-level arithmetic operations are faster for more powerful CPUs (i.e., more CPU cores) in the first three rows. The last row depicts execution on an ARM-based machine, where it can be seen that executing finite field arithmetic is significantly less performant than on x86.”
“the setup phase for the P𝔩𝔬𝔫K proof system is faster than that of Groth16.”
the slowest prover on generating a proof for “a SHA-256 circuit for preimage” verification is circom/snarkjs, while the fastest is circom/rapidsnark (for groth16).
“Both snarkjs and rapidsnark demonstrate improved performance on RAM-optimized machines across all circuits.”
“A more pronounced difference is evident in gnark’s performance on CPU-optimized machines. Its superior performance might indicate efficient parallelization, most notably seen in the SHA256 circuit”

🚀 What are the implications for our work?

Benchmarking different frameworks and proof systems allows us to tailor future versions of our DendrETH project that tackles secure, reliable cross-chain interoperability due to insights on performance.

DLT Interoperability and More ⛓️#30— zk-Bench: A Toolset for Comparative Evaluation and Performance Benchmarking of SNARKs⛓️

Written by Rafael Belchior