Bitlayer Research: Binius STARKs principle analysis and optimization thinking

2024-10-22 18:02

Read this article in 50 Minutes

The bit widths of the 1st, 2nd, and 3rd generation STARK proof systems are 252, 64, and 32 bits respectively. Although the coding efficiency has been improved, there is still wasted space. Binius directly operates on bits, and the coding is compact and efficient. It is likely to be the fourth generation of STARK in the future. Binius uses arithmetic based on the tower binary field, an improved version of HyperPlonk product and permutation check, small field polynomial commitment and other technologies to improve efficiency from all angles. Further optimization can be made in binary field multiplication, ZeroCheck, SumCheck, PCS, etc. to further improve the proof speed and reduce the proof size.

Original title: Binius STARKs Analysis and Its Optimization
Original author: mutourend & lynndell, Bitlayer Labs

1 Introduction

STARKs can be considered as hash-based SNARKs, unlike elliptic curve-based SNARKs. One of the main reasons for the low efficiency of current STARKs is that most of the values in actual programs are small, such as indexes, true and false values, counters, etc. in for loops. However, in order to ensure the security of Merkle tree-based proofs, when Reed-Solomon encoding is used to expand the data, many additional redundant values will occupy the entire domain, even if the original value itself is very small. To solve this problem, reducing the size of the domain has become a key strategy.

As shown in Table 1, the encoding bit width of the first generation STARKs is 252 bits, the encoding bit width of the second generation STARKs is 64 bits, and the encoding bit width of the third generation STARKs is 32 bits, but the 32-bit encoding bit width still has a lot of wasted space. In contrast, the binary field allows direct bit operations, and the encoding is compact and efficient without arbitrary wasted space, that is, the fourth generation STARKs.

Table 1: STARKs evolution path

Compared to the finite fields discovered by recent studies such as Goldilocks, BabyBear, and Mersenne31, the research on binary fields can be traced back to the 1980s. Currently, binary domains have been widely used in cryptography, with typical examples including: · Advanced Encryption Standard (AES), based on the F28 domain; · Galois Message Authentication Code (GMAC), based on the F2128 domain; · QR code, using Reed-Solomon encoding based on F28; · The original FRI and zk-STARK protocols, as well as the Grøstl hash function that entered the SHA-3 finals, which is based on the F28 domain and is a hash algorithm that is very suitable for recursion.

When using smaller domains, extended domain operations become increasingly important for ensuring security. The binary domain used by Binius relies entirely on the extended domain to ensure its security and practical usability. Most polynomials involved in Prover calculations do not need to enter the extended domain, but only need to operate in the base domain, thus achieving high efficiency in small domains. However, random point checks and FRI calculations still need to go deeper into larger extended domains to ensure the required security.

When building a proof system based on a binary domain, there are 2 practical problems: when calculating the trace representation in STARKs, the domain size used should be larger than the order of the polynomial; when committing to the Merkle tree in STARKs, Reed-Solomon encoding is required, and the domain size used should be larger than the size after the encoding is expanded.

Binius proposed an innovative solution to deal with these two problems separately and to achieve it by representing the same data in two different ways: first, using multivariate (specifically multilinear) polynomials instead of univariate polynomials, and representing the entire computational trajectory through its values on "hypercubes"; second, since the length of each dimension of the hypercube is 2, it is not possible to perform a standard Reed-Solomon expansion like STARKs, but the hypercube can be regarded as a square, and the Reed-Solomon expansion can be performed based on the square. This method greatly improves coding efficiency and computing performance while ensuring security.

2 Principle Analysis

The construction of most current SNARKs systems usually includes the following two parts:

· Information-Theoretic Polynomial Interactive Oracle Proof (PIOP):PIOP, as the core of the proof system, converts the input computational relationship into a verifiable polynomial equation. Different PIOP protocols allow the prover to send polynomials step by step through interaction with the verifier, so that the verifier can verify whether the calculation is correct by querying the evaluation results of a small number of polynomials. Existing PIOP protocols include: PLONK PIOP, Spartan PIOP and HyperPlonk PIOP, etc. They each handle polynomial expressions differently, thus affecting the performance and efficiency of the entire SNARK system.

· Polynomial Commitment Scheme (PCS):The polynomial commitment scheme is used to prove whether the polynomial equation generated by PIOP is true. PCS is a cryptographic tool that allows the prover to commit to a polynomial and later verify the evaluation result of the polynomial while hiding other information of the polynomial. Common polynomial commitment schemes include KZG, Bulletproofs, FRI (Fast Reed-Solomon IOPP) and Brakedown. Different PCS have different performance, security and applicable scenarios.

Depending on the specific needs, different PIOPs and PCSs can be selected and combined with appropriate finite fields or elliptic curves to build proof systems with different properties. For example:

• Halo2:Combined with PLONK PIOP and Bulletproofs PCS, and based on Pasta curve. Halo2 was designed with a focus on scalability and removing trusted setup from the ZCash protocol.

• Plonky2:Combined with PLONK PIOP and FRI PCS, and based on Goldilocks field. Plonky2 is designed to achieve efficient recursion. When designing these systems, the selected PIOP and PCS must match the finite field or elliptic curve used to ensure the correctness, performance, and security of the system. The choice of these combinations not only affects the proof size and verification efficiency of SNARK, but also determines whether the system can achieve transparency without trusted setup and whether it can support extended functions such as recursive proofs or aggregate proofs.

Binius:HyperPlonk PIOP + Brakedown PCS + Binary Fields. Specifically, Binius includes five key technologies to achieve its efficiency and security. First, arithmetic based on towers of binary fields forms the basis of its calculation, which can realize simplified operations in binary fields. Second, Binius adapted the HyperPlonk product and permutation check in its interactive oracle proof protocol (PIOP) to ensure secure and efficient consistency checks between variables and their permutations. Third, the protocol introduces a new multilinear shift argument to optimize the efficiency of verifying multilinear relations on small domains. Fourth, Binius uses an improved version of the Lasso search argument , which provides flexibility and strong security for the search mechanism. Finally, the protocol uses the Small-Field Polynomial Commitment Scheme (Small-Field PCS), which enables it to implement efficient proof systems on binary fields and reduces the overhead typically associated with large fields.

2.1 Finite Fields: Arithmetic Based on Towers of Binary Fields

Pylon-shaped binary fields are key to enabling fast verifiable computations, primarily due to two aspects: efficient computation and efficient arithmetic. Binary fields inherently support highly efficient arithmetic operations, making them ideal for performance-sensitive cryptographic applications. Furthermore, the binary field structure supports simplified arithmetic operations, i.e., operations performed on the binary field can be represented in a compact and easily verifiable algebraic form. These properties, combined with the ability to fully exploit their hierarchical nature through the tower structure, make binary fields particularly suitable for scalable proof systems such as Binius.

Where "canonical" refers to a unique and direct representation of an element in the binary field. For example, in the most basic binary field F2, any k-bit string can be directly mapped to a k-bit binary field element. This is different from prime fields, which do not provide such a canonical representation within a given number of bits. Although a 32-bit prime field can be contained in 32 bits, not every 32-bit string can uniquely correspond to a field element, while the binary field has the convenience of this one-to-one mapping. In the prime field Fp, common reduction methods include Barrett reduction, Montgomery reduction, and special reduction methods for specific finite fields such as Mersenne-31 or Goldilocks-64. In the binary field F2k, common reduction methods include special reductions (such as used in AES), Montgomery reduction (such as used in POLYVAL), and recursive reductions (such as Tower). The paper "Exploring the Design Space of Prime Field vs. Binary Field ECC-Hardware Implementations" points out that the binary field does not need to introduce carry in addition and multiplication operations, and the square operation of the binary field is very efficient because it follows the simplified rule of (X + Y)2 = X2 + Y2.

As shown in Figure 1, a 128-bit string: This string can be interpreted in many ways in the context of the binary field. It can be regarded as a unique element in the 128-bit binary field, or parsed into two 64-bit tower field elements, four 32-bit tower field elements, 16 8-bit tower field elements, or 128 F2 field elements. This flexibility of representation does not require any computational overhead, just typecasting of the bit string, which is a very interesting and useful property. At the same time, small field elements can be packed into larger field elements without additional computational overhead. The Binius protocol takes advantage of this feature to improve computational efficiency. In addition, the paper "On Efficient Inversion in Tower Fields of Characteristic Two" explores the computational complexity of multiplication, squaring, and inversion operations in an n-bit tower binary field (which can be decomposed into m-bit subfields).

Figure 1: Tower Binary Domain

2.2 PIOP: Adapted HyperPlonk Product and PermutationCheck - Applicable to Binary Domain

The PIOP design in the Binius protocol draws on HyperPlonk and adopts a series of core checking mechanisms to verify the correctness of polynomials and multivariate sets. These core checks include:

1. GateCheck:Verify whether the secret witness ω and the public input x satisfy the circuit operation relation C(x,ω)=0 to ensure that the circuit operates correctly.

2.PermutationCheck：Verifies that the evaluation of two multivariate polynomials f and g on a Boolean hypercube results in a permutation relation f(x) = f(π(x)), ensuring consistency between the polynomial variables.

3.LookupCheck：Verifies that the evaluation of the polynomial is in a given lookup table, i.e. f(Bµ) ⊆ T(Bµ), ensuring that certain values are within a specified range.

4.MultisetCheck：Checks that two multivariate sets are equal, i.e. {(x1,i,x2,)}i∈H={(y1,i,y2,)}i∈H, ensuring consistency between multiple sets.

5.ProductCheck：Check whether the evaluation of a rational polynomial on a Boolean hypercube is equal to a certain declared value ∏x∈Hµ f(x) = s to ensure the correctness of the polynomial product.

6.ZeroCheck：Verify whether a multivariate polynomial is zero at any point on a Boolean hypercube ∏x∈Hµ f(x) = 0,∀x ∈ Bµ to ensure the zero point distribution of the polynomial.

7.SumCheck：Check whether the sum of a multivariate polynomial is the declared value ∑x∈Hµ f(x) = s. By converting the evaluation problem of a multivariate polynomial into the evaluation of a univariate polynomial, the computational complexity of the verifier is reduced. In addition, SumCheck also allows batch processing, which can realize batch processing of multiple sum verification instances by introducing random numbers and constructing linear combinations.

8.BatchCheck: Based on SumCheck, it verifies the correctness of multiple multivariate polynomial evaluations to improve protocol efficiency.

Although Binius and HyperPlonk have many similarities in protocol design, Binius makes improvements in the following 3 aspects:

· ProductCheck optimization: In HyperPlonk, ProductCheck requires that the denominator U is non-zero everywhere on the hypercube and the product must be equal to a specific value; Binius simplifies this check process by specializing this value to 1, thereby reducing computational complexity.

· Handling of division by zero:HyperPlonk failed to adequately handle the division by zero case, resulting in the inability to assert that U is non-zero on the hypercube; Binius correctly handles this problem, and Binius's ProductCheck can continue to handle it even when the denominator is zero, allowing it to be generalized to arbitrary product values.

· Cross-column PermutationCheck:HyperPlonk does not have this feature; Binius supports PermutationCheck between multiple columns, which enables Binius to handle more complex polynomial permutations.

Therefore, Binius has improved the flexibility and efficiency of the protocol by improving the existing PIOPSumCheck mechanism, especially providing stronger functional support when handling more complex multivariate polynomial verification. These improvements not only address limitations in HyperPlonk, but also lay the foundation for future proof systems based on binary fields.

2.3 PIOP: New multilinear shift argument - for boolean hypercube

In the Binius protocol, the construction and processing of virtual polynomials is one of the key techniques, which can efficiently generate and operate polynomials derived from input handles or other virtual polynomials. The following are two key methods:

· Packing:This method optimizes operations by packing smaller elements in adjacent positions in lexicographic order into larger elements. The Pack operator operates on blocks of size 2κ and combines them into a single element in the high-dimensional domain. Through multilinear extension (MLE), this virtual polynomial can be efficiently evaluated and processed, transforming the function t into another polynomial, thereby improving the computational performance.

Shift Operator: The shift operator rearranges the elements within a block, performing a circular shift based on a given offset o. This approach works for blocks of size 2b, where each block is shifted by an offset. The shift operator is defined via a check function, ensuring consistency and efficiency when dealing with virtual polynomials. The complexity of evaluating this construction grows linearly with the block size, making it particularly useful for dealing with large datasets or high-dimensional scenarios such as Boolean hypercubes. 2.4 PIOP: An Adapted Lasso Lookup Argument for Binary Domains The Lasso protocol allows a prover to commit to a vector a ∈ Fm and prove that all its elements exist in a pre-specified table t ∈ Fn. Lasso unlocks the concept of “lookup singularities” and can be applied to multilinear polynomial commitment schemes. Its efficiency is reflected in the following two aspects:

· Proof efficiency:For m lookups in a table of size n, the prover only needs to commit to m+n domain elements. These domain elements are small and all lie in the set {0,...,m}. In a commitment scheme based on multiple exponentiation, the computational cost of the prover is O(m + n) group operations (such as elliptic curve point addition), plus the evaluation cost of proving whether a multilinear polynomial is a table element on a Boolean hypercube.

· No need to commit to large tables:If the table t is structured, no commitment is required to it, so very large tables (such as 2128 or larger) can be processed. The running time of the prover is only related to the table entries visited. For any integer parameter c> 1, the main cost of the prover is the proof size, and the number of committed domain elements is 3·c m + c·n1/c. These domain elements are small and lie in the set {0,...,max{m,n1/c,q} − 1}, where q is the maximum value in a.

The Lasso protocol consists of the following three components:

· Virtual polynomial abstraction for large tables:Operations on large tables are implemented by combining virtual polynomials, ensuring efficient search and processing within the table.

· Small table search:The core of Lasso is the small table search, which is the core construction of the virtual polynomial protocol. Offline memory detection is used to verify whether the evaluation of a virtual polynomial on a Boolean hypercube is a subset of the evaluation of another virtual polynomial. This search process is reduced to the task of multi-set detection.

· Multi-set Check:Lasso introduces a virtual protocol to perform multi-set checks, verifying that the elements of two sets are equal or satisfy a certain condition.

Binius protocol adapts Lasso to operations in binary domains, assuming that the current domain is a prime field with large characteristics (much larger than the length of the column being searched). Binius introduces a multiplication version of the Lasso protocol, requiring the prover and the verifier to jointly increment the "memory count" operation of the protocol, not by simply adding 1, but by multiplication generators in the binary domain. However, this multiplication adaptation introduces more complexity. Unlike the increment operation, the multiplication generator is not incremented in all cases. There is a single track at 0, which may become an attack point. To prevent this potential attack, the prover must commit to a read count vector that is non-zero everywhere to ensure the security of the protocol.

2.5 PCS: Adapted Brakedown PCS - Applicable to Small-Field

The core idea of building BiniusPCS is packing. The Binius paper provides two Brakedown polynomial commitment schemes based on binary fields: one is instantiated using concatenated code; the other uses block-level encoding technology and supports the use of Reed-Solomon codes alone. The second Brakedown PCS scheme simplifies the proof and verification process, but the proof size is slightly larger than the first one. However, the simplification and implementation advantages brought about by this trade-off are worth making.

Binius polynomial commitment mainly uses small field polynomial commitment and extended field evaluation, small field universal construction and block-level encoding and Reed-Solomon code technology.

Small Field Polynomial Commitment and Extended Field Evaluation:Commitments in the Binius protocol are polynomial commitments over a small field K and evaluated over a larger extended field L/K. This approach ensures that every multilinear polynomial t(X0,...,Xℓ−1) belongs to the field K[X0,...,Xℓ−1], while the evaluation point can be in the larger extended field L. The commitment scheme is specifically designed for small field polynomials and can be queried over the extended field, while guaranteeing both security and efficiency of the commitment.

Small Field Universal Construction:The small field universal construction ensures that the extended field L is large enough to support secure evaluation by defining the parameter ℓ, the field K, and its associated linear block code C. To improve security while maintaining computational efficiency, the protocol guarantees the robustness of the commitment through the properties of the extended field and by encoding the polynomials with linear block codes.

Block-level encoding and Reed-Solomon codes:Binius proposed a block-level encoding scheme for polynomials with fields smaller than the alphabet of linear block codes. This scheme allows even polynomials defined in small fields such as F2 to be efficiently committed using Reed-Solomon codes with large alphabets such as F216. Reed-Solomon codes were chosen for their efficiency and maximum distance separation properties. The scheme simplifies the operation by bundling messages and encoding them row by row before using Merkle trees for commitment. Block-level encoding allows efficient commitment of polynomials over small fields without the high computational overhead typically associated with large fields, making it possible to commit polynomials in small fields such as F2 while maintaining computational efficiency in generating proofs and verifications.

3 Optimization Thinking

In order to further improve the performance of the Binius protocol, this paper proposes four key optimization points:

1. GKR-based PIOP:For binary domain multiplication operations, the GKR protocol is used to replace the Lasso Lookup algorithm in the Binius paper, which can greatly reduce the commitment overhead of Binius;

2. ZeroCheck PIOP optimization:The computational overhead trade-off between Prover and Verifier makes ZeroCheck operations more efficient;

3. Sumcheck PIOP optimization:For the optimization of small domain Sumcheck, the computational burden on small domains is further reduced;

4. PCS optimization:Through FRI-Binius Optimization, reducing proof size and improving the overall performance of the protocol.

3.1 GKR-based PIOP: Binary field multiplication based on GKR

The Binius paper introduces a lookup-based scheme to achieve efficient binary field multiplication operations. The binary field multiplication algorithm adapted by the Lasso lookup argument relies on the linear relationship between lookups and addition operations, which are proportional to the number of limbs in a single word. Although this algorithm optimizes the multiplication operation to some extent, it still requires auxiliary commitments that are linearly related to the number of limbs.

The core idea of the GKR (Goldwasser-Kalai-Rothblum) protocol is that the prover (P) and the verifier (V) agree on a layered arithmetic circuit over a finite field F. Each node of the circuit has two inputs, which are used to calculate the required function. In order to reduce the computational complexity of the verifier, the protocol uses the SumCheck protocol to gradually simplify the statements about the circuit output gate values into lower-level gate value statements, until the statements are finally simplified to statements about the inputs. In this way, the verifier only needs to check the correctness of the circuit inputs.

The integer multiplication algorithm based on GKR greatly reduces the commitment overhead by converting "checking whether 2 32-bit integers A and B satisfy A·B =? C" to "checking whether (gA)B =? gC holds", with the help of the GKR protocol. Compared with the previous Binius lookup scheme, the binary field multiplication operation based on GKR only requires one auxiliary commitment, and by reducing the overhead of Sumchecks, the algorithm is more efficient, especially in the scenario where Sumchecks operation is cheaper than commitment generation. With the advancement of Binius optimization, GKR-based multiplication operation has gradually become an effective way to reduce the overhead of binary field polynomial commitment.

3.2 ZeroCheck PIOP Optimization: Prover and Verifier Computational Overhead Tradeoff

The paper "Some Improvements for the PIOP for ZeroCheck" adjusts the distribution of workload between the prover (P) and the verifier (V) and proposes multiple optimization schemes to trade off the overhead. This work explores different k value configurations, so that a cost trade-off is reached between the prover and the verifier, especially in terms of reducing transmitted data and reducing computational complexity.

Reduce the prover’s data transmission: By transferring part of the work to the verifier V, the amount of data sent by the prover P can be reduced. In the i-th round, the prover P needs to send vi+1(X) to the verifier V, where X=0,...,d + 1. The verifier V checks the following equation to verify the correctness of the data

vi = vi+1(0) + vi+1(1).

Optimization method: The prover P can choose not to send vi+1(1), but let the verifier V calculate the value by itself in the following way

vi+1(1) = vi − vi+1(0).

In addition, in round 0, the honest prover P always sends v1(0) = v1(1) = 0, which means that no evaluation calculation is required, significantly reducing the computation and transmission cost to d2n−1CF + (d + 1)2n−1CG.

Reducing the number of prover evaluation points: In round i of the protocol, the verifier has sent a sequence of values r =(r0,...,ri−1) in the previous i rounds. The current protocol requires the prover (P) to send a polynomial vi+1(X) = ∑ δˆn(α,(r,X,x))C(r,X,x).x∈H −−1 Optimization: The prover P sends the following polynomial The relationship between these two functions is: vi(X) = vi′(X)·δi+1((α0,...,αi),(r,X)) where δˆi+1 is completely known to the verifier because it has α and r. The benefit of this modification is that the degree of vi′(X) is 1 less than that of vi(X), which means that the prover needs to evaluate fewer points. Therefore, the main protocol changes occur in the checking between rounds.

In addition, the original constraint vi = vi+1(0)+vi+1(1) is optimized to (1−αi)vi′+1(0)+αivi′+1(1) = vi′(X). The prover then needs to evaluate and send less data, further reducing the amount of data transmitted. Computing δˆn−i−1 is also more efficient than computing δˆn. With these two improvements, the cost is reduced to approximately: 2n−1(d− 1)CF + 2n−1dCG. In the common case of d=3, these optimizations reduce the cost by a factor of 5/3.

Algebraic interpolation optimization: For an honest prover, C(x0,...,xn−1) is zero on Hn, which can be expressed as: C(x0,...,xn-1)= ∑xi(xi-1)Qi(x0,...,xn-1). Although Qi is not unique, an orderly decomposition can be constructed by polynomial long division: starting from Rn=C, Qi and Ri are calculated by dividing by xi(xi−1) successively, where R0 is the multilinear extension of C on Hn and is assumed to be zero. Analyzing the degree of Qi, it can be found that for j>i, the degree of Qj on xi is the same as C; for j = i, the degree is reduced by 2; for j

C(r,X,x) − Qi(r,X,x) = X(X − 1)Qi+1(r,X,x)

Therefore, in each round of the protocol, only a new Q is introduced, whose evaluation value can be calculated from C and the previous Q, achieving interpolation optimization.

3.3 Sumcheck PIOP Optimization: Sumcheck Protocol Based on Small Domain

The STARKs scheme implemented by Binius has a very low commitment overhead, so that the prover bottleneck is no longer PCS, but the sum-check protocol. Ingonyama proposed an improvement scheme for the small domain-based Sumcheck protocol in 2024 (corresponding to the Algo3 and Algo4 algorithms in Figure 2) and open-sourced the implementation code. Algorithm 4 focuses on incorporating Karatsub's algorithm into Algorithm 3 to minimize the number of extended-domain multiplications at the cost of additional base-domain multiplications, so when extended-domain multiplications are much more expensive than base-domain multiplications, Algorithm 4 performs better.

· Impact of switching rounds and improvement factor

The improvement of the Sumcheck protocol based on small domains focuses on the selection of the switching round t. The switching round refers to the time point when switching from the optimized algorithm back to the naive algorithm. The experiments in the paper show that at the optimal switching point, the improvement factor reaches the maximum value and then presents a parabolic trend. If this switching point is exceeded, the performance advantage of the optimized algorithm weakens and the efficiency decreases. This is because base-domain multiplication on a small domain has a higher time ratio than extended-domain multiplication, so it is crucial to switch back to the naive algorithm at the right time.

Figure 2: Relationship between switching rounds and improvement factor

For specific applications, such as Cubic Sumcheck (d = 3), the small domain-based Sumcheck protocol improves on the naive algorithm by an order of magnitude. For example, when the base domain is GF[2], the performance of Algorithm 4 is nearly 30 times higher than that of the naive algorithm.

· Impact of base domain size on performance

The experimental results of the paper show that a smaller base domain (such as GF[2]) can enable the optimization algorithm to show a more significant advantage. This is because the time ratio of the extended domain to the base domain multiplication is higher on a smaller base domain, so the optimization algorithm shows a higher improvement factor under this condition.

· Optimization benefits of Karatsuba algorithm

The Karatsuba algorithm shows significant effects in improving the performance of Sumcheck based on small domains. For the base domain GF[2], the peak improvement factors of Algorithm 3 and Algorithm 4 are 6 and 30 respectively, indicating that Algorithm 4 is five times more efficient than Algorithm 3. Karatsuba optimization not only improves the running efficiency, but also optimizes the switching point of the algorithm, reaching the best at t=5 for Algorithm 3 and t=8 for Algorithm 4 respectively.

· Improvement of memory efficiency

In addition to improving the running time, the Sumcheck protocol based on small domains also shows significant advantages in memory efficiency. The memory requirement of Algorithm 4 is O(d·t), while the memory requirement of Algorithm 3 is O(2d·t). When t=8, Algorithm 4 requires only 0.26MB of memory, while Algorithm 3 requires 67MB to store the product of the base field. This makes Algorithm 4 more adaptable on memory-constrained devices, especially for resource-constrained client proof environments.

3.4 PCS Optimization: FRI-Binius Reduces Binius Proof Size

A major drawback of the Binius protocol is its relatively large proof size, which scales as O(√N) with the square root of the witness size. This square root size proof is a limitation compared to more efficient systems. In contrast, polylogarithmic proof size is critical to achieving a truly "concise" verifier, which is demonstrated in advanced systems like Plonky3, which achieves logarithmic proofs through advanced techniques such as FRI.

The paper "Polylogarithmic Proofs for Multilinears over Binary Towers", referred to as FRI-Binius, implements the binary domain FRI folding mechanism, bringing 4 innovations:

· Flattened polynomials:The initial multilinear polynomials are converted to LCH (Low Code Height) novel polynomial basis.

· Subspace Vanishing Polynomials:Used to perform FRI on the coefficient domain and achieve FFT-like decomposition through additive NTT (number theoretic transform).

· Algebraic Basis Packing:Enables efficient packing of information in the protocol, removing embedding overhead.

· Ring-commutative SumCheck:A novel SumCheck method that optimizes performance using ring-commutative techniques.

The core idea of the multilinear polynomial commitment scheme (PCS) based on binary field FRI-Binius is that the FRI-Binius protocol operates by packing the initial binary field multilinear polynomial (defined on F2) into a multilinear polynomial defined on a larger field K.

In FRI-BiniusPCS based on binary domains, the process is as follows:

· Commitment phase:A commitment to a multilinear polynomial of ℓ variables (defined on F2) is transformed into a commitment to a packed multilinear polynomial of ℓ′ variables (defined on F2128), thus reducing the number of coefficients by a factor of 128.

· Evaluation phase:The prover and verifier perform ℓ′-round cross-ring switching SumCheck and FRI interactive proof (IOPP):

–The FRI open proof occupies the majority of the proof size.

–The prover’s SumCheck cost is similar to the SumCheck cost on regular large domains.

–The prover’s FRI cost is the same as FRI cost on regular large domains.

–The verifier receives 128 elements from F2128 and performs 128 additional multiplications.

With FRI-Binius, Binius proof size is reduced by an order of magnitude. This brings Binius proof size closer to state-of-the-art systems while maintaining compatibility with the binary domain. FRI folding techniques tailored for binary domains, combined with optimizations in algebraic packing and SumCheck, allow Binius to generate more concise proofs while maintaining efficient verification.

Table 2: Binius vs. FRI-Binius Proof Size

Table 3: Plonky3 FRI vs. FRI-Binius

4 Summary

The whole value proposition of Binius is to use the smallest power-of-two domain for witnesses, so just choose the domain size based on what you need. Binius is a co-design using hardware, software, and the Sumcheck protocol accelerated in FPGAs to do fast proofs with very low memory usage. Proof systems like Halo2 and Plonky3 have 4 key steps that take up most of the computation: generating witness data, committing to witness data, vanishing argument, opening proof. Taking Keccak in Plonky3 and Grøstl hash function in Binius as examples, the proportion of the computational amount of the above four key steps is shown in Figure 3:

Figure 3: Smaller commit cost

It can be seen that the commit bottleneck of Prover has been basically completely removed in Binius. The new bottleneck lies in the Sumcheck protocol. The large number of polynomial evaluations and field multiplications in the Sumcheck protocol can be efficiently solved with the help of dedicated hardware. The FRI-Binius scheme, a variant of FRI, provides a very attractive option - eliminating the embedding overhead from the field proof layer without causing a surge in the cost of the aggregate proof layer. Currently, the Irreducible team is developing its recursive layer and announced a collaboration with the Polygon team to build a Binius-based zkVM; JoltzkVM is switching from Lasso to Binius to improve its recursive performance; and the Ingonyama team is implementing an FPGA version of Binius.