[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

CN113504895A - Elliptic curve multi-scalar dot multiplication calculation optimization method and optimization device - Google Patents

Elliptic curve multi-scalar dot multiplication calculation optimization method and optimization device Download PDF

Info

Publication number
CN113504895A
CN113504895A CN202110791569.4A CN202110791569A CN113504895A CN 113504895 A CN113504895 A CN 113504895A CN 202110791569 A CN202110791569 A CN 202110791569A CN 113504895 A CN113504895 A CN 113504895A
Authority
CN
China
Prior art keywords
reduction
elliptic curve
calculation
matrix
bucket
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110791569.4A
Other languages
Chinese (zh)
Other versions
CN113504895B (en
Inventor
高鸣宇
张烨
董江彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhixin Huaxi Information Technology Co ltd
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202110791569.4A priority Critical patent/CN113504895B/en
Publication of CN113504895A publication Critical patent/CN113504895A/en
Application granted granted Critical
Publication of CN113504895B publication Critical patent/CN113504895B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/60Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
    • G06F7/72Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
    • G06F7/724Finite field arithmetic
    • G06F7/725Finite field arithmetic over elliptic curves

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The application discloses an elliptic curve multi-scalar dot product calculation optimization method and an elliptic curve multi-scalar dot product calculation optimization device, intermediate variable points in the main calculation process are cached by designing a barrel matrix, the output of Pippenger intermediate quantities is avoided, the calculation is continuously performed in a pipeline manner until all final calculations are finished, transverse reduction and longitudinal reduction are performed, the total calculation times of serial calculation are reduced from thousands of times to one time, most of expenses such as serial-parallel conversion and synchronous locking are eliminated, the continuous working time of a production line is effectively prolonged, and therefore the overall performance is improved.

Description

Elliptic curve multi-scalar dot multiplication calculation optimization method and optimization device
Technical Field
The application relates to the technical field of cryptography, in particular to an elliptic curve multi-scalar dot multiplication calculation optimization method and an elliptic curve multi-scalar dot multiplication calculation optimization device.
Background
In the related art, as shown in fig. 1, which is a Pippenger algorithm, is an algorithm used in one round of calculation in the MSM module of PipeZK, in which such calculation needs to be performed in total
Figure BDA0003161252870000011
To produce
Figure BDA0003161252870000012
Point, finally this
Figure BDA0003161252870000013
The points are again taken as input for one calculation,thereby obtaining the final result. Where N is the total amount of input data and M is the total amount of input data for a single round, the former typically being 106On the order of magnitude, the latter is often 1000, 1024, etc.
In FIG. 1, for each GiThe storage contents of the buckets (buckets) for temporarily storing the result of one point are different, so that different G's are calculatediEvery time, the Bucket needs to be emptied in advance. And performing longitudinal reduction (Q)j=∑2Gi) And transverse reduction (G)i=∑iBi) This is a serial phase of the algorithm, and there is no parallel algorithm for the time being to perform the parallel computation of this computation, so this serial phase requires the execution of all
Figure BDA0003161252870000014
Next, for the case of millions of input data, this stage is performed thousands of times and cannot be optimized in parallel, which causes a certain performance loss.
Therefore, the disadvantages of the related art are:
1) each round of calculation needs longitudinal reduction, and the calculation process can hardly be parallelized, thereby causing performance loss.
2) Each round of calculation needs to be transversely reduced, and the calculation process can hardly be parallelized, so that the performance is lost.
3) Each round of calculation needs to be written back to the memory, which brings extra logic control and certain difficulty in implementation.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present application is to provide an elliptic curve multi-scalar dot product calculation optimization method, which effectively improves the continuous working time of a pipeline, thereby improving the overall performance.
Another objective of the present application is to provide an elliptic curve multi-scalar point multiplication calculation optimization device.
In order to achieve the above object, an embodiment of an aspect of the present application provides an elliptic curve multi-scalar point multiplication calculation optimization method, including:
in a main calculation process, caching intermediate variable points in the main calculation process by using one row of a bucket matrix;
maintaining the content in the bucket matrix while canceling the transverse reduction at the end of the main calculation process in the PipeZK algorithm, and continuing to the main calculation process of the next order;
canceling part of the calculation process of the longitudinal reduction at the tail of each round;
and after all rounds are finished, performing transverse reduction and longitudinal reduction calculation on the barrel matrix, and outputting elliptic curve points through the barrel matrix.
According to the elliptic curve multi-scalar dot multiplication calculation optimization method, variable points in the main calculation process are cached by designing the bucket matrix, output of the intermediate quantity of the Pippenger is avoided, running and calculation are continuously performed in a pipeline manner until all final calculation is finished, transverse reduction and longitudinal reduction are performed, the total calculation times of serial calculation are reduced from thousands of times to one time, serial-parallel conversion stages between sub-batch processing are optimized, the parallelism degree of the algorithm can be improved by eliminating the conversion, the continuous working time of a production line is effectively prolonged, and therefore the overall performance is improved.
In addition, the elliptic curve multi-scalar dot product calculation optimization method according to the above embodiment of the present application may further have the following additional technical features:
further, in one embodiment of the present application, the bucket matrix is common
Figure BDA0003161252870000021
Row, total 2ζ-1 column, where λ is the bit width of the coefficient and ζ is the bit segment width.
Further, in an embodiment of the present application, the performing of the horizontal reduction and the vertical reduction calculation on the bucket matrix includes: and performing transverse reduction on each row of the barrel matrix to obtain a plurality of elliptic curve points, and performing longitudinal reduction on the plurality of elliptic curve points.
Further, in an embodiment of the present application, the performing of the horizontal reduction and the vertical reduction calculation on the bucket matrix includes: and performing longitudinal reduction on each column of the barrel matrix to obtain a plurality of elliptic curve points, and performing transverse reduction on the plurality of elliptic curve points.
Further, in one embodiment of the present application, in the horizontal reduction calculation, the calculations of different rows are performed in parallel or in series; in the vertical reduction calculation, the calculations of different columns are performed in parallel or in series.
In order to achieve the above object, another embodiment of the present application provides an elliptic curve multi-scalar point multiplication calculation optimization apparatus, including:
the cache module is used for caching the intermediate variable points in the main calculation process by utilizing one row of the bucket matrix in the main calculation process;
the first optimization module is used for keeping the content in the bucket matrix while canceling the transverse reduction at the tail of the main calculation process in the PipeZK algorithm and continuing to the main calculation process of the next order;
the second optimization module is used for canceling part of the calculation process of the longitudinal reduction at the tail of each round;
and the output module is used for performing transverse reduction and longitudinal reduction calculation on the barrel matrix after all rounds are finished, and outputting elliptic curve points through the barrel matrix.
The elliptic curve multi-scalar dot multiplication calculation optimization device provided by the embodiment of the application caches variable points in the main calculation process by designing the bucket matrix, avoids output of the intermediate quantity of the Pippenger, continuously runs and calculates in a pipeline manner until all final calculations are completely finished, then performs transverse reduction and longitudinal reduction, reduces the total calculation times of serial calculation from thousands of times to one time, optimizes the serial-parallel conversion stage between the sub-batch processing, improves the parallelism degree of the algorithm by eliminating the conversion, effectively improves the continuous working time of a production line, and further improves the overall performance.
In addition, the elliptic curve multi-scalar dot product calculation optimization device according to the above embodiment of the present application may further have the following additional technical features:
further, in one embodiment of the present application, the bucket matrix is common
Figure BDA0003161252870000031
Row, total 2ζ-1 column, where λ is the bit width of the coefficient and ζ is the bit segment width.
Further, in an embodiment of the present application, the performing of the horizontal reduction and the vertical reduction calculation on the bucket matrix includes: and performing transverse reduction on each row of the barrel matrix to obtain a plurality of elliptic curve points, and performing longitudinal reduction on the plurality of elliptic curve points.
Further, in an embodiment of the present application, the performing of the horizontal reduction and the vertical reduction calculation on the bucket matrix includes: and performing longitudinal reduction on each column of the barrel matrix to obtain a plurality of elliptic curve points, and performing transverse reduction on the plurality of elliptic curve points.
Further, in one embodiment of the present application, in the horizontal reduction calculation, the calculations of different rows are performed in parallel or in series; in the vertical reduction calculation, the calculations of different columns are performed in parallel or in series.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram of a prior art solution calculation process;
FIG. 2 is a flowchart illustrating an optimization method for multi-scalar dot product calculation of an elliptic curve according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a calculation flow of an elliptic curve multi-scalar point multiplication calculation optimization method according to an embodiment of the present application;
FIG. 4 is a diagram illustrating a hardware architecture and a computing flow according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an access order to coefficient data in various rounds according to an embodiment of the present application;
fig. 6 is a schematic diagram of an access sequence to buckets when a final result is solved after all rounds of calculation are completed according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an elliptic curve multi-scalar dot product calculation optimization device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The definitions and related terms in this application are first introduced.
Bit segment:
the bit segment is a part of a certain number in binary representation, for example, the 2 nd to 4 th bits of the binary number 101011 are a bit segment of the number.
The bit segments are denoted below using the notation:
a[p:q]
where a is a certain number, p is the lowest index of the bit segment, q is the highest index of the bit segment, the indices starting from 0, 0 representing the lowest bit. For example:
a[0:3]
a bit segment consisting of 0 rd bit to 3 rd bit of binary system representing the digit a, the width of the bit segment being 4
And (3) transverse reduction:
the transverse reduction refers to the following calculation process:
Figure BDA0003161252870000041
wherein G is the output of the calculation process, and the type is an elliptic curve point; b isiIs the input to the calculation process, each BiThe types of the points are all elliptic curve points, and the total number is 2ζ-1; ζ is the bit segment width, a constant parameter of the calculation process, often taken as 4.
Longitudinal reduction:
longitudinal reduction refers to the following calculation process:
Figure BDA0003161252870000042
wherein Q is the output of the calculation process, the type being an elliptic curve point; giIs the input to the calculation process, each GiThe types of the points are all elliptic curve points, and the total number is
Figure BDA0003161252870000043
λ is the coefficient field binary bit width, and is input dependent, with common values being 256, 384, 512, etc.; ζ is the bit segment width, a constant parameter of the calculation process, often taken as 4.
And (4) round:
the round is a certain cycle in an outer cycle of the algorithm, specifically, the algorithm comprises two layers of the outer cycle and an inner cycle, the outer cycle needs to read in M elliptic curve points and M coefficients each time, and M is usually 1K. The round refers to a certain time in the process of the loop iteration.
Each round is hereinafter denoted and distinguished by the k-th round, where k is referred to as the round number or round index, starting with 0 and having a maximum value of
Figure BDA0003161252870000051
N is the input size of the whole algorithm,
Figure BDA0003161252870000052
presentation pair
Figure BDA0003161252870000053
The result of (2) is rounded up.
The order:
the bit order refers to a certain cycle in an algorithm inner cycle, specifically, the algorithm comprises an outer cycle and an inner cycle, the inner cycle reads in corresponding bit segments of M coefficients and M elliptic curves with p and q as bit segment parameters each time, namely, M bit segments and M coefficients are read in total, M is usually 1K, and p and q are related to cycle parameters. The bit order refers to a certain time in the loop iteration process.
The individual digits are denoted and distinguished below by the kth digit, where k is referred to as the digit number or digit index, starting from 0 and having a maximum value of
Figure BDA0003161252870000054
The functional relationship between the lowest subscript and the highest subscript of a corresponding bit segment of a certain order and a bit order number exists, the functional relationship between the called order number and the lowest subscript of the bit segment is low-order subscript mapping, the functional relationship between the called order number and the highest subscript of the bit segment is high-order subscript mapping, and the functional relationship between the called order number and a tuple consisting of the lowest subscript and the highest subscript is subscript mapping.
The main calculation process comprises the following steps:
the main calculation process is the main calculation process in the inner loop of the algorithm, which receives a number of elliptic curve points and equal number of bit segments of coefficients, and outputs 2ζ1 elliptic curve point. The main process is to check the coefficient bit segment corresponding to each elliptic curve point and to transfer the elliptic curve point to the barrel with corresponding label by using the value of the bit segment as subscript. When the elliptic curve points are transmitted to the barrel with the corresponding label, if the elliptic curve points do not exist in the barrel, the elliptic curve points are directly filled into the barrel; if the elliptic curve points exist in the barrel, the addition operation result of the elliptic curve points in the barrel and the elliptic curve points of the current elliptic curve points is filled into the barrel. And ending the process until all the elliptic curve points and the coefficients are traversed, and outputting the results in all the buckets.
The calculation process requires specifying the input elliptic curve points, the input coefficients, coefficient bit segment parameters, and bucket positions.
The method mainly aims at optimizing a PipeZK accelerator architecture design on an MSM module. Namely, the prior art implementation scheme is' PipeZK: the other technical solutions are mainly to accelerate by using a distributed system and a GPU, or to multiply by using FPGA but not aiming at multi-scalar points on an elliptic curve. In the original scheme, when the size of input data is N, the data is divided into a plurality of arrays with the same length, the arrays are processed in batches, the length of the arrays and the total length of the arrays are kept equal to each other as much as possible, an elliptic curve point coordinate is generated each time, finally, the generated elliptic curve points are processed again to obtain a final point, and the point is output by an MSM module.
Zero knowledge proves that: the zero-knowledge proof is a very useful cryptographic protocol for protecting privacy, and can be widely used in a plurality of application scenarios such as a block chain. The prover can prove to the verifier that the prover knows a certain knowledge without revealing any information about the knowledge itself through zero knowledge proof.
PipeZK: the PipeZK accelerator is an accelerator for one of zero Knowledge algorithms called Non-interactive concise zero Knowledge proof (zero-Knowledge) algorithm, also commonly abbreviated as zk-SNARK. The scheme capable of accelerating zk-SNARK at present adopts a distributed system, a GPU and an FPGA, the former two situations do not have any overlapping part with the application, and the third scheme adopts the FPGA acceleration, and the operation acceleration on an elliptic curve is mainly performed at present, and the calculation process of multiplying multiple scalar points is not directly accelerated.
MSM (Multi-scalar Multiplication): the point multiplication of elliptic curve is implemented by inputting a set of coefficients and coordinates of points on a set of elliptic curve, outputting an elliptic curve point, calculating a formula by multiplying each coefficient by a corresponding point on the elliptic curve, and multiplying the result by the corresponding point on the elliptic curveEach multiplication result is added. I.e. Q ═ Σ kiPiMiddle kiIs the ith coefficient, PiIs the ith elliptic curve point, the bold represents the elliptic curve point, and the non-bold represents the common number. PipeZK has a POLY module besides the MSM module, and the application is only optimized for MSM.
Elliptic curve: an important basic theory in the field of cryptography, most elliptic curve-based cryptographic algorithms are designed depending on the decomposition difficulty of Q ═ kP (similar well-known RSA algorithms are designed depending on the decomposition difficulty of the product of two large prime numbers). Two operations can be performed on the elliptic curve: two elliptic curve points are added, and a common positive integer is multiplied by one elliptic curve point. The specific calculation formula is not direct addition and multiplication, and a relatively complex solving process is provided, so that the calculation amount of the solving process is relatively large. Therefore, the MSM is often calculated by millions of elliptic curve points, and the calculation cost becomes very large.
The optimization method for elliptic curve multi-scalar point multiplication calculation proposed by the embodiment of the application is described below with reference to the attached drawings.
FIG. 2 is a method for optimizing elliptic curve multi-scalar dot product calculation according to an embodiment of the present application.
As shown in fig. 1, the elliptic curve multi-scalar dot product calculation optimization method includes the following steps:
in step S1, in the main calculation process, intermediate variable points in the main calculation process are buffered by one line of the bucket matrix.
Specifically, the intermediate variable points in different main calculation processes are cached by designing a bucket matrix instead of a group of buckets, so that output of the intermediate quantity of the Pippenger is avoided, running and calculating are continuously performed in a pipeline mode until all final calculations are finished, and then transverse reduction and longitudinal reduction are performed, so that the total calculation times of serial calculation can be reduced from thousands of times to one time, most of expenses of serial-parallel conversion, synchronous locking and the like are eliminated, and the calculation performance of the MSM is improved.
Note that a Bucket (Bucket) is a name for a storage unit of one elliptic curve point. Pippenger is a calculation method for multiple scalar point multiplication on elliptic curves.
Further, in one embodiment of the present application, the bucket matrix is common
Figure BDA0003161252870000061
Row, total 2ζ-1 column, where λ is the bit width of the coefficient and ζ is the bit segment width.
Specifically, let λ be the bit width of the coefficient (that is, each coefficient is a λ digit number in binary), and the common values are 256, 384, 512, and so on; zeta is the bit segment width, and the common values are 4 and the like. Designing a bucket matrix, totaling
Figure BDA0003161252870000071
Line, 2ζ-1 column.
Step S2, while canceling the lateral reduction at the end of the primary computation process in the PipeZK algorithm, the contents in the bucket matrix are kept and continued to the primary computation process of the next rank.
And step S3, canceling part of the calculation process of the longitudinal reduction at the end of each round.
And step S4, after all rounds are finished, performing transverse reduction and longitudinal reduction calculation on the barrel matrix, and outputting elliptic curve points through the barrel matrix.
Further, in one embodiment of the present application, performing the horizontal reduction and the vertical reduction calculations on the bucket matrix comprises: and performing transverse reduction on each row of the barrel matrix to obtain a plurality of elliptic curve points, and performing longitudinal reduction on the plurality of elliptic curve points.
Further, in one embodiment of the present application, performing the horizontal reduction and the vertical reduction calculations on the bucket matrix comprises: and performing longitudinal reduction on each column of the barrel matrix to obtain a plurality of elliptic curve points, and performing transverse reduction on the plurality of elliptic curve points.
Further, in one embodiment of the present application, in the horizontal reduction calculation, the calculations of different rows are performed in parallel or in series; in the vertical reduction calculation, the calculations of different columns are performed in parallel or in series.
Specifically, after all rounds are finished, there are two types of transverse reduction and longitudinal reduction calculations performed on the bucket matrix, and both types of reduction output an elliptic curve point.
Wherein, the first reduction type is to carry out transverse reduction firstly and then carry out longitudinal reduction. On the barrel matrix, each row is transversely reduced to obtain
Figure BDA0003161252870000072
An elliptic curve point. To this again
Figure BDA0003161252870000073
One longitudinal reduction is made for each elliptic curve point. In the process of performing transverse reduction on different rows, the rows are generally executed in parallel, but can also be executed in series, and meanwhile, the execution sequence among the rows has no influence.
The second reduction type is longitudinal reduction first and then transverse reduction. On the bucket matrix, make a longitudinal reduction for each column, get 2ζ1 elliptic curve point, and then 2ζ-1 elliptic curve point for one transverse reduction. In the process of longitudinal reduction of different columns, the columns are generally executed in parallel, but can also be executed in series, and meanwhile, the execution sequence among the columns has no influence.
As shown in fig. 3, a calculation process in the elliptic curve multi-scalar dot product calculation optimization method is shown.
In the main calculation process, the original input needs to be divided into several subsets, and the general division method is to divide the original input into subsets as uniformly as possible
Figure BDA0003161252870000074
Subsets, each subset containing approximately M elliptic curve points and M coefficients. However, the division can be performed disorderly and unevenly without affecting the calculation result, and the number of the divided subsets does not affect the final result. Dividing the subset only requires that the correspondence of each elliptic curve point and its coefficient in the subset is not destroyed, and that each elliptic curve point and each coefficientThe coefficients are used exactly once.
There is an extreme method of bypassing the above-described division feature by changing the data input, i.e., repeating a certain elliptic curve point multiple times or combining the repeated elliptic curve points, and making its corresponding coefficient equal to the original corresponding coefficient.
There are a number of mapping methods for the subscript mapping of the bit segments in the main computation process portion of fig. 3.
If the round number is i and the bit number is j, the index can be divided by the lowest index of the bit segment shown in FIG. 3, which is j ζ and j ζ + ζ -1, respectively, or the index can be divided by the lowest index of the bit segment shown in FIG. 3
Figure BDA0003161252870000081
The lowest position section is a subscript,
Figure BDA0003161252870000082
For the calculation of the highest index of the bit segment, the corresponding selected bucket position should be changed to the first of the selected bucket matrix
Figure BDA0003161252870000083
And (6) rows.
For other bit segment selection methods, the corresponding bucket positions are also changed correspondingly, but the selected core characteristics are as follows: the first, bit segments do not overlap; secondly, the union set of all the selected bit segments is just the whole bit segment of the coefficient; thirdly, the barrel positions are not overlapped; fourth, the union of all selected bucket positions is exactly the entire bucket matrix. The bit segment selection and bucket position selection methods which meet the above four conditions can both obtain correct results.
There are several mapping methods for bucket matrix row and column subscripts: in addition to the above method in which the bit segment and the same position are simultaneously and cooperatively changed, the result may be directly stored in other designated bucket matrix positions during the main calculation process, and the input mode of the horizontal reduction or the vertical reduction is simultaneously adjusted, thereby bypassing the subscript relationship shown in fig. 3.
This method is characterized in that: the first, accessed bucket in the main calculation process, the value of the corresponding access coefficient bit segment, and the multiplied coefficient in the horizontal reduction are certain equal. And in the second and main calculation processes, the accessed bucket corresponding to the lowest index of the accessed bit segment is equal to the logarithm taking the base 2 of the multiplication coefficient in the longitudinal reduction. The barrel position selection method satisfying the above conditions can realize the elliptic curve multi-scalar point multiplication calculation optimization method of the embodiment of the present application.
Specifically, as shown in fig. 4, fig. 4 only illustrates the flow and related components of a round of computation, which is a schematic diagram of the main components and computation flow of the MSM module optimized in the present application. In fig. 4, the coefficient bit width is 256 bits, the bit width of each component of the elliptic curve point coordinates is 768 bits, and N is 1048576, so that the data amount in a single round is 1024. (note that the above parameters are only described in the schematic, and the actual technical solution can be modified to other reasonable values at all).
In fig. 4, "1024 Scalar" at the upper left represents data of a coefficient portion, "1024 Point" at the upper right represents data of an elliptic curve Point portion, a plurality of cylinders at the lower left represent a buffer area, i.e., a Bucket matrix, each cylinder is a Bucket and can store one elliptic curve Point, a PADD portion at the lower right represents an elliptic curve Point addition calculation unit and can perform addition of two elliptic curve points and output one elliptic curve Point, and portions of 3 boxes at the lower right represent FIFO queues for caching all elliptic curve pairs to be calculated by the PADD.
In the scheme before optimization, the buckets have only one row, and the numerical value of a certain section in the coefficient is used for determining which Bucket a certain elliptic curve point is stored in. In the optimized scheme, Bucket has the sharing
Figure BDA0003161252870000084
Line, 2ζ-1 column, λ being the bit width of the coefficient. In principle, the parameter ζ can be changed, the larger the ζ parameter is, the larger the resource overhead is, and selecting ζ equal to 4 as the bit segment width is a compromise choice in resource performance balance.
Taking this round of calculation as an example, there are 1024 coefficients in total, as shown in fig. 5 where idx0 represents the first coefficient and idx1 represents the second coefficient, until idx1023 represents the 1024 th coefficient. Each coefficient bit is 256 bits wide, with 4 bits as a set of slices, for a total of 64 slices. The highest bit of each coefficient is on the left, the lowest bit is on the right, and the highest bit and the lowest bit are respectively marked as a 0 th group, a 1 st group and a 63 rd group from right to left.
The specific calculation flow is that the 63 rd group, that is, the group on the leftmost side, is accessed, idx0, idx1 and idx1023 are accessed from top to bottom, in the process, if the numerical value of the slice is 0, the elliptic curve point corresponding to the coefficient is discarded, if the numerical value is 1, the elliptic curve point corresponding to the coefficient is placed in the first Bucket in the first row, if the numerical value is 2, the elliptic curve point is placed in the second Bucket in the first row, and so on. And when the 63 th group has 1024 coefficients which are accessed completely, returning to the first coefficient, accessing the 62 th group, if the coefficient is 0, discarding the point, if the coefficient is 1, placing the point into the first Bucket in the second row, if the coefficient is 2, placing the point into the second Bucket in the second row, if the coefficient is 3, placing the point into the third Bucket in the second row, and so on.
As shown in fig. 6, the calculation step of longitudinal reduction should be implemented by hardware more efficiently, but can be calculated by software method and still be more efficient than the general calculation method. Specifically, for each column, the calculation is performed from top to bottom, taking the first point of this column, multiplying by 2ζThen add the second point of the column and multiply by 2ζThen add the third point of this column and so on. (after adding the last row of dots, do not multiply by 2ζ) The result of this calculation is written back into the first row of this column, i.e. into row 0. If implemented in hardware, multiply by 2ζShould be broken down into zeta sub-point operations, similar to the software implementation.
And finally, taking out the content in row0 and calculating. The step has two methods of hardware realization and software realization, and both have better effect. Specifically, the first column may be multiplied by 1, the second column may be multiplied by 2, the third column may be multiplied by 3, and so on, and finally the results may be added. The method of multiplying a constant by an elliptic curve point should be decomposed into operations of point multiplication and point addition.
Since the processes of horizontal reduction and vertical reduction in the original scheme are parts which cannot be parallelized and can only be executed in series, the subsequent calculation must wait for the completion of the calculation process, and most parts of hardware are in an idle state during the serial calculation. The serial computing overhead is eliminated, the number of times is reduced from thousands of times to only one time, the hardware is almost continuously in a working state, and the end of the pre-computation does not need to be waited, so that the overall acceleration performance is improved.
As another possible implementation, the same calculation result may be obtained by eliminating only the horizontal reduction in the loop or only the vertical reduction in the loop, and the corresponding bucket matrix and the associated storage structure may be changed accordingly. This acceleration effect is less efficient than the bucket matrix acceleration effect, but still higher than the PipeZK performance.
According to the elliptic curve multi-scalar dot multiplication calculation optimization method provided by the embodiment of the application, the intermediate variable points in the main calculation process are cached by designing the bucket matrix, the output of the Pippenger intermediate quantity is avoided, the continuous running and calculation are carried out until all final calculations are finished, then the transverse reduction and the longitudinal reduction are carried out, the total calculation times of serial calculation are reduced from thousands of times to one time, the serial-parallel conversion stage between the sub-batch processing is optimized, the parallelism degree of the algorithm can be improved by eliminating the conversion, the continuous working time of a production line is effectively prolonged, and the overall performance is improved.
Next, an elliptic curve multi-scalar point multiplication calculation optimization apparatus proposed according to an embodiment of the present invention is described with reference to the drawings.
FIG. 7 is a schematic structural diagram of an apparatus for optimizing multi-scalar dot product calculation of an elliptic curve according to an embodiment of the present invention.
As shown in fig. 7, the elliptic curve multi-scalar dot product calculation optimizing device includes: a cache module 100, a first optimization module 200, a second optimization module 300, and an output module 400.
The buffer module 100 is configured to buffer intermediate variable points in the main calculation process by using one row of the bucket matrix in the main calculation process.
The first optimization module 200 is configured to maintain the contents of the bucket matrix while canceling the horizontal reduction at the end of the main computation process in the PipeZK algorithm, and continue to the main computation process of the next rank.
And the second optimization module 300 is used for canceling part of the calculation process of the longitudinal reduction at the end of each round.
And the output module 400 is configured to perform horizontal reduction and vertical reduction calculation on the bucket matrix after all rounds are finished, and output elliptic curve points through the bucket matrix.
Further, in one embodiment of the present application, the bucket matrix is common
Figure BDA0003161252870000101
Row, total 2ζ-1 column, where λ is the bit width of the coefficient and ζ is the bit segment width.
Further, in one embodiment of the present application, performing the horizontal reduction and the vertical reduction calculations on the bucket matrix comprises: and performing transverse reduction on each row of the barrel matrix to obtain a plurality of elliptic curve points, and performing longitudinal reduction on the plurality of elliptic curve points.
Further, in one embodiment of the present application, performing the horizontal reduction and the vertical reduction calculations on the bucket matrix comprises: and performing longitudinal reduction on each column of the barrel matrix to obtain a plurality of elliptic curve points, and performing transverse reduction on the plurality of elliptic curve points.
Further, in one embodiment of the present application, in the horizontal reduction calculation, the calculations of different rows are performed in parallel or in series; in the vertical reduction calculation, the calculations of different columns are performed in parallel or in series.
It should be noted that the foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and is not repeated herein.
According to the elliptic curve multi-scalar dot multiplication calculation optimization device provided by the embodiment of the application, the intermediate variable points in the main calculation process are cached by designing the bucket matrix, the output of the Pippenger intermediate quantity is avoided, the continuous running and calculation are carried out until all final calculations are finished, then the transverse reduction and the longitudinal reduction are carried out, the total calculation times of serial calculation are reduced from thousands of times to one time, the serial-parallel conversion stage between the sub-batch processing is optimized, the parallelism degree of the algorithm can be improved by eliminating the conversion, the continuous working time of a production line is effectively prolonged, and the overall performance is improved.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. An elliptic curve multi-scalar dot product calculation optimization method is characterized by comprising the following steps:
in a main calculation process, caching intermediate variable points in the main calculation process by using one row of a bucket matrix;
maintaining the content in the bucket matrix while canceling the transverse reduction at the end of the main calculation process in the PipeZK algorithm, and continuing to the main calculation process of the next order;
canceling part of the calculation process of the longitudinal reduction at the tail of each round;
and after all rounds are finished, performing transverse reduction and longitudinal reduction calculation on the barrel matrix, and outputting elliptic curve points through the barrel matrix.
2. The method of claim 1, wherein the bucket matrix is common
Figure FDA0003161252860000011
Row, total 2ζ-1 column, where λ is the bit width of the coefficient and ζ is the bit segment width.
3. The method of claim 1, wherein performing the horizontal reduction and vertical reduction calculations on the bucket matrix comprises:
and performing transverse reduction on each row of the barrel matrix to obtain a plurality of elliptic curve points, and performing longitudinal reduction on the plurality of elliptic curve points.
4. The method of claim 1, wherein performing the horizontal reduction and vertical reduction calculations on the bucket matrix comprises:
and performing longitudinal reduction on each column of the barrel matrix to obtain a plurality of elliptic curve points, and performing transverse reduction on the plurality of elliptic curve points.
5. The method according to any one of claims 1 to 4,
in the horizontal reduction calculation, the calculation of different rows is executed in parallel or in series;
in the vertical reduction calculation, the calculations of different columns are performed in parallel or in series.
6. An elliptic curve multi-scalar dot product calculation optimization device, comprising:
the cache module is used for caching the intermediate variable points in the main calculation process by utilizing one row of the bucket matrix in the main calculation process;
the first optimization module is used for keeping the content in the bucket matrix while canceling the transverse reduction at the tail of the main calculation process in the PipeZK algorithm and continuing to the main calculation process of the next order;
the second optimization module is used for canceling part of the calculation process of the longitudinal reduction at the tail of each round;
and the output module is used for performing transverse reduction and longitudinal reduction calculation on the barrel matrix after all rounds are finished, and outputting elliptic curve points through the barrel matrix.
7. The apparatus of claim 6, wherein the bucket matrix is common
Figure FDA0003161252860000012
Row, total 2ζ-1 column, where λ is the bit width of the coefficient and ζ is the bit segment width.
8. The apparatus of claim 6, wherein the performing lateral reduction and longitudinal reduction calculations on the bucket matrix comprises:
and performing transverse reduction on each row of the barrel matrix to obtain a plurality of elliptic curve points, and performing longitudinal reduction on the plurality of elliptic curve points.
9. The apparatus of claim 6, wherein the performing lateral reduction and longitudinal reduction calculations on the bucket matrix comprises:
and performing longitudinal reduction on each column of the barrel matrix to obtain a plurality of elliptic curve points, and performing transverse reduction on the plurality of elliptic curve points.
10. The apparatus according to any one of claims 6 to 9,
in the horizontal reduction calculation, the calculation of different rows is executed in parallel or in series;
in the vertical reduction calculation, the calculations of different columns are performed in parallel or in series.
CN202110791569.4A 2021-07-13 2021-07-13 Elliptic curve multi-scalar point multiplication calculation optimization method and optimization device Active CN113504895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110791569.4A CN113504895B (en) 2021-07-13 2021-07-13 Elliptic curve multi-scalar point multiplication calculation optimization method and optimization device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110791569.4A CN113504895B (en) 2021-07-13 2021-07-13 Elliptic curve multi-scalar point multiplication calculation optimization method and optimization device

Publications (2)

Publication Number Publication Date
CN113504895A true CN113504895A (en) 2021-10-15
CN113504895B CN113504895B (en) 2024-02-20

Family

ID=78013250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110791569.4A Active CN113504895B (en) 2021-07-13 2021-07-13 Elliptic curve multi-scalar point multiplication calculation optimization method and optimization device

Country Status (1)

Country Link
CN (1) CN113504895B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114879934A (en) * 2021-12-14 2022-08-09 中国科学院深圳先进技术研究院 Efficient zero-knowledge proof accelerator and method
WO2024145835A1 (en) * 2023-01-04 2024-07-11 声龙(新加坡)私人有限公司 Zero-knowledge proof verification method and device, terminal, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030206629A1 (en) * 2002-05-01 2003-11-06 Sun Microsystems, Inc. Hardware accelerator for elliptic curve cryptography
JP2004177582A (en) * 2002-11-26 2004-06-24 Fujitsu Ltd Elliptic curve ciphering system, and elliptic curve ciphering operation method
DE102005028662A1 (en) * 2005-03-04 2006-09-07 IHP GmbH - Innovations for High Performance Microelectronics/Institut für innovative Mikroelektronik Polynom multiplication calculating method e.g. for elliptical curve cryptography, making available coefficients with two polynomials each polynomial fragmented into two or more fragments, being operands partial multiplication
CA2542556A1 (en) * 2005-06-03 2006-12-03 Tata Consultancy Services Limited An authentication system executing an elliptic curve digital signature cryptographic process
CN101483517A (en) * 2007-12-28 2009-07-15 英特尔公司 A technique for aacelerating characteristic 2 eeliptic curve cryptography
CN109379191A (en) * 2018-09-07 2019-02-22 阿里巴巴集团控股有限公司 A kind of point multiplication operation circuit and method based on elliptic curve basic point
CN111373694A (en) * 2020-02-21 2020-07-03 香港应用科技研究院有限公司 Zero-knowledge proof hardware accelerator and method thereof

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030206629A1 (en) * 2002-05-01 2003-11-06 Sun Microsystems, Inc. Hardware accelerator for elliptic curve cryptography
US20030212729A1 (en) * 2002-05-01 2003-11-13 Sun Microsystems, Inc. Modular multiplier
JP2004177582A (en) * 2002-11-26 2004-06-24 Fujitsu Ltd Elliptic curve ciphering system, and elliptic curve ciphering operation method
DE102005028662A1 (en) * 2005-03-04 2006-09-07 IHP GmbH - Innovations for High Performance Microelectronics/Institut für innovative Mikroelektronik Polynom multiplication calculating method e.g. for elliptical curve cryptography, making available coefficients with two polynomials each polynomial fragmented into two or more fragments, being operands partial multiplication
CA2542556A1 (en) * 2005-06-03 2006-12-03 Tata Consultancy Services Limited An authentication system executing an elliptic curve digital signature cryptographic process
CN101483517A (en) * 2007-12-28 2009-07-15 英特尔公司 A technique for aacelerating characteristic 2 eeliptic curve cryptography
CN109379191A (en) * 2018-09-07 2019-02-22 阿里巴巴集团控股有限公司 A kind of point multiplication operation circuit and method based on elliptic curve basic point
CN111373694A (en) * 2020-02-21 2020-07-03 香港应用科技研究院有限公司 Zero-knowledge proof hardware accelerator and method thereof

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LRAJ FATHIRAD: "Marlin: Preprocessing zkSNARKs with Universal and Updatable SRS", 《ANNUAL INTERNATIONAL CONFERENCE ON THE THEORY AND APPLICATIONS OF CRYPTOGRAPHIC TECHNIQUES》, 1 May 2020 (2020-05-01), pages 738 *
YE ZHANG, ETC: "PipeZK: Accelerating Zero-Knowledge Proof with a Pipelined Architecture", 《2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA)》, 18 June 2021 (2021-06-18), pages 1 - 12 *
于伟: "椭圆曲线密码学若干算法研究", 《中国博士学位论文全文数据库信息科技辑》, 15 May 2014 (2014-05-15), pages 136 - 23 *
陈妍;宋晶晶;杨习贝;: "约简加速求解的属性簇方法", 南京理工大学学报, no. 02, 15 May 2020 (2020-05-15), pages 216 - 223 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114879934A (en) * 2021-12-14 2022-08-09 中国科学院深圳先进技术研究院 Efficient zero-knowledge proof accelerator and method
CN114879934B (en) * 2021-12-14 2023-01-10 中国科学院深圳先进技术研究院 Efficient zero-knowledge proof accelerator and method
WO2024145835A1 (en) * 2023-01-04 2024-07-11 声龙(新加坡)私人有限公司 Zero-knowledge proof verification method and device, terminal, and storage medium

Also Published As

Publication number Publication date
CN113504895B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
CN113504895A (en) Elliptic curve multi-scalar dot multiplication calculation optimization method and optimization device
CN111898733B (en) Deep separable convolutional neural network accelerator architecture
CN111966324B (en) Implementation method and device for multi-elliptic curve scalar multiplier and storage medium
US8880575B2 (en) Fast fourier transform using a small capacity memory
Seo et al. Multi-precision multiplication for public-key cryptography on embedded microprocessors
CN112464296B (en) Large integer multiplier hardware circuit for homomorphic encryption technology
CN114936350B (en) Full-homomorphic encryption gate bootstrap method based on GPU (graphic processing unit) rapid number theory conversion
CN112949845B (en) Deep convolutional neural network accelerator based on FPGA
US20190130274A1 (en) Apparatus and methods for backward propagation in neural networks supporting discrete data
CN114330656A (en) Convolution operation hardware accelerator and data processing method
JP5486226B2 (en) Apparatus and method for calculating DFT of various sizes according to PFA algorithm using Ruritanian mapping
CN109711542A (en) A kind of DNN accelerator that supporting dynamic accuracy and its implementation
CN111221501B (en) Number theory conversion circuit for large number multiplication
CN116561819A (en) Encryption and decryption method based on from-Cook on-loop polynomial multiplication and on-loop polynomial multiplier
CN109379191B (en) Dot multiplication operation circuit and method based on elliptic curve base point
CN116954559A (en) Multi-scalar multiplier and acceleration method
CN117633418A (en) Multi-dimensional fast Fourier transformation acceleration method based on matrix operation
CN116090518A (en) Feature map processing method and device based on systolic operation array and storage medium
CN115481364A (en) Parallel computing method for large-scale elliptic curve multi-scalar multiplication based on GPU (graphics processing Unit) acceleration
US20220004363A1 (en) Semiconductor device, data generation methods used for the same, and method of controlling the same
US11995554B2 (en) Apparatus and methods for backward propagation in neural networks supporting discrete data
CN113780529A (en) FPGA-oriented sparse convolution neural network multi-level storage computing system
Reeves A Parallel Implementation of Buchberger's Algorithm overZpforp≤ 31991
Lenstra Massively parallel computing and factoring
CN118364872B (en) Point cloud target detection network hardware acceleration method based on column reconstruction convolution

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220126

Address after: 518048 1410, building 1, Changfu Jinmao building, Fubao community, Fubao street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Zhixin Huaxi Information Technology Co.,Ltd.

Address before: 100084 Tsinghua Yuan, Beijing, Haidian District

Applicant before: TSINGHUA University

CB02 Change of applicant information
CB02 Change of applicant information

Address after: 518048 1410, building 1, Changfu Jinmao building, south of Shihua Road, Fubao community, Fubao street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Zhixin Huaxi Information Technology Co.,Ltd.

Address before: 518048 1410, building 1, Changfu Jinmao building, Fubao community, Fubao street, Futian District, Shenzhen, Guangdong Province

Applicant before: Shenzhen Zhixin Huaxi Information Technology Co.,Ltd.

GR01 Patent grant
GR01 Patent grant