-
Currently Flint is not used for multivariate polynomials at all. The only difference comes when the coefficients of the polynomial are ZZ, QQ or GF(p).
-
The way that you are presenting this makes it a bit difficult to see clearly what is going on.

Firstly, when using Flint the ground types are Flint, which means that ZZ and QQ are Flint's types rather than gmpy2's types. However, both use GMP under the hood, so there should not really be any noticeable difference here unless they are somehow being built differently or gmpy2's wrapper code is slightly better micro-optimised than python-flint's. In that sense any speedup that results from using gmpy2 should be seen equally when using Flint.

If there is a slowdown when using Flint compared to gmpy2 then that is most likely something in the SymPy code. That would be worth investigating, but it is a bit difficult to tell from your scripts what specifically is slower. Can you show a single example of a simple operation that is slower under Flint than gmpy2?

If you want to time the additional speedup that Flint brings over gmpy2 then currently that should only be seen for univariate polynomials with coefficients in ZZ, QQ or GF(p) (or other operations that use univariate polynomials internally). Work on multivariate polynomials is ongoing.
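For concreteness, the ground types are selected via the `SYMPY_GROUND_TYPES` environment variable before SymPy is imported. Here is a rough sketch of how such a switch resolves; this is illustrative only (SymPy's actual logic lives in `sympy.external.gmpy` and differs in detail, e.g. `auto` probes for gmpy2 at import time):

```python
import os

def choose_ground_types(env=None):
    """Resolve a SYMPY_GROUND_TYPES-style switch (illustrative sketch only;
    not SymPy's actual resolution code)."""
    mapping = env if env is not None else os.environ
    val = mapping.get("SYMPY_GROUND_TYPES", "auto").lower()
    if val not in ("auto", "python", "gmpy", "gmpy2", "flint"):
        val = "auto"  # fall back rather than error on unknown values
    return val

assert choose_ground_types({"SYMPY_GROUND_TYPES": "flint"}) == "flint"
assert choose_ground_types({}) == "auto"
```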
-
Here is a simpler demonstration of the case where there is a slowdown with Flint:

```python
In [1]: p = Poly(sum(x**i*y**j*z**k for i in range(3) for j in range(3) for k in range(3)))

In [2]: print(p.as_expr())
x**2*y**2*z**2 + x**2*y**2*z + x**2*y**2 + x**2*y*z**2 + x**2*y*z + x**2*y + x**2*z**2 + x**2*z + x**2 + x*y**2*z**2 + x*y**2*z + x*y**2 + x*y*z**2 + x*y*z + x*y + x*z**2 + x*z + x + y**2*z**2 + y**2*z + y**2 + y*z**2 + y*z + y + z**2 + z + 1

In [3]: %time ok = p ** 20
CPU times: user 23.5 s, sys: 3.72 ms, total: 23.5 s
Wall time: 23.5 s
```

With gmpy2 I get:

```python
In [3]: %time ok = p ** 20
CPU times: user 17.4 s, sys: 11.6 ms, total: 17.4 s
Wall time: 17.4 s
```

So for this example Flint is about 30% slower. We are not in this case actually using Flint's polynomials: the difference here should only be the arithmetic with ZZ. It is possible that python-flint is a bit slower for the arithmetic and that something in python-flint could be optimised. As I say, both use GMP, so the speed of large integer arithmetic should not be relevant. The integers involved are also not super large, but their size is a bit suspicious:
This is slightly larger than

```python
In [6]: 2**62
Out[6]: 4611686018427387904
```

It is worth noting that in Flint an integer smaller than this will be represented inline using a 64-bit integer, whereas larger integers are represented with a pointer to a GMP mpz. This is an important optimisation for dealing with small integers in Flint's internal C code. It is possible that it causes a bit of overhead when dealing with integers that are approximately the size of the threshold, because it involves some conversions.

Alternatively, this is also not far off:

```python
In [1]: p = Poly(sum(1000000*x**i*y**j*z**k for i in range(3) for j in range(3) for k in range(3)))

In [2]: %time ok = p ** 15  # python-flint
CPU times: user 15.9 s, sys: 5.14 ms, total: 15.9 s
Wall time: 15.9 s

In [3]: max(ok.coeffs())
Out[3]: 5712367503427737543000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
```

With gmpy2:

```python
In [2]: %time ok = p ** 15  # gmpy2
CPU times: user 7.63 s, sys: 3.21 ms, total: 7.64 s
Wall time: 7.63 s
```

That factor of 2x does not hold up when going to much larger integers, e.g.:

```python
In [4]: p = Poly(sum(10**100*x**i*y**j*z**k for i in range(3) for j in range(3) for k in range(3)))

In [5]: %time ok = p ** 15  # python-flint
CPU times: user 47 s, sys: 20.3 ms, total: 47 s
Wall time: 47 s
```

Vs gmpy2:

```python
In [1]: p = Poly(sum(10**100*x**i*y**j*z**k for i in range(3) for j in range(3) for k in range(3)))

In [2]: %time ok = p ** 15  # gmpy2
CPU times: user 35.8 s, sys: 36 ms, total: 35.9 s
Wall time: 35.9 s
```

At this size python-flint is about 30% slower. More detailed measurements would be needed. It is possible that something can be improved in python-flint and/or in SymPy's wrapper code. It is also possible that it is just down to how GMP is being built for the wheels in gmpy2 vs python-flint.

Note that once SymPy starts using Flint's multivariate polynomials it will be a lot faster than the measurements indicated here. The latest prerelease of python-flint (0.7.0a4) has preliminary multivariate polynomial support:
For example:

```python
In [27]: import flint

In [28]: p = Poly(sum(x**i*y**j*z**k for i in range(3) for j in range(3) for k in range(3)))

In [29]: d = dict(p.as_poly(t).rep.to_list()[0])

In [30]: ctx = flint.fmpz_mpoly_ctx(3, flint.Ordering.lex, ["x", "y", "z"])

In [31]: p_flint = ctx.from_dict(d)

In [32]: p.as_expr()
Out[32]: x**2*y**2*z**2 + x**2*y**2*z + x**2*y**2 + x**2*y*z**2 + x**2*y*z + x**2*y + x**2*z**2 + x**2*z + x**2 + x*y**2*z**2 + x*y**2*z + x*y**2 + x*y*z**2 + x*y*z + x*y + x*z**2 + x*z + x + y**2*z**2 + y**2*z + y**2 + y*z**2 + y*z + y + z**2 + z + 1

In [33]: p_flint
Out[33]: x^2*y^2*z^2 + x^2*y^2*z + x^2*y^2 + x^2*y*z^2 + x^2*y*z + x^2*y + x^2*z^2 + x^2*z + x^2 + x*y^2*z^2 + x*y^2*z + x*y^2 + x*y*z^2 + x*y*z + x*y + x*z^2 + x*z + x + y^2*z^2 + y^2*z + y^2 + y*z^2 + y*z + y + z^2 + z + 1

In [34]: %time ok = p ** 10
CPU times: user 712 ms, sys: 0 ns, total: 712 ms
Wall time: 711 ms

In [35]: %time ok = p_flint ** 10
CPU times: user 32.8 ms, sys: 0 ns, total: 32.8 ms
Wall time: 32.9 ms
```

So for this benchmark using Flint's multivariate polynomials is 20x faster than SymPy's DMP or sparse representations:

```python
In [40]: p_sparse = p.as_poly(t).rep.to_list()[0]

In [41]: %time ok = p_sparse ** 10
CPU times: user 851 ms, sys: 0 ns, total: 851 ms
Wall time: 850 ms
```

I expect that the next release of python-flint will have multivariate polynomial support and that the next release of SymPy will be able to make use of it internally, at least for Poly if not also PolyElement.
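As an aside, the sparse representation used for `p_sparse` above is essentially a mapping from exponent tuples to coefficients. A toy model of that idea (not SymPy's actual `PolyElement`, just a sketch to make the data structure concrete):

```python
from itertools import product

def sparse_mul(p, q):
    """Multiply sparse multivariate polynomials stored as
    {exponent_tuple: coefficient} dicts — a toy model of the idea behind
    SymPy's sparse representation, not its actual data structure."""
    r = {}
    for (e1, c1), (e2, c2) in product(p.items(), q.items()):
        e = tuple(a + b for a, b in zip(e1, e2))
        r[e] = r.get(e, 0) + c1 * c2
    return r

def sparse_pow(p, n, nvars=3):
    """Repeated multiplication; real implementations use binary powering."""
    r = {(0,) * nvars: 1}  # the constant polynomial 1
    for _ in range(n):
        r = sparse_mul(r, p)
    return r

# the same dense polynomial as above: all monomials with each exponent < 3
p = {(i, j, k): 1 for i in range(3) for j in range(3) for k in range(3)}
sq = sparse_pow(p, 2)
assert len(sq) == 125      # every monomial with each exponent <= 4 appears
assert sq[(4, 4, 4)] == 1  # coefficient of (x*y*z)**4
```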
-
So this case is 2x slower for python-flint than gmpy2, which is the biggest difference I have found:

```python
In [1]: p = Poly(sum(1000000*x**i*y**j*z**k for i in range(3) for j in range(3) for k in range(3)))

In [2]: %time ok = p ** 15
CPU times: user 18.3 s, sys: 9.86 ms, total: 18.3 s
Wall time: 18.3 s
```

Under gmpy2 the profiler shows:

```python
In [4]: %prun -s cumulative p ** 15
...
7552146 function calls (7470827 primitive calls) in 12.435 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1 0.000 0.000 12.435 12.435 {built-in method builtins.exec}
1 0.003 0.003 12.435 12.435 <string>:1(<module>)
1 0.000 0.000 12.432 12.432 decorators.py:58(__sympifyit_wrapper)
1 0.000 0.000 12.432 12.432 polytools.py:4316(__pow__)
1 0.000 0.000 12.432 12.432 polytools.py:1548(pow)
1 0.000 0.000 12.432 12.432 polyclasses.py:515(pow)
1 0.001 0.001 12.432 12.432 polyclasses.py:1393(_pow)
1 0.001 0.001 12.431 12.431 densearith.py:960(dmp_pow)
72965/436 0.220 0.000 12.403 0.028 densearith.py:792(dmp_mul)
72576 7.653 0.000 11.486 0.000 densearith.py:735(dup_mul)
2216151 1.510 0.000 1.510 0.000 {built-in method builtins.max}
2143827 1.169 0.000 1.169 0.000 {built-in method builtins.min}
2143831 1.065 0.000 1.065 0.000 {method 'append' of 'list' objects}
81403/73148 0.052 0.000 0.668 0.000 densearith.py:548(dmp_add)
81001 0.514 0.000 0.613 0.000 densearith.py:513(dup_add)
199/3 0.005 0.000 0.359 0.120 densearith.py:875(dmp_sqr)
286862 0.104 0.000 0.166 0.000 densebasic.py:137(dup_degree)
291670 0.062 0.000 0.062 0.000 {built-in method builtins.len}
1 0.000 0.000 0.050 0.050 decorator.py:229(fun)
143853 0.035 0.000 0.035 0.000 densebasic.py:255(dup_strip)
422 0.021 0.000 0.034 0.000 densearith.py:835(dup_sqr)
1 0.000 0.000 0.022 0.022 history.py:55(only_when_enabled)
9981 0.008 0.000 0.008 0.000 densebasic.py:723(dmp_zero)
724 0.002 0.000 0.003 0.000 densebasic.py:282(dmp_strip)
684/345 0.001 0.000 0.003 0.000 densearith.py:275(dmp_mul_ground)
1506 0.001 0.000 0.003 0.000 densebasic.py:160(dmp_degree)
653 0.002 0.000 0.002 0.000 densearith.py:255(dup_mul_ground)
2955 0.001 0.000 0.002 0.000 densebasic.py:698(dmp_zero_p)
345 0.001 0.000 0.001 0.000 domain.py:379(__call__)
```

Under python-flint we have:

```python
In [4]: %prun -s cumulative p ** 15
7552144 function calls (7470825 primitive calls) in 22.188 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1 0.000 0.000 22.188 22.188 {built-in method builtins.exec}
1 0.005 0.005 22.188 22.188 <string>:1(<module>)
1 0.000 0.000 22.183 22.183 decorators.py:58(__sympifyit_wrapper)
1 0.000 0.000 22.183 22.183 polytools.py:4316(__pow__)
1 0.000 0.000 22.182 22.182 polytools.py:1548(pow)
1 0.000 0.000 22.182 22.182 polyclasses.py:515(pow)
1 0.001 0.001 22.182 22.182 polyclasses.py:1393(_pow)
1 0.001 0.001 22.182 22.182 densearith.py:960(dmp_pow)
72965/436 0.708 0.000 22.148 0.051 densearith.py:792(dmp_mul)
72576 16.198 0.000 20.547 0.000 densearith.py:735(dup_mul)
2216151 1.767 0.000 1.767 0.000 {built-in method builtins.max}
2143827 1.344 0.000 1.344 0.000 {built-in method builtins.min}
2143831 1.136 0.000 1.136 0.000 {method 'append' of 'list' objects}
81403/73148 0.052 0.000 0.858 0.000 densearith.py:548(dmp_add)
81001 0.698 0.000 0.804 0.000 densearith.py:513(dup_add)
199/3 0.008 0.000 0.521 0.174 densearith.py:875(dmp_sqr)
286862 0.108 0.000 0.173 0.000 densebasic.py:137(dup_degree)
1 0.000 0.000 0.067 0.067 decorator.py:229(fun)
291670 0.066 0.000 0.066 0.000 {built-in method builtins.len}
422 0.031 0.000 0.044 0.000 densearith.py:835(dup_sqr)
143853 0.044 0.000 0.044 0.000 densebasic.py:255(dup_strip)
1 0.000 0.000 0.024 0.024 history.py:55(only_when_enabled)
9981 0.009 0.000 0.009 0.000 densebasic.py:723(dmp_zero)
724 0.003 0.000 0.004 0.000 densebasic.py:282(dmp_strip)
684/345 0.001 0.000 0.003 0.000 densearith.py:275(dmp_mul_ground)
1506 0.001 0.000 0.003 0.000 densebasic.py:160(dmp_degree)
1 0.000 0.000 0.002 0.002 history.py:845(writeout_cache)
653 0.002 0.000 0.002 0.000 densearith.py:255(dup_mul_ground)
2955 0.001 0.000 0.002 0.000 densebasic.py:698(dmp_zero_p)
345 0.001 0.000 0.001 0.000 domain.py:379(__call__)
2 0.000 0.000 0.001 0.000 traitlets.py:708(__set__)
2 0.000 0.000 0.001 0.000 traitlets.py:3631(set)
```

It looks as if all of the time is spent in `dup_mul`, in sympy/polys/densearith.py, lines 760 to 769 (at ffcbd9e). I guess gmpy2 is somehow a little better optimised for the `coeff += f[j]*g[i - j]`, but this is just a question of `__add__` and `__mul__` for `flint.fmpz` vs `gmpy2.mpz`.
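For reference, the referenced lines are the inner loop of dense schoolbook multiplication. A simplified, self-contained sketch of what `dup_mul` does (my simplification: ground-domain handling and the large-degree branch are omitted; coefficient lists are in decreasing-degree order, as in SymPy):

```python
def dup_mul(f, g):
    """Schoolbook multiplication of dense univariate polynomials given as
    coefficient lists in decreasing degree order (simplified sketch of
    SymPy's densearith.dup_mul)."""
    if not f or not g:
        return []
    df, dg = len(f) - 1, len(g) - 1
    h = []
    for i in range(df + dg + 1):
        coeff = 0
        # only the j with 0 <= j <= df and 0 <= i - j <= dg contribute,
        # hence the max/min calls that dominate the profiles above
        for j in range(max(0, i - dg), min(df, i) + 1):
            coeff += f[j] * g[i - j]
        h.append(coeff)
    return h

# (x + 1)*(x + 2) = x**2 + 3*x + 2
assert dup_mul([1, 1], [1, 2]) == [1, 3, 2]
```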
Here are timings for python-flint vs gmpy2 with integers just smaller than the threshold:

```python
In [15]: import flint

In [16]: import gmpy2

In [17]: f = flint.fmpz(2**60)

In [18]: g = gmpy2.mpz(2**60)

In [19]: %timeit f + f
133 ns ± 0.823 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [20]: %timeit g + g
71.2 ns ± 0.347 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [21]: %timeit f * f
236 ns ± 1.71 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [22]: %timeit g * g
74.2 ns ± 1.04 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```

The timings come out much the same with:

```python
In [30]: f = flint.fmpz(2**129)

In [31]: g = gmpy2.mpz(2**129)

In [32]: %timeit f + f
254 ns ± 2.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [33]: %timeit g + g
76.7 ns ± 3.29 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [34]: %timeit f * f
258 ns ± 3.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [35]: %timeit g * g
86.5 ns ± 0.2 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```

I think there are two things here. One is a little overhead in the python-flint wrapper. The other is that GMP and Flint etc. could be being built differently (different compiler optimisations, different hardware support, etc.). I am sure that more effort has gone into micro-optimising gmpy2 for the small integer case. In my local build of python-flint I get faster timings, so I guess it is likely that we lose some speed when packaging portable binaries for PyPI. This is what I get with local builds:

```python
In [7]: f = flint.fmpz(2**60)

In [8]: g = gmpy2.mpz(2**60)

In [9]: %timeit f + f
78.6 ns ± 0.755 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [10]: %timeit g + g
56.6 ns ± 0.298 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```

So python-flint still measures a little slower here, but the difference is smaller. This diff in python-flint makes it about 10% faster:

```diff
diff --git a/src/flint/types/fmpz.pyx b/src/flint/types/fmpz.pyx
index 0b16375..0a025b8 100644
--- a/src/flint/types/fmpz.pyx
+++ b/src/flint/types/fmpz.pyx
@@ -1,3 +1,5 @@
+from cpython.object cimport PyTypeObject, PyObject_TypeCheck
+
 from flint.flint_base.flint_base cimport flint_scalar
 from flint.utils.typecheck cimport typecheck
 from flint.utils.conversion cimport chars_from_str
@@ -185,6 +187,10 @@ cdef class fmpz(flint_scalar):
         return -self
 
     def __add__(s, t):
+        if PyObject_TypeCheck(t, <PyTypeObject*>fmpz):
+            u = fmpz.__new__(fmpz)
+            fmpz_add((<fmpz>u).val, (<fmpz>s).val, (<fmpz>t).val)
+            return u
         cdef fmpz_struct tval[1]
         cdef int ttype = FMPZ_UNKNOWN
         u = NotImplemented
```

It is probably possible to go a lot further in micro-optimising this. It looks like gmpy2 uses the C API directly rather than Cython, so it might be that there is a limit to how far we can go in Cython.

In any case I think I have satisfied myself that the cases you see where python-flint is a bit slower than gmpy2 are just about micro-optimisation and not some major problem. For large enough integers there is no difference. For smaller integers it looks like gmpy2 is better micro-optimised. We can spend more time working on that for python-flint as well, but a higher priority is getting higher-level things like the multivariate polynomials working, because that is where the big speedups will come from.
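For completeness, the inline-vs-heap threshold discussed earlier can be written down explicitly. This is a sketch assuming a 64-bit build where Flint reserves the top two bits of the word, so the largest inline value is 2**62 - 1 (check flint.h's COEFF_MAX macro for the authoritative value):

```python
# Sketch of Flint's fmpz small/large split on a 64-bit build: values within
# COEFF_MAX are stored inline in the word itself; anything bigger is a
# pointer to a heap-allocated GMP mpz. Assumption: COEFF_MAX = 2**62 - 1,
# matching Flint's documented layout.
COEFF_MAX = 2**62 - 1

def fits_inline(n: int) -> bool:
    return -COEFF_MAX <= n <= COEFF_MAX

assert fits_inline(2**60)       # the small-integer timings above
assert not fits_inline(2**62)   # just past the threshold
assert not fits_inline(2**129)  # always on the heap path
```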
-
Thanks for the in-depth replies :)
-
@oscarbenjamin I have a question for you.
I'm trying to understand the performance implications of using python-flint vs bare-bones SymPy or gmpy2, specifically when working with polynomials.
I'm going to show the results of a couple of benchmarks.
My setup: Ubuntu 24.04, Python 3.12.3, sympy 1.13.1, gmpy2 2.2.1, python-flint 0.6.0
First Benchmark
I'm constructing a multivariate polynomial of total degree 3, containing all monomials. The benchmark consists of exponentiating the polynomial with an increasing exponent: the higher the exponent, the "bigger" the resulting polynomial. To reduce the effect of multiplications, all coefficients of the original polynomial are 1. I'm going to compare the two representations of polynomials: DMP and sparse. Since the polynomial is dense (it contains all monomials), I already know that the sparse representation is the wrong choice for it, but I'm still interested in the results...
The following code block needs to be executed three times, each time with a new kernel. For each run, set the appropriate value for `SYMPY_GROUND_TYPES`. At the end of the run, store the resulting times. (Each run takes a few minutes on my machine.)

Results:
Clearly, `python-flint` introduces a considerable overhead as the polynomial gets bigger and bigger. Is this behavior to be expected?

Second benchmark - large coefficients
This test exponentiates a multivariate polynomial of total degree 1, having non-unit integer coefficients. As the exponent gets bigger, so do the coefficients.
The following code block needs to be executed three times, each time with a new kernel. For each run, set the appropriate value for `SYMPY_GROUND_TYPES`. At the end of the run, store the resulting times. (Each run takes a few minutes on my machine.)

Results:
These results are in line with my expectations: the bigger the coefficients, the bigger the speedups offered by gmpy2 and python-flint. I executed this benchmark a few times on my machine, and python-flint always comes out slower than gmpy2.
The question is: when does it make sense to choose python-flint in favor of gmpy2?
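(Since the benchmark scripts themselves didn't survive in this thread, here is a rough reconstruction of the timing driver I used. It is only a sketch with a stand-in workload: in the real runs, `make_base()` built the `Poly` after setting `SYMPY_GROUND_TYPES` and restarting the kernel.)

```python
import time

def bench_pow(make_base, exponents):
    """Rough reconstruction of the benchmark driver described above
    (a sketch; not the original script). make_base() builds the base
    object and we time p ** n for each exponent."""
    p = make_base()
    times = {}
    for n in exponents:
        t0 = time.perf_counter()
        p ** n
        times[n] = time.perf_counter() - t0
    return times

# stand-in workload: a plain Python integer instead of a Poly
times = bench_pow(lambda: 12345678901234567890, [100, 500])
assert all(t >= 0 for t in times.values())
```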