Ok, this made us excited, so we dug deeper and ran more benchmarks. But first, let me make a few general remarks. I want a fast CAS (Computer Algebra System) in Python. A general CAS that people use, that is useful, that is easily extensible (!), that is not missing anything, that is comparable to Mathematica and Maple -- and most importantly -- I want it now and I don't care about 30-year horizons (I won't be able to do any serious programming in 30 years anyway).
All right, how to do that? Well, many people have tried... and failed. The only open source CAS that has any chance of becoming the open source CAS, in my own opinion, is Sage. You can read more about my impressions of Sage here. I am actually only interested in mathematical physics, so basically Sage.calculus. Currently Sage uses Maxima, because Maxima is an old, proven, working system and it's reasonably fast and quite reliable, but it's written in LISP. Some people like LISP; I don't, and I find it extremely difficult to extend Maxima. Also, even though Maxima is written in LISP, it uses its own language for interacting with the user (well, that's not the way). I like Python, so I want to use Python. Sage has written Python wrappers to Maxima, so Sage can do almost everything that Maxima can, plus many other things, today. But Sage.calculus has issues.
First, I don't know how to extend the wrappers with new things; see my post on sage-devel for details. It's almost two months old with no reaction, which suggests it's a difficult issue (or nonsense :)).
And second, it's slow. On some examples that Sage users have found, even SymPy as it is now is 7x faster than Sage, sympycore is 23x faster, and with the recent speed improvements 40x faster than Sage.
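The specific expressions those Sage users benchmarked are not listed in this post, but the flavor of the comparison is easy to reproduce yourself. As an illustration only (the expression below is my own pick, not the original benchmark), one can time the same expansion in SymPy under IPython and in the Sage console:
# SymPy, in IPython:
In [1]: from sympy import Symbol
In [2]: x, y, z = map(Symbol, 'xyz')
In [3]: %time e = ((x+y+z)**50).expand()
# Sage, in the Sage console (var() creates Sage's symbolic variables):
sage: x, y, z = var('x y z')
sage: time e = ((x+y+z)^50).expand()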
So let's improve Sage.calculus. How? Well, no one knows for sure, but I believe in my original idea of a pure Python CAS (SymPy), possibly with some parts rewritten in C. Fortunately, quite a lot of us believe that this is the way.
What is this sympycore thing? In SymPy, we wanted to have something now, instead of tomorrow, so we were adding a lot of features without looking too much at speed. But then Pearu Peterson came and said: guys, we need speed too. So he rewrote the core (resulting in a 10x to 100x speedup) and we moved to the new core. But first, the speed still wasn't sufficient, and second, it destabilized SymPy a lot (there are still some problems with caching and assumptions half a year later). So with the next batch of speed improvements, we decided to either port them to the current SymPy, or wait until the new core stabilizes enough. That new new core is called sympycore now; currently it only has the very basic arithmetic (plus derivatives and simple integrals), but it's very fast. It's mainly Pearu's work, but for example the latest speed improvement, using sexpressions, was invented by Fredrik Johansson, another SymPy developer and the author of mpmath.
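In case "sexpressions" sounds mysterious: the idea is to represent expressions as nested Python tuples of a head plus arguments, which are very cheap to build and traverse, instead of as instances of many classes. A toy illustration of the general shape (my own sketch, not sympycore's actual data layout or API):
# Toy s-expression representation: ('head', arg1, arg2, ...), with numbers
# and ('sym', name) tuples as leaves. Not sympycore's real format.
SYM, ADD, MUL = 'sym', 'add', 'mul'

def mul_scalar(c, expr):
    # Multiply an expression by a scalar by rebuilding tuples directly;
    # no class instances are created, which is where the speed comes from.
    # (A real implementation would also fold nested scalar factors.)
    if expr[0] == ADD:
        return (ADD,) + tuple(mul_scalar(c, a) for a in expr[1:])
    return (MUL, c, expr)

a, x = (SYM, 'a'), (SYM, 'x')
e = (ADD, (MUL, 2, a), x)   # represents 2*a + x
print(mul_scalar(3, e))
# ('add', ('mul', 3, ('mul', 2, ('sym', 'a'))), ('mul', 3, ('sym', 'x')))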
OK, let's go back to the benchmarks. The first thing we realized is that Pearu was using CLISP 2.41 (2006-10-13) and a hand-compiled Maxima in the above timings, but when I tried the Maxima packaged in Debian (which is compiled with GNU Common Lisp (GCL) 2.6.8), I got different results: there, Maxima did beat sympycore.
SymPyCore:
In [5]: %time e=((x+y+z)**100).expand()
CPU times: user 0.57 s, sys: 0.00 s, total: 0.57 s
Wall time: 0.57
In [6]: %time e=((x+y+z)**20 * (y+x)**19).expand()
CPU times: user 0.25 s, sys: 0.00 s, total: 0.25 s
Wall time: 0.25
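(These timings assume the symbols were already created; the setup is the same as in the fair-comparison benchmark further down:)
from sympy import *
x, y, z = map(Symbol, 'xyz')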
Maxima:
(%i7) t0:elapsed_real_time ()$ expand ((x+y+z)^100)$ elapsed_real_time ()-t0;
(%o9) 0.41
(%i16) t0:elapsed_real_time ()$ expand ((x + y+z)^20*(x+z)^19)$ elapsed_real_time ()-t0;
(%o18) 0.080000000000005
So when expanding, Maxima is comparable to sympycore (0.41 vs 0.57), but on the general arithmetic example, Maxima is about 3x faster (0.08 vs 0.25). We also compared GiNaC (via its Python bindings, swiginac):
>>> %time e=((x+y+z)**20 * (y+x)**19).expand()
CPU times: user 0.03 s, sys: 0.00 s, total: 0.03 s
Wall time: 0.03
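(For reproducibility: the swiginac runs need GiNaC symbols created first. A plausible setup, assuming swiginac exposes GiNaC's symbol class as symbol; check your installed version:)
from swiginac import symbol
x, y, z = symbol('x'), symbol('y'), symbol('z')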
Then we compared just expanding (x+y+z)**200:
sympycore:
>>> %time e=((x+y+z)**200).expand()
CPU times: user 1.80 s, sys: 0.06 s, total: 1.86 s
Wall time: 1.92
swiginac:
>>> %time e=((x+y+z)**200).expand()
CPU times: user 0.52 s, sys: 0.02 s, total: 0.53 s
maxima:
(%i41) t0:elapsed_real_time ()$ expand ((x + y+z)^200)$ elapsed_real_time ()-t0;
(%o43) 2.220000000000027
GiNaC still wins here, but sympycore beats Maxima. The timings really depend on the algorithm used, though; sympycore expands powers using Miller's algorithm, which is the most efficient.
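For the curious, here is a minimal univariate sketch of Miller's recurrence for computing the coefficients of p(x)^n from those of p(x). The multivariate version a CAS needs is more involved, and this is my own illustration, not sympycore's code. With b[0] = a[0]^n, the recurrence is k*a[0]*b[k] = sum over j=1..k of (j*(n+1) - k)*a[j]*b[k-j], so each output coefficient costs only O(deg p) operations:
from fractions import Fraction

def poly_pow_miller(a, n):
    """Coefficients of p(x)**n, given the coefficients a of p(x) (a[0] != 0),
    computed with Miller's recurrence instead of repeated multiplication."""
    deg = (len(a) - 1) * n
    b = [Fraction(0)] * (deg + 1)
    b[0] = Fraction(a[0]) ** n
    for k in range(1, deg + 1):
        s = sum((j * (n + 1) - k) * a[j] * b[k - j]
                for j in range(1, min(k, len(a) - 1) + 1))
        b[k] = s / (k * Fraction(a[0]))
    return b

print(poly_pow_miller([1, 1], 3))   # coefficients of (1 + x)**3: 1, 3, 3, 1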
So then we tried a fairer comparison: expand a product xx*yy where xx and yy are already-expanded powers (to get more terms):
sympycore:
>>> from sympy import *
>>> x,y,z=map(Symbol,'xyz')
>>> xx=((x+y+z)**20).expand()
>>> yy=((x+y+z)**21).expand()
>>> %time e=(xx*yy).expand()
CPU times: user 2.21 s, sys: 0.10 s, total: 2.32 s
Wall time: 2.31
swiginac:
>>> xx=((x+y+z)**20).expand()
>>> yy=((x+y+z)**21).expand()
>>> %time e=(xx*yy).expand()
CPU times: user 0.30 s, sys: 0.00 s, total: 0.30 s
Wall time: 0.30
maxima:
(%i44) xx:expand((x+y+z)^20)$
(%i45) yy:expand((x+y+z)^21)$
(%i46) t0:elapsed_real_time ()$ expand (xx*yy)$ elapsed_real_time ()-t0;
(%o48) 0.57999999999993
So sympycore is roughly 7-8x slower than swiginac and 4x slower than Maxima here. We are still in pure Python, so that's very promising.
And when the sexpr functions are used directly, computing things like 3*(a*x+..) is 4-5x faster than Maxima in Debian/Ubuntu. So the headline of this post is justified. :)
Conclusion
Let's build the car. Sage has the most features and is the most complete car. It has issues, though: some wheels need to be improved (Sage.calculus), so let's change them. Maybe SymPy could be the new wheel, maybe not; we'll see. SymPy itself is quite a reasonable car for calculus (it has plotting, export to LaTeX, a nice, simple but powerful command line with IPython and all those bells and whistles, and it can also be used as a regular Python library). But it also has issues; one wheel should be improved. That's the sympycore project.
All those smaller and smaller wheels show that this is indeed the way to go, but the really important thing is to put them back into the car, i.e. sympycore back into SymPy and SymPy back into Sage, and to integrate them well, while also leaving them available as separate modules so that users who only need one particular wheel can use it.
2 comments:
Thanks for working on sympy! I think it is going to be really nice, but there still is a bit left until it is as easy to use as maxima (ok maxima is not so easy) or numpy.
It is a lot about coverage of course, since there are many things you can't do right now. Then sympy will have to have some way to make the users understand which way to do things: arranging the API naturally, having the right possibilities for all objects, etc.
Still I think it is really nice that I could write an algebraic iterator for systems of dynamic equations as a python generator.
Yes, it's exactly as you said. A lot of work in front of us. :)
Ondrej