Thursday, August 14, 2008

SymPy core in Cython, general CAS design thoughts

We managed to write a new core of SymPy that is an order of magnitude faster than the current sympy, but contrary to sympycore (which is Pearu's attempt to write a fast core), it uses the same architecture as sympy, so we can now merge it back and speedup everything.

It started like this: I was unsatisfied with the current sympy speed, and also with the fact, that it was only Kirill and sometimes Mateusz who were submitting speedup patches to sympy, but unfortunately noone was doing any effort to get sympycore merged. I (and I think many people too) hoped that when sympycore got started, the aim was to get it merged back. That was not happening. So I said to myself, ok, noone does the job for you, you need to get your hands dirty and gets things moving yourself.

I came with my thoughts 3 years back, when it was a rainy weekend in Portland, Oregon, I was sitting in my office at the university, just finished playing with swiginac that we wrote with Ola Skavhaug, and said to myself then -- no, there must be a better way. First all this bloated swig thing, but this could be fixed by hand (today we just use Cython). But especially with how things were done in GiNaC internally. Too many classes, too difficult to write new functions. So I wrote my own cas in pure Python just to see if things can be done better. Then a year later at the Space Telescope Science Institute, I was again wondering, ok, a good CAS in Python is really needed.

So I wrote to ginac developers saying my concerns and exchanging ideas how to do it better. I read this email from July 2006 now again and it seems I was right. Citing
"I think there are people, who would implement for example the factorization (you can copy it from eigenmath for example) and other stuff like limits and integration, if the code weren't so
complex (or am I wrong?)."
I was right, except limits (that I wrote myself), all of the above was implemented by someone else in SymPy (Mateusz, but others too, did a stellar job here). It's actually pretty amusing for me to read my other emails from the ginac list, for example this. I basically said the design of SymPy in that email. As one of replies, I was told to consider education in C++. As I also found later, ginac developers get very easily attacking me over nothing. But if one ignores it, I had several interesting discussions with them about a CAS design (we also discussed something later, but I cannot find it now). But anyway, flaming people is something I would never tolerate on a sympy list, because this is exactly what drives people away. And I am not interested in any flame. I just want to get things done. Anyway, I did exactly what I said in those emails on the ginac list with sympy and it seems it worked really well. One remainging piece of the puzzle is speed though. As I watched one Linus talk today, he said "well, maybe 30s is not much for you, but if people are used to do the same thing in 0.1s, trust me, it is way too slow." He's of course 100% right. So we need to speed sympy up drastically if we want to be competitive.

So those were my thoughts recently. So I wrote a SymPy core in pure C over the last weekend, see my emails to the sympy list. I achieved some pretty competitive benchmarks, but there was one "minor" issue -- memory management. I wasn't freeing any memory as I thought I could fix that later easily. And it turned out to be a night mare. I hoped I could use boehm-gc, but it doesn't work with glib (it gives nasty segfaults), that I was using for dictionaries and string management. So I hacked up reference counting just like in Python, but I didn't manage to make it work without segfaults. Segfaults can be debugged, but it's a lot of effort, especially if it fails deeply in some libc function, far away from the code that actually breaks it. Valgrind helps here too, but I want to spend my time on other things than debugging segfaults. Then Robert Bradshaw told me: why not to use Cython? BTW, Kirill was telling me this from the beginning. But I am stubborn, if I believe in something, I want to try it. So I tried and failed. Anyway, I must say I really enjoyed coding in C, it's a very good language.

So then I threw the code away and started from scratch. I think this was the 4th time I wrote the "x+y+x -> 2*x + y" simplification algorithm. For the first time 2 years ago it took me maybe a week to get it right. In the C (3rd time) it took me just maybe an hour. And now (4th time) in Python it was just couple minutes. So it makes me feel good if I can see that I am improving after all. Anyway, so I wrote a core in pure Python, but using all my 2 years experience and a know-how after discussing with Kirill, Mateusz, Fredrik, Pearu and many others how to do things. So our new core, we call it sympyx, is less than 600 lines long, it uses the same architecture as sympy and it is amazingly fast.

Then we reserved couple evenings with Kirill and cythonized it (neither of us has time during the day, I need to work on my thesis and Kirr works). We logged into one machine, setup screen so that we could see each other typing and also type to each other's terminal and started cythonizing. Kirr is very good at it, he submitted couple patches to Cython and he's just much better developer than I am. So he used his approach of rewriting everything directly to Cython, while I used my iterative approach of always trying to satisfy tests. So when he completely broke the core in the middle and I saw on his terminal something like:

In [1]: x+x
[huge stacktrace with Cython errors]

and then I saw the cursor stopped moving, I write to his terminal "haha" and was laughing. Probably a minute later I screwed something up in my terminal and I saw that my cursor wrote haha.
Actually very good moment was that I managed to get all tests run first. So I got hold of his terminal and wrote in his vim session "look at my terminal". And there he saw:

$ py.test
============================= test process starts ==============================
executable: /usr/bin/python (2.5.2-final-0)
using py lib: /usr/lib/python2.5/site-packages/py[13] .............[4] ....

================== tests finished: 17 passed in 0.04 seconds ===================

You know, all tests run, with the core in Cython! And then I saw how my cursor started to move and it was typing "git diff" and "git cherry -v origin" and "git log" and other things, as Kirr was checking that what he sees is correct. But I haven't cdefed the classes, just make it work (there were problems with the __new__ method not yet supported in Cython and other minor things). We both came home from our work and continued, so I continued bringing more and more classes and methods to Cython, but then I got this message from Kirr on my jabber: "all tests pass. Speedup is 3x". So I said to myself -- yes! So he beated me and he did it. I pulled his changes and indeed, everything was cythonized and it was superfast. So his approach provided faster results after all.

We polished things, did couple benchmarks and announced it on sympy and sage-devel lists.

GiNaC is still faster, but not 100x faster, depending on the benchmark usually 2x to 10x faster (but sometimes also slower). There is still a lot of room for both design and technical optimizations, so I think we'll improve for sure. I very strongly believe it is possible to be simple in design, maintainable and fast.

1 comment:

Unknown said...

These are some great news!

Glad to hear that that the project is getting even better!