Tuesday, October 28, 2008

Google Mentor Summit III

On Sunday I checked out of the hotel and enjoyed the breakfast, basically the same as the first picture here. Which is cool: the best way to start a day is with a good breakfast.



At Google I finally met Bart Massey, who works at Portland State University, where I was an intern (in 2005), and who also sponsored one SymPy GSoC student in 2007.


Then we started to port SymPy to Jython with Philip Jenvey. I first installed openjdk-6, which is in Debian main, then checked out Jython svn, typed "ant" and it just worked. It is so awesome that we finally have a truly open source and fully working Java implementation in Debian. Philip had already fixed some bugs, so SymPy imports just fine. Most of the tests run, but quite a lot still fail.

We managed to isolate two bugs: issues 1158 and 1159. Then there was an annoying recursion bug that took us several hours to dig into, and I had to leave at 6pm without a clue. But Philip kept saying, "I just need to look more at the Java stacktraces that Jython is generating and I'll figure it out." And he did! The next morning he filed issue 412 in the PyPy tracker, because they seem to have the same problem. There are subtle differences in how the __mul__ and __rmul__ methods are handled that are not well documented, so it works in CPython but fails in Jython/PyPy. I think we can fix that in SymPy too in the meantime, so I am very excited, because it seems SymPy will work on top of Jython soon.
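For readers who have not met these methods, here is a minimal sketch of how CPython dispatches `*` between `__mul__` and `__rmul__`. The `Symbol` and `Special` classes are made up for illustration (they are not SymPy classes), the code is written for a current Python 3, and it is not the code that triggered the bug; which exact rule differed in Jython/PyPy is not stated above.

```python
class Symbol:
    """Toy stand-in for a symbolic object (hypothetical, not SymPy's class)."""

    def __init__(self, name):
        self.name = name

    def __mul__(self, other):
        # Called for `self * other`.
        if isinstance(other, (int, Symbol)):
            return "%s*%s" % (self.name, getattr(other, "name", other))
        # NotImplemented tells Python to try the reflected method of `other`.
        return NotImplemented

    def __rmul__(self, other):
        # Called for `other * self` after type(other).__mul__ gave up.
        if isinstance(other, int):
            return "%s*%s" % (other, self.name)
        return NotImplemented


class Special(Symbol):
    """Subclass that overrides the reflected method."""

    def __rmul__(self, other):
        return "special"


x = Symbol("x")
left = x * 2             # Symbol.__mul__(x, 2)
right = 2 * x            # int.__mul__ returns NotImplemented, then Symbol.__rmul__(x, 2)
sub = x * Special("y")   # right operand is a subclass overriding __rmul__, so it is tried first

print(left, right, sub)
```

The last case is the kind of rule that is easy to get slightly different in another interpreter: the left operand could handle the multiplication itself, yet the subclass on the right gets the first chance.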

There was also a git vs hg session with both hg and git developers. Since I recently switched from hg to git for everything where I can choose the vcs, it was very interesting for me. And I must say I am glad I switched. Mercurial works well enough, and with the recent rebase feature I think it is usable, but git just has more momentum at the moment and it just doesn't get in my way.

During the day Steve sent an email to the GSoC list that Dirk forgot his jacket in the hotel room, so I said to myself, haha, that's a pity. And then I realized, oh shit, I left my jacket there as well. They didn't have it in the evening when I stopped there, but fortunately they sent me an email yesterday that they found it, so I'll stop there on my way back.

We also took a couple of group pictures, here is just my lame attempt (I am sure people will soon post better ones):


On Monday morning I met with Fernando Perez, the author of ipython, who now works with Jarrod. Then I rented a car and went to visit Brian Granger and his family in San Luis Obispo.

We went to the beach in the evening:





And had dinner together. Brian showed me the options pricing example he did with SymPy, so I made him create issue 1175 with the code.
Then we played with the parallel stuff that Brian contributed to ipython and tried sympy with it. To our disappointment, we found pickling bugs in sympy: it works with protocol 0, but not with protocol 2, see issues 1176 and 1177. This almost smells like a bug in Python itself, but I need to investigate that more deeply later.
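A small helper along these lines can show which pickle protocols an object survives. This is only a sketch: `failing_protocols` is a hypothetical name, the code targets a current Python 3, and a dictionary holding a `Fraction` stands in for a SymPy expression, so it does not reproduce the bugs from the issues above.

```python
import pickle
from fractions import Fraction


def failing_protocols(obj, protocols=(0, 1, 2)):
    """Return the pickle protocols under which obj does not survive a round trip."""
    failed = []
    for protocol in protocols:
        try:
            restored = pickle.loads(pickle.dumps(obj, protocol))
        except Exception:
            # Pickling or unpickling raised: count this protocol as failing.
            failed.append(protocol)
            continue
        if restored != obj:
            # The round trip worked but produced a different object.
            failed.append(protocol)
    return failed


# Stand-in data; a real check would pass a SymPy expression instead.
sample = {"coeff": Fraction(1, 3), "terms": ("x", "y")}
print(failing_protocols(sample))
```

Running the same helper over every object type in a test suite is one way to catch the "works with protocol 0, but not protocol 2" kind of regression early.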

Today is Tuesday and we'll work on doing atomic physics calculations with sympy. See also Brian's email to the sympy list about it.

Sunday, October 26, 2008

Google Mentor Summit II

On Saturday morning I met Steve McIntyre and Dirk Eddelbuettel, Steve took us by car to Google. Well, it was overwhelming.

For example, we met two other Debian developers there, Jeff Bailey and Daniel Burrows. There are developers from all the famous projects, like git, Wine, Turbogears, Jython, ... There are still people I wanted to meet but didn't manage to yesterday, so I need to fix that today. My plan is to get SymPy running on top of Jython, or at least make some more progress, taking advantage of the fact that Philip Jenvey from Jython is here. I also want to fix some RC bugs in Debian while Steve is here, to finally make some progress on my NM. We'll see how it goes.

There was also an excellent presentation from the people behind Android. One thing I was curious about is whether it's going to run native C applications, like Python, and it seems it will happen eventually: all the sources are out there, so someone just needs to push it forward. Another very cool thing they did, which I had always been looking for, is Gerrit for reviewing patches (see review.source.android.com for an example of how it looks) -- basically a fork of Guido's codereview, but it has many nice features, like automatically applying the patch to the git repository as a new topic branch, which one can then easily pull as well as review over the web interface. When I get back to Prague I am going to install this for SymPy.

Saturday, October 25, 2008

Google Mentor Summit I

I flew via Atlanta and had only about an hour until my next flight to San Francisco. Last time I went to immigration, got stuck in the line for more than an hour and then had to run to catch my flight, so I decided to try a different strategy this time: first run to immigration and then walk to my gate. Unfortunately I was sitting near the back of the plane and was among the last to get out. Fortunately, it was several hundred meters to immigration, so I ran as fast as I could, managed to get there first, and everything took about 5 minutes. That was just awesome, I finally figured this out.

Jarrod was waiting for me at the airport and we went to his place. Here's his cat:



On Friday we did some work and then went to the Golden Gate, the traffic was quite dense:




Alcatraz:


San Francisco:




Then we had a coffee in San Francisco and went to Silicon Valley, where Jarrod drove me around a little bit to see SLAC, the Stanford campus and other things. In the evening we went to a pub with other mentors and Google guys, where we met Robert Bradshaw, for example.

Today I am looking forward to meeting all the people I know from mailing lists, Debian and other places.

Wednesday, October 1, 2008

Gael and Emmanuelle in Prague

Last weekend Gaël Varoquaux together with Emmanuelle came to Prague, so it was really awesome to meet again (we first met with Gaël at the SciPy 2007 conference at Caltech).

Because I had recently finished my master's, I was basically visiting pubs each evening, so we first met on Friday at 10pm and just had a couple of Pilsens (I had already had some beers that evening with other friends), then I had to go, as I was taking the TOEFL on Saturday. At the exam I met a very beautiful girl and other interesting people, so we went to lunch together and as a result I arrived half an hour later than I had agreed with Gaël and Emmanuelle, but I think they understood my situation. :) Then we walked around Vyšehrad, came to I.P.Pavlova, had a Czech meal and beers, then went on foot past the Church of Saints Cyril and Methodius, where the Heydrich attackers were cornered (the bullet-scarred window is still visible from the street), then continued over the bridge to the hotel. Emmanuelle went in, and I went with Gaël to another pub for a couple more beers.

The Python scientific community is very cool and I always enjoy meeting people from it and discussing things like cython, scipy, ipython, mayavi, sympy, matplotlib, sage, what license is best for each project, etc. in Prague pubs. Python has a lot of high quality libraries and tools for scientific computing, so things look very promising.

It was fun, I really enjoyed it.

Monday, September 22, 2008

master studies

I did it! :) I defended my thesis and passed my master's finals in theoretical physics at Charles University in Prague a couple of hours ago.

After almost dropping out of school exactly a year ago for not having enough credits to advance to the next year, I committed myself to finishing school on time. I worked very hard over the last year: I had to take 8 exams, some of them very hard, requiring more than a week of thorough study, and 9 seminars, requiring a lot of work too, and also write a master's thesis, for which I had to have working finite element solvers. Together with SymPy it took all my time and energy. I even had to cancel my trip to Austin and Caltech for the SciPy conference. But I finished school after all, and I am very happy about it.

Last year, two of my friends bet $100 between themselves on whether I would finish on time. Jarda, who believed in me, is now at Princeton doing his Ph.D. Matouš, who didn't believe in me, will now pay $100 to Jarda. I think that life is fair. :)

Now I'll be visiting pubs quite often and then I'll fix some long standing issues in SymPy, hopefully finish my Debian tasks & skills (to finally become a Developer a couple of months after that) and finally do useful stuff for my research with a fresh head.

Saturday, August 16, 2008

I am switching from Mercurial to git

After a long night of debugging and while preparing, reviewing and pushing final patches before a SymPy release, I got this:

$ hg qpush
applying 1645.diff
Unable to read 1645.diff
** unknown exception encountered, details follow
** report bug details to http://www.selenic.com/mercurial/bts
** or mercurial@selenic.com
** Mercurial Distributed SCM (version 1.0.1)
Traceback (most recent call last):
  File "/usr/bin/hg", line 20, in <module>
    mercurial.dispatch.run()
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 20, in run
    sys.exit(dispatch(sys.argv[1:]))
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 29, in dispatch
    return _runcatch(u, args)
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 45, in _runcatch
    return _dispatch(ui, args)
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 364, in _dispatch
    ret = _runcommand(ui, options, cmd, d)
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 417, in _runcommand
    return checkargs()
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 373, in checkargs
    return cmdfunc()
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 356, in <lambda>
    d = lambda: func(ui, repo, *args, **cmdoptions)
  File "/var/lib/python-support/python2.5/hgext/mq.py", line 1942, in push
    mergeq=mergeq)
  File "/var/lib/python-support/python2.5/hgext/mq.py", line 833, in push
    top = self.applied[-1].name
IndexError: list index out of range

I am using:

$ hg --version
Mercurial Distributed SCM (version 1.0.1)

Copyright (C) 2005-2008 Matt Mackall and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


And I just got pissed off and I am switching to git for good. I've been using mercurial every day since about November 2007, and I think I am quite proficient in it. We were missing some features, like recording a patch by hunks (git add -p), so Kirill Smelkov (another SymPy developer) implemented this for mercurial queues, and it is now included in hg 1.0. But I am constantly bitten by stupid bugs: mercurial broke into pieces several times already (last time with hg bisect, before that with mercurial queues and file renaming), and that just should not happen in a production version. Another problem is that hgweb uses 100% of the CPU on my server whenever someone clicks to see the contents of the README file. Again, I could spend time debugging it, but I will just use git and if it works, I'll stay with it.

I started learning git recently and it is just better in every aspect. It has a bigger community and all the features that I was always missing in mercurial (like rebase, hunks recording, git diff --color-words, better gitk). And it's superfast.

Unless I find some showstopper bugs in git too, I don't think I am coming back.

We'll have to create a live mercurial mirror of SymPy, so that people can continue using mercurial, but I'll myself just be using git. We wrote a simple translation table for anyone using mercurial to get started:

http://wiki.sympy.org/wiki/Git_hg_rosetta_stone

I love Python and I like that Mercurial is just a small C core and the rest is in Python, but for some reason it still has baby bugs that should have been fixed years ago. Why should we spend our time fixing and improving Mercurial, if other people have already done the job for us (and usually a better job than I could do) in git?

Another reason for switching is that in Debian, mercurial is not used for packages, while git is used a lot.

Thursday, August 14, 2008

SymPy core in Cython, general CAS design thoughts

We managed to write a new core of SymPy that is an order of magnitude faster than the current sympy, but unlike sympycore (which is Pearu's attempt to write a fast core), it uses the same architecture as sympy, so we can now merge it back and speed up everything.

It started like this: I was unsatisfied with the current sympy speed, and also with the fact that it was only Kirill and sometimes Mateusz who were submitting speedup patches to sympy, while unfortunately no one was making any effort to get sympycore merged. I (and I think many other people too) hoped that when sympycore got started, the aim was to get it merged back. That was not happening. So I said to myself, ok, no one does the job for you, you need to get your hands dirty and get things moving yourself.

My thoughts go back 3 years, to a rainy weekend in Portland, Oregon. I was sitting in my office at the university, had just finished playing with swiginac that we wrote with Ola Skavhaug, and said to myself -- no, there must be a better way. First there was all this bloated swig thing, though that could be fixed by hand (today we just use Cython). But my main problem was with how things were done in GiNaC internally: too many classes, too difficult to write new functions. So I wrote my own CAS in pure Python just to see if things could be done better. Then a year later, at the Space Telescope Science Institute, I was thinking again: ok, a good CAS in Python is really needed.

So I wrote to the ginac developers, voicing my concerns and exchanging ideas on how to do it better. I have now read this email from July 2006 again and it seems I was right. Citing:
"I think there are people, who would implement for example the factorization (you can copy it from eigenmath for example) and other stuff like limits and integration, if the code weren't so complex (or am I wrong?)."
I was right: except for limits (which I wrote myself), all of the above was implemented by someone else in SymPy (Mateusz, but others too, did a stellar job here). It's actually pretty amusing for me to read my other emails from the ginac list, for example this. I basically laid out the design of SymPy in that email. In one of the replies, I was told to consider an education in C++. As I also found out later, the ginac developers very easily start attacking me over nothing. But if one ignores that, I had several interesting discussions with them about CAS design (we also discussed something later, but I cannot find it now). Anyway, flaming people is something I would never tolerate on the sympy list, because this is exactly what drives people away. And I am not interested in any flames. I just want to get things done. With sympy I did exactly what I said in those emails on the ginac list, and it seems it worked really well. One remaining piece of the puzzle is speed though. In a Linus talk I watched today, he said "well, maybe 30s is not much for you, but if people are used to do the same thing in 0.1s, trust me, it is way too slow." He's of course 100% right. So we need to speed sympy up drastically if we want to be competitive.

So those were my thoughts recently, and I wrote a SymPy core in pure C over the last weekend, see my emails to the sympy list. I achieved some pretty competitive benchmarks, but there was one "minor" issue -- memory management. I wasn't freeing any memory, as I thought I could easily fix that later. It turned out to be a nightmare. I hoped I could use boehm-gc, but it doesn't work with glib (it gives nasty segfaults), which I was using for dictionaries and string management. So I hacked up reference counting just like in Python, but I didn't manage to make it work without segfaults. Segfaults can be debugged, but it's a lot of effort, especially if it fails deep in some libc function, far away from the code that actually breaks it. Valgrind helps here too, but I want to spend my time on other things than debugging segfaults. Then Robert Bradshaw told me: why not use Cython? BTW, Kirill was telling me this from the beginning. But I am stubborn, if I believe in something, I want to try it. So I tried and failed. Anyway, I must say I really enjoyed coding in C, it's a very good language.

So then I threw the code away and started from scratch. I think this was the 4th time I wrote the "x+y+x -> 2*x + y" simplification algorithm. The first time, 2 years ago, it took me maybe a week to get it right. In C (3rd time) it took me maybe an hour. And now (4th time) in Python it took just a couple of minutes. So it makes me feel good to see that I am improving after all. Anyway, I wrote a core in pure Python, but using all my 2 years of experience and the know-how from discussing with Kirill, Mateusz, Fredrik, Pearu and many others how to do things. So our new core, we call it sympyx, is less than 600 lines long, it uses the same architecture as sympy and it is amazingly fast.
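To give an idea of what the "x+y+x -> 2*x + y" step involves, here is a minimal sketch that collects like terms in a dictionary keyed by symbol. It is not the sympyx code: the term representation as (coefficient, symbol) pairs and the names `collect_terms` and `format_sum` are assumptions made only for this illustration, and it handles just sums of integer multiples of plain symbols.

```python
def collect_terms(terms):
    """Combine like terms given as (coefficient, symbol) pairs.

    [(1, "x"), (1, "y"), (1, "x")] becomes {"x": 2, "y": 1}.
    """
    coeffs = {}
    for coeff, symbol in terms:
        # Add the coefficient to whatever was already collected for this symbol.
        coeffs[symbol] = coeffs.get(symbol, 0) + coeff
    # Terms whose coefficients cancelled out disappear from the sum.
    return {s: c for s, c in coeffs.items() if c != 0}


def format_sum(coeffs):
    """Print the collected terms in a canonical (sorted) order."""
    parts = []
    for symbol in sorted(coeffs):
        c = coeffs[symbol]
        parts.append(symbol if c == 1 else "%d*%s" % (c, symbol))
    return " + ".join(parts) if parts else "0"


expr = [(1, "x"), (1, "y"), (1, "x")]   # x + y + x
print(format_sum(collect_terms(expr)))  # 2*x + y
```

A real core has to do the same for arbitrary subexpressions, not just symbols, so cheap hashing and comparison of terms matter a lot for speed.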

Then we reserved a couple of evenings with Kirill and cythonized it (neither of us has time during the day, I need to work on my thesis and Kirr works). We logged into one machine, set up screen so that we could see each other typing and also type into each other's terminal, and started cythonizing. Kirr is very good at it, he has submitted a couple of patches to Cython and he's just a much better developer than I am. So he used his approach of rewriting everything directly in Cython, while I used my iterative approach of always trying to satisfy the tests. So when he completely broke the core in the middle and I saw on his terminal something like:

In [1]: x+x
[huge stacktrace with Cython errors]

and then I saw the cursor stop moving, I wrote "haha" in his terminal and was laughing. Probably a minute later I screwed something up in my terminal and I saw my cursor write haha.
Actually, a very good moment was when I managed to get all the tests to run first. So I got hold of his terminal and wrote in his vim session "look at my terminal". And there he saw:

$ py.test
============================= test process starts ==============================
executable: /usr/bin/python (2.5.2-final-0)
using py lib: /usr/lib/python2.5/site-packages/py

test_basic.py[13] .............
test_basic2.py[4] ....

================== tests finished: 17 passed in 0.04 seconds ===================

You know, all tests run, with the core in Cython! And then I saw my cursor start to move, typing "git diff" and "git cherry -v origin" and "git log" and other things, as Kirr was checking that what he saw was correct. But I hadn't cdef'ed the classes, I had just made it work (there were problems with the __new__ method not yet being supported in Cython and other minor things). We both came home from work and continued, so I kept bringing more and more classes and methods to Cython, but then I got this message from Kirr on my jabber: "all tests pass. Speedup is 3x". So I said to myself -- yes! So he beat me and he did it. I pulled his changes and indeed, everything was cythonized and it was superfast. So his approach provided faster results after all.

We polished things, did a couple of benchmarks and announced it on the sympy and sage-devel lists.

GiNaC is still faster, but not 100x faster, depending on the benchmark usually 2x to 10x faster (but sometimes also slower). There is still a lot of room for both design and technical optimizations, so I think we'll improve for sure. I very strongly believe it is possible to be simple in design, maintainable and fast.