Ondřej Čertík: 2008

Tuesday, December 23, 2008

Experience with git after 4 months

I switched to git from mercurial about 4 months ago. As Matt Mackall (the main author of mercurial) has pointed out, my arguments were partially just excuses for the switch, because everything could be fixed in mercurial as well (eventually). On the other hand, I really wasn't expecting to switch after I had invested time and energy to learn mercurial. I also thought that git doesn't have such nice documentation, has a steep learning curve and that the community is hostile, because it is developed by the kernel people, who are famous for sometimes being not so nice on the mailing lists (as I have now found out, none of these are true). I was getting annoyed by small things one by one, until I finally said to myself enough is enough, let's just switch.

So as to the steep learning curve --- it took me a day or two to be able to do what I want, so it seems to me it's really easy to learn once you know some distributed vcs. I found out the documentation is much better than in mercurial --- in git one just types "git help command" and a full man page fires up with lots of information and examples. In mercurial, one just gets a couple of lines (compared to git) of help in the terminal, without examples. Also the first (actually second) thing I noticed is that "git log" (and other commands like "git diff") doesn't scroll off my terminal, but uses "less" automatically -- that's just great. The first thing I noticed is that git is superfast for almost every operation, which is especially noticeable with "git diff". As to the community, I only had a chance to send a few emails to the git list, so I cannot really say, but so far it's very responsive and friendly.

However, those are just little things that can (and I am sure they will) be fixed in mercurial as well. The big thing is branches, especially remote branches. One just fetches a remote repository and then works with it like with any other local branch, e.g. one can switch to it, or just "git log some_remote_branch" to see what is in there. One can easily compare them etc. With mercurial, I was using "hg out" and "hg in" commands to see what changes I will pull or push, but those commands require an internet connection, so it really sucks and it's slow. In mercurial, I was using different directories for different branches, but that's just extremely inconvenient, slow and big (in terms of megabytes): in git, creating a new branch is just a matter of adding one file, not copying the whole repository --- for example with the sympy hg repository, I often had to wait several seconds to clone a repo, while with the git repository it's just instant.

The bible of Mercurial, the hgbook says:

In most instances, isolating branches in repositories is the right approach.

But I think it's the wrong approach. The only argument for doing this that I accept is that sometimes you are not sure which branch you are in in git --- well, I use this trick (actually, just read the documentation of __git_ps1 in /etc/bash_completion.d/git), so my bash prompt always says which branch I am in using the red color. I never make any mistakes with this.

Anyways, I could continue like this (e.g. check out "git svn", "git rebase -i" and many other things), but you can read why git is better for example here, no need to repeat it here. I am very happy now and I can only recommend learning git.

So let me also say some things that are worse in git --- one thing that could be improved are URLs of the gitweb (hgweb urls are neat, gitweb urls use "?id=..." stuff). Another thing is that the debian package git is not git! You need to install git-core to get git (but this seems to be getting fixed).

There is also an interesting discussion happening right now on the debian-python mailing list about switching the DPMT repositories to git. I think almost all opinions (for and against) were already stated, but if you have an opinion that hasn't been voiced yet, please post it.

Tuesday, October 28, 2008

Google Mentor Summit III

On Sunday I checked out from the hotel and enjoyed the breakfast, basically the same as the first picture here. Which is cool, the best way to start a day is with a good breakfast in the morning.

At Google I finally met with Bart Massey, who works at the Portland State University where I was an intern (in 2005) and also he was sponsoring one SymPy GSoC student in 2007.

Then we started to port SymPy to Jython with Philip Jenvey. I first installed openjdk-6, which is in Debian main, then checked out Jython svn, typed "ant" and it just worked. This is so awesome, that we finally have a truly opensource and fully working java implementation in Debian. Philip fixed some bugs already, so SymPy imports just fine. Most of the tests run, but also quite a lot still fail.

We managed to isolate two bugs: issues 1158 and 1159. Then there was an annoying recursion bug that took us several hours to dig into, and I then had to leave at 6pm without a clue. But Philip kept saying "I just need to look more at the java stacktraces that Jython is generating and I'll figure it out". And he did! The next morning he filed issue 412 in the PyPy tracker, because they seem to have the same problem. There are subtle differences in how the __mul__ and __rmul__ methods are handled that are not well documented, so it works in CPython but fails in Jython/PyPy. I think we can fix that in SymPy too in the meantime, so I am very excited, because it seems SymPy will work on top of Jython soon.
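As a rough illustration of the kind of rules involved (this is not the actual bug from issue 412, whose details are not given here), the sketch below shows the dispatch between __mul__ and __rmul__ that CPython documents. The classes Expr and Matrix are hypothetical stand-ins, not SymPy classes.

```python
class Expr:
    """Hypothetical stand-in for a symbolic expression (not SymPy's real class)."""

    def __init__(self, name):
        self.name = name

    def __mul__(self, other):
        if not isinstance(other, (Expr, int)):
            return NotImplemented  # let Python try other.__rmul__(self)
        return ("Expr.__mul__", self.name, getattr(other, "name", other))

    def __rmul__(self, other):
        if not isinstance(other, (Expr, int)):
            return NotImplemented
        return ("Expr.__rmul__", self.name, getattr(other, "name", other))


class Matrix(Expr):
    """Hypothetical subclass that overrides only the reflected method."""

    def __rmul__(self, other):
        return ("Matrix.__rmul__", self.name, getattr(other, "name", other))


x = Expr("x")
m = Matrix("m")

plain = x * x       # handled by Expr.__mul__
reflected = 2 * x   # int.__mul__ gives NotImplemented, so Expr.__rmul__ runs
priority = x * m    # Matrix overrides __rmul__, so it is tried before Expr.__mul__
```

The last case is the easy one to get wrong in an alternative interpreter: when the right operand is an instance of a subclass of the left operand's type and overrides the reflected method, the reflected method gets the first try.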

There was also a git vs hg session with both hg and git developers. Since I recently switched from hg to git for all stuff where I can choose the vcs, it was very interesting for me. And I must say I am glad I switched. Mercurial works fine enough and with the recent rebase feature I think it is usable, but git just has a bigger momentum at the moment and it just doesn't get in my way.

During the day Steve sent an email to the GSoC list that Dirk forgot his jacket in the hotel room, so I said to myself, haha, that's a pity. And then I realized, oh shit, I left my jacket there as well. They didn't have it in the evening when I stopped there, but fortunately they sent me an email yesterday that they found it, so I'll stop there on my way back.

We also took a couple of group pictures, here is just my lame attempt (I am sure people will soon post better ones):

On Monday morning I met with Fernando Perez, the author of ipython, he now works with Jarrod. Then I rented a car and came to visit Brian Granger and his family in San Luis Obispo.

We went to the beach in the evening:

And had a dinner together. Brian showed me his options pricing example he did with SymPy, so I made him create an issue 1175 with the code.
Then we played with the parallel stuff that Brian contributed to ipython and tried sympy with it. To our disappointment, we found pickling bugs in sympy: it works with protocol 0, but not with protocol 2, see issues 1176 and 1177. This almost smells like a bug in Python itself, but I need to investigate that more deeply later.
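One known way for an object to pickle under protocol 0 but not under protocol 2 is sketched below. This is only an assumption about the general mechanism, not a diagnosis of issues 1176 and 1177. Protocol 2 rebuilds objects by calling cls.__new__(cls, *args), so a class whose __new__ requires arguments needs __getnewargs__, while protocol 0 bypasses the class's own __new__. The classes Sym and SymWithNewArgs and the helper roundtrips are hypothetical.

```python
import pickle


class Sym:
    """Hypothetical object whose __new__ requires an argument."""

    def __new__(cls, name):
        obj = object.__new__(cls)
        obj.name = name
        return obj

    def __eq__(self, other):
        return type(other) is type(self) and other.name == self.name

    def __hash__(self):
        return hash(self.name)


class SymWithNewArgs(Sym):
    """Same object, but it tells pickle which arguments __new__ needs."""

    def __getnewargs__(self):
        return (self.name,)


def roundtrips(obj, protocol):
    """Return True if obj survives pickling and unpickling unchanged."""
    try:
        return pickle.loads(pickle.dumps(obj, protocol)) == obj
    except Exception:
        return False


# Map class name -> {protocol: did the round trip work?}
results = {
    cls.__name__: {p: roundtrips(cls("x"), p) for p in (0, 2)}
    for cls in (Sym, SymWithNewArgs)
}
```

On CPython 3, Sym is expected to round-trip under protocol 0 and fail under protocol 2, while SymWithNewArgs should work under both. A small table like results makes it quick to see which protocol a given class breaks on.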

Today is Tuesday and we'll work on doing atomic physics calculations with sympy. See also Brian's email to the sympy list about it.

Sunday, October 26, 2008

Google Mentor Summit II

On Saturday morning I met Steve McIntyre and Dirk Eddelbuettel, Steve took us by car to Google. Well, it was overwhelming.

For example, we met two other Debian developers there, Jeff Bailey and Daniel Burrows. There are developers from all the famous projects, like git, Wine, Turbogears, Jython, ... There are still people I wanted to meet but didn't manage to yesterday, I need to fix that today. My plan is to get SymPy running on top of Jython, or at least make some more progress, to take advantage of the fact that Philip Jenvey from Jython is here. I also wanted to fix some RC bugs in Debian while Steve is here, to finally make some progress on my NM. We'll see how it goes.

There was also an excellent presentation from the people behind Android. One thing I was curious about is whether it's going to run native C applications, like Python, and it seems it will happen eventually: all the sources are out there, so someone just needs to push it forward. Another very cool thing they did, which I was always looking for, is Gerrit for reviewing patches (see review.source.android.com for an example of how it looks) -- basically a fork of Guido's codereview, but it has many nice features, like automatically applying the patch to the git repository as a new topic branch, so that one can then easily pull it as well as review it over the web interface. When I get back to Prague I am going to install this for SymPy.

Saturday, October 25, 2008

Google Mentor Summit I

I flew via Atlanta and had only about an hour until my next flight to San Francisco. After my last experience, when I went to immigration, got stuck in the line for more than an hour and then had to run to catch my flight, I decided to try a different strategy this time: first run to immigration and then walk to my gate. Unfortunately I was sitting near the back of the plane and I got out among the last ones. Fortunately, it was several hundred meters to immigration, so I ran as fast as I could and I managed to get there first and everything took about 5 minutes. That was just awesome, I finally figured this out.

Jarrod was waiting for me at the airport and we went to his place. Here's his cat:

On Friday we did some work and then went to the Golden Gate, the traffic was quite dense:

Alcatraz:

San Francisco:

Then we had a coffee in San Francisco and went to Silicon Valley, Jarrod drove me around a little bit to see SLAC, Stanford campus and other things. In the evening we went to the common pub with other mentors and Google guys, where we for example met Robert Bradshaw.

Today I am looking forward to meeting all the people I know from mailing lists, Debian and other places.

Wednesday, October 1, 2008

Gael and Emmanuelle in Prague

Last weekend Gaël Varoquaux together with Emmanuelle came to Prague, so it was really awesome to meet again (we first met with Gaël at the SciPy 2007 conference at Caltech).

Because I recently finished my master's, I was basically visiting pubs each evening, so we first met on Friday at 10pm and just had a couple of Pilsens (I already had some beers that evening with other friends), then I had to go, as I was doing TOEFL on Saturday. At the exam I met a very beautiful girl and other interesting people, so we went to lunch together and as a result I arrived half an hour later than I had agreed with Gaël and Emmanuelle, but I think they understood my situation. :) Then we went around Vyšehrad, came to I.P.Pavlova, had some Czech food and beers, then went on foot past the Church of Saints Cyril and Methodius, where the Heydrich attackers were cornered (the bullet-scarred window is still visible from the street), then continued over the bridge to the hotel. Emmanuelle went in, and I then went with Gaël to another pub for a couple more beers.

The Python scientific community is very cool and I always enjoy meeting people from it and discussing things like cython, scipy, ipython, mayavi, sympy, matplotlib, sage, what license is best for each project, etc. in Prague pubs. Python has a lot of high quality libraries and tools for scientific computing, so things look very promising.

It was fun, I really enjoyed that.

Monday, September 22, 2008

master studies

I did it! :) I defended my thesis and passed my master's finals in theoretical physics at Charles University in Prague a couple of hours ago.

After almost dropping out of school exactly a year ago for not having enough credits to go to the next year, I gave myself an obligation to finish school on time. I worked very hard the last year: I had to do 8 exams, some of them very hard, requiring more than a week of thorough studying, and 9 seminars, requiring a lot of work too, and also a master's thesis, for which I had to have working finite element solvers, and together with SymPy it took all my time and energy. I even had to cancel my trip to Austin and Caltech for the SciPy conference. But I finished school after all, and I am very happy about it.

Last year, two of my friends bet $100 between themselves that I will not finish on time. Jarda, who believed in me is now at Princeton doing his Ph.D. Matouš, who didn't believe in me, will now pay $100 to Jarda. I think that life is fair.:)

Now I'll be visiting pubs quite often and then I'll fix some long standing issues in SymPy, hopefully finish my Debian Tasks & Skills (to finally become a Developer a couple of months after that) and finally do useful stuff for my research with a fresh head.

Saturday, August 16, 2008

I am switching from Mercurial to git

After a long night of debugging and while preparing, reviewing and pushing final patches before a SymPy release, I got this:


$ hg qpush 
applying 1645.diff
Unable to read 1645.diff
** unknown exception encountered, details follow
** report bug details to http://www.selenic.com/mercurial/bts
** or mercurial@selenic.com
** Mercurial Distributed SCM (version 1.0.1)
Traceback (most recent call last):
  File "/usr/bin/hg", line 20, in <module>
    mercurial.dispatch.run()
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 20, in run
    sys.exit(dispatch(sys.argv[1:]))
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 29, in dispatch
    return _runcatch(u, args)
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 45, in _runcatch
    return _dispatch(ui, args)
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 364, in _dispatch
    ret = _runcommand(ui, options, cmd, d)
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 417, in _runcommand
    return checkargs()
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 373, in checkargs
    return cmdfunc()
  File "/var/lib/python-support/python2.5/mercurial/dispatch.py", line 356, in <lambda>
    d = lambda: func(ui, repo, *args, **cmdoptions)
  File "/var/lib/python-support/python2.5/hgext/mq.py", line 1942, in push
    mergeq=mergeq)
  File "/var/lib/python-support/python2.5/hgext/mq.py", line 833, in push
    top = self.applied[-1].name
IndexError: list index out of range

I am using:


$ hg --version 
Mercurial Distributed SCM (version 1.0.1)

Copyright (C) 2005-2008 Matt Mackall  and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

And I just got pissed off and I am switching to git for good. I've been using mercurial every day since about November 2007, and I think I am quite proficient in it. But we were missing some features, like recording a patch by hunks (git add -p), so Kirill Smelkov (another SymPy developer) just implemented this for mercurial queues, and it is now included in hg 1.0. But I am constantly bitten by stupid bugs; mercurial has broken into pieces several times already (last time with hg bisect, before that with mercurial queues and file renaming), and that just should not happen in a production version. Another problem is that hgweb is constantly using 100% of CPU on my server whenever someone clicks to see the contents of the README file. Again, I could spend time debugging it, but I will just use git and if it works, I'll stay with it.

I started learning git recently and it is just better in every aspect. It has a bigger community and all the features that I was always missing in mercurial (like rebase, hunks recording, git diff --color-words, better gitk). And it's superfast.

Unless I find some showstopper bugs in git too, I don't think I am coming back.

We'll have to create a live mercurial mirror of SymPy, so that people can continue using mercurial, but I'll myself just be using git. We wrote a simple translation table for anyone using mercurial to get started:

http://wiki.sympy.org/wiki/Git_hg_rosetta_stone

I love Python and I like that Mercurial is just a small C core and the rest is in Python, but for some reason, it still has the baby bugs, that should have been fixed years ago and why should we spend our time fixing and improving Mercurial, if other people have already done the job for us (and usually a better job than I could do) in git?

Another reason for switching is that in Debian, mercurial is not used for packages, while git is used a lot.

Thursday, August 14, 2008

SymPy core in Cython, general CAS design thoughts

We managed to write a new core of SymPy that is an order of magnitude faster than the current sympy, but contrary to sympycore (which is Pearu's attempt to write a fast core), it uses the same architecture as sympy, so we can now merge it back and speed up everything.

It started like this: I was unsatisfied with the current sympy speed, and also with the fact that it was only Kirill and sometimes Mateusz who were submitting speedup patches to sympy, while unfortunately no one was making any effort to get sympycore merged. I (and I think many other people too) hoped that when sympycore got started, the aim was to get it merged back. That was not happening. So I said to myself, ok, no one will do the job for you, you need to get your hands dirty and get things moving yourself.

My thoughts go 3 years back: it was a rainy weekend in Portland, Oregon, I was sitting in my office at the university, had just finished playing with swiginac that we wrote with Ola Skavhaug, and said to myself -- no, there must be a better way. First of all there was this bloated swig thing, but this could be fixed by hand (today we would just use Cython). But the main problem was how things were done in GiNaC internally. Too many classes, too difficult to write new functions. So I wrote my own CAS in pure Python just to see if things can be done better. Then a year later at the Space Telescope Science Institute, I was again thinking, ok, a good CAS in Python is really needed.

So I wrote to the ginac developers, stating my concerns and exchanging ideas how to do it better. I read this email from July 2006 again now and it seems I was right. Citing:

"I think there are people, who would implement for example the factorization (you can copy it from eigenmath for example) and other stuff like limits and integration, if the code weren't so complex (or am I wrong?)."

I was right: except for limits (which I wrote myself), all of the above was implemented by someone else in SymPy (Mateusz, but others too, did a stellar job here). It's actually pretty amusing for me to read my other emails from the ginac list, for example this. I basically described the design of SymPy in that email. In one of the replies, I was told to consider education in C++. As I also found out later, the ginac developers very easily started attacking me over nothing. But if one ignores that, I had several interesting discussions with them about CAS design (we also discussed something later, but I cannot find it now). But anyway, flaming people is something I would never tolerate on the sympy list, because this is exactly what drives people away. And I am not interested in any flames. I just want to get things done. Anyway, with sympy I did exactly what I said in those emails on the ginac list and it seems it worked really well. One remaining piece of the puzzle is speed though. As I watched one Linus talk today, he said "well, maybe 30s is not much for you, but if people are used to do the same thing in 0.1s, trust me, it is way too slow." He's of course 100% right. So we need to speed sympy up drastically if we want to be competitive.

So those were my thoughts recently. So I wrote a SymPy core in pure C over the last weekend, see my emails to the sympy list. I achieved some pretty competitive benchmarks, but there was one "minor" issue -- memory management. I wasn't freeing any memory, as I thought I could fix that later easily. And it turned out to be a nightmare. I hoped I could use boehm-gc, but it doesn't work with glib (it gives nasty segfaults), which I was using for dictionaries and string management. So I hacked up reference counting just like in Python, but I didn't manage to make it work without segfaults. Segfaults can be debugged, but it's a lot of effort, especially if it fails deep in some libc function, far away from the code that actually breaks it. Valgrind helps here too, but I want to spend my time on other things than debugging segfaults. Then Robert Bradshaw asked me: why not use Cython? BTW, Kirill was telling me this from the beginning. But I am stubborn, if I believe in something, I want to try it. So I tried and failed. Anyway, I must say I really enjoyed coding in C, it's a very good language.

So then I threw the code away and started from scratch. I think this was the 4th time I wrote the "x+y+x -> 2*x + y" simplification algorithm. The first time, 2 years ago, it took me maybe a week to get it right. In C (the 3rd time) it took me just maybe an hour. And now (the 4th time) in Python it was just a couple of minutes. So it makes me feel good to see that I am improving after all. Anyway, so I wrote a core in pure Python, but using all my 2 years of experience and the know-how gained from discussing with Kirill, Mateusz, Fredrik, Pearu and many others how to do things. So our new core, we call it sympyx, is less than 600 lines long, it uses the same architecture as sympy and it is amazingly fast.
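To give an idea of what the "x+y+x -> 2*x + y" simplification involves, here is a minimal sketch. It is not the sympyx code. It assumes a sum is given as a list of (coefficient, symbol name) pairs, and the function names add_terms and to_string are made up for this illustration.

```python
def add_terms(terms):
    """Combine like terms of a sum given as (coefficient, symbol) pairs."""
    coeffs = {}
    for coeff, sym in terms:
        # Accumulate the total coefficient for each symbol.
        coeffs[sym] = coeffs.get(sym, 0) + coeff
    # Drop symbols whose coefficients cancelled out.
    return {sym: c for sym, c in coeffs.items() if c != 0}


def to_string(coeffs):
    """Print the collected sum in a canonical (sorted) order."""
    parts = []
    for sym in sorted(coeffs):
        c = coeffs[sym]
        parts.append(sym if c == 1 else f"{c}*{sym}")
    return " + ".join(parts) if parts else "0"


simplified = to_string(add_terms([(1, "x"), (1, "y"), (1, "x")]))
cancelled = to_string(add_terms([(1, "x"), (-1, "x")]))
```

Keeping the terms in a dictionary keyed by symbol means each term is merged in one lookup, and sorting only at printing time gives a stable output order. A real core also has to handle nested expressions, powers and products, which this sketch ignores.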

Then we reserved a couple of evenings with Kirill and cythonized it (neither of us has time during the day, I need to work on my thesis and Kirr works). We logged into one machine, set up screen so that we could see each other typing and also type into each other's terminal, and started cythonizing. Kirr is very good at it, he has submitted a couple of patches to Cython and he's just a much better developer than I am. So he used his approach of rewriting everything directly to Cython, while I used my iterative approach of always trying to satisfy the tests. So when he completely broke the core in the middle and I saw on his terminal something like:


In [1]: x+x
[huge stacktrace with Cython errors]

and then I saw the cursor stop moving, I wrote "haha" to his terminal and was laughing. Probably a minute later I screwed something up in my terminal and I saw that my cursor wrote haha.
A very good moment was actually that I was the first to get all the tests to run. So I got hold of his terminal and wrote in his vim session "look at my terminal". And there he saw:


$ py.test
============================= test process starts ==============================
executable:   /usr/bin/python  (2.5.2-final-0)
using py lib: /usr/lib/python2.5/site-packages/py 

test_basic.py[13] .............
test_basic2.py[4] ....

================== tests finished: 17 passed in 0.04 seconds ===================

You know, all tests run, with the core in Cython! And then I saw how my cursor started to move and it was typing "git diff" and "git cherry -v origin" and "git log" and other things, as Kirr was checking that what he sees is correct. But I hadn't cdefed the classes, I just made it work (there were problems with the __new__ method not yet supported in Cython and other minor things). We both came home from our work and continued, so I kept bringing more and more classes and methods to Cython, but then I got this message from Kirr on my jabber: "all tests pass. Speedup is 3x". So I said to myself -- yes! So he beat me and he did it. I pulled his changes and indeed, everything was cythonized and it was superfast. So his approach provided faster results after all.

We polished things, did a couple of benchmarks and announced it on the sympy and sage-devel lists.

GiNaC is still faster, but not 100x faster, depending on the benchmark usually 2x to 10x faster (but sometimes also slower). There is still a lot of room for both design and technical optimizations, so I think we'll improve for sure. I very strongly believe it is possible to be simple in design, maintainable and fast.

Saturday, May 24, 2008

Ubuntu Developer Summit in Prague

Last weekend I was at FOSSCamp. Since I live in Prague I wanted to go to Ubuntu Developer Summit (UDS) each day, but unfortunately I had some exams, so I only went on Wednesday and Friday.

On Wednesday I first met Lars Wirzenius:

we agreed to go to pub in the evening. Then I did a little work, there was quite a nice view from the window (Prague castle on the horizon):

and I went to the #ubuntu-devel-summit IRC channel and pinged Scott Kitterman, whom I knew from the Debian Python Modules Team (DPMT), but didn't know what he looks like. We met and once I knew Scott, it was easy to get around, so he introduced me to Steve Langasek (pronounced Langášek). We agreed to go to the pub as well. Steve lives in Portland, OR, where I spent the summer of 2005 and Scott is from Baltimore, where I spent the summer of 2006.
Then I also met Riku Voipio, Martin Böhm, Christian Reis (whom I asked if it's possible to support Debian unstable on Ubuntu Personal Package Archives and he said that it will probably happen, so that's really cool -- I also offered my help with this) and others, so in the end, there were 14 of us going to the pub, so I chose again the same pub as with the FOSSCamp people and it seems it tasted good again:

Notice the svíčková na smetaně above, my favourite meal. Good choice Scott. :)

Ok, that was on Wednesday. On Friday I arrived at around 3pm, looked at the schedule table and noticed that Matthias Klose should be at UDS too, so I started IRC and pinged him. Fortunately, Emilio Pozuelo Monfort, whom I know from DPMT as well, replied first so we met, it was cool and he showed where Matthias is. I am very glad I met him, so we discussed python-central and python-support packages and why we have them both, also with Scott later on.

While I was waiting for Matthias, I sat next to Nicolas Valcárcel, started my laptop and began looking at some SymPy bugs. Nicolas noticed that and said -- "You are developing SymPy?", I said "Yes.", flattered. And he showed me a bug with plotting and Compiz, so we immediately reported that to pyglet.

In the evening people continued to some kind of a party, but unfortunately, I was already going to some other pub.

Overall, even though I was there for only two afternoons, it was just awesome and I utterly enjoyed meeting all the people I knew from mailinglists and IRC.

Tuesday, May 20, 2008

Installing a printer in Debian

I finally bought a new toner for my old Minolta 1250e that I hadn't used for a few years (because I was too lazy to fix it) and I was surprised how easy it is to install it these days:

1. plugged the USB cable from the printer to my laptop
2. wajig install cupsys foomatic-db-engine
3. sudo ln -s /usr/bin/foomatic-ppdfile /usr/lib/cups/driver
4. On http://localhost:631/ I added the new printer in cupsys, it showed my printer over USB, I selected it and then I chose the correct PPD.

And everything just works. The only nontrivial part is the step 3, that was suggested in /usr/share/doc/foomatic-db-engine/USAGE.gz:


If the printers of the Foomatic database do not appear, check whether the
link to foomatic-ppdfile is in /usr/lib/cups/driver:

lrwxrwxrwx  1 root root 25 Apr 19 18:13 foomatic -> /usr/bin/foomatic-ppdfile

If not, create it manually.

Without it, I didn't see the right PPD in the step 4. Maybe it'd be a nice idea to do this automatically (either when installing foomatic-db-engine or cupsys), or am I missing something?

As to the PPDs, one can get all printer IDs by:


$ foomatic-ppdfile -A

and the PPD for my printer by


$ foomatic-ppdfile -p Minolta-PagePro_1250E

However, it's not necessary in the above howto.

Friday, May 16, 2008

FOSSCamp, Friday

Since I live in Prague, it's basically compulsory to go to FOSSCamp. Yesterday I went with Lucas to some pubs + sightseeing, today we went in a larger group to this pub:

and we had a couple of good Czech meals with Plzeň beer:

Seems it tasted good:

Wednesday, May 7, 2008

snapshot.debian.net saved me again

On one computer I am taking care of, I suddenly started getting:


$ ps2pdf fa_808.ps fa_808.pdf
/usr/bin/ps2pdfwr: line 45: exec: gs: not found

What's wrong?


$ ls /usr/bin/gs
ls: cannot access /usr/bin/gs: No such file or directory
$ wajig find-file /usr/bin/gs
ghostscript: /usr/bin/gs
$ wajig list ghostscript
ii  ghostscript                                             8.61.dfsg.1-1               The GPL Ghostscript PostScript/PDF interpret
ii  ghostscript-x                                           8.61.dfsg.1-1               The GPL Ghostscript PostScript/PDF interpret

That is really weird, the file /usr/bin/gs is simply missing, even though I have the ghostscript package installed. Ok, let's reinstall it:


$ wajig reinstall ghostscript
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libwine-capi libstdc++5 libopenal0a xgnokii libglut3 libcapi20-3
  libggi-target-x lib64gfortran2 libsvg1 lib64gomp1 libggi2 libgii1
  libgii1-target-x libxine1-gnome cups-pdf lib64objc2
Use 'apt-get autoremove' to remove them.
The following extra packages will be installed:
  akregator gs kaddressbook kaddressbook-plugins kalarm kandy kappfinder karm
  kate kcontrol kdebase-bin kdebase-data kdelibs-data kdelibs4c2a
  kdepim-kfile-plugins kdepim-kio-plugins kdepim-kresources kdepim-wizards
  kdesktop kghostview kicker kitchensync kleopatra kmplayer kmplayer-common
  kpersonalizer ksplash libarts1c2a libgnutls26 libgs8 libilmbase6 libkcal2b
  libkdepim1a libkleopatra1 libkmime2 libkonq4 libkpimidentities1 libktnef1
  libopenexr6
Suggested packages:
  kdeaddons-doc-html ntpdate ntp-simple perl-suid egroupware ffmpeg xawtv
  gnutls-bin
The following packages will be REMOVED:
  digikam kde-amusements kde-core kdeaddons kdebase kdebase-kio-plugins kdepim
  kmail kmailcvt kmplayer-plugin knights konq-plugins konqueror
  konqueror-nsplugins korn smb4k
The following NEW packages will be installed:
  gs libgnutls26 libilmbase6 libopenexr6
The following packages will be upgraded:
  akregator ghostscript kaddressbook kaddressbook-plugins kalarm kandy
  kappfinder karm kate kcontrol kdebase-bin kdebase-data kdelibs-data
  kdelibs4c2a kdepim-kfile-plugins kdepim-kio-plugins kdepim-kresources
  kdepim-wizards kdesktop kghostview kicker kitchensync kleopatra kmplayer
  kmplayer-common kpersonalizer ksplash libarts1c2a libgs8 libkcal2b
  libkdepim1a libkleopatra1 libkmime2 libkonq4 libkpimidentities1 libktnef1
36 upgraded, 4 newly installed, 16 to remove and 683 not upgraded.
Need to get 53.1MB of archives.
After this operation, 42.9MB disk space will be freed.
Do you want to continue [Y/n]? n
Abort.

Oops, unstable is broken at the moment. Ok, what now? Well, snapshot.debian.net comes to the rescue again. Find "ghostscript", version "8.61.dfsg.1-1" and here we are:


$ wget http://snapshot.debian.net/archive/2008/03/02/debian/pool/main/g/ghostscript/ghostscript_8.61.dfsg.1-1_i386.deb
$ wajig install ./ghostscript_8.61.dfsg.1-1_i386.deb

And all is fine now:


$ ls /usr/bin/gs
/usr/bin/gs

Tuesday, March 25, 2008

SymPy accepts Google Summer of Code applications

SymPy is a pure Python library for symbolic mathematics. Last year SymPy had 5 excellent students and this year we are accepting students again.
Why should you apply? And why to SymPy?

Well, let me give you some reasons:

First of all, it's fun. To get some idea, read the GSoC2007 SymPy page, where you can find out what the last year students did and especially read their reports, where they describe their impressions from the summer, how they tackled problems and their overall conclusions.
It's not just about coding, we enjoy the social part too. There is a great community around numpy, scipy, ipython, matplotlib, Sage and similar tools and if you do scientific computing with Python, you gain a lot just being part of it, because you learn new things from the others.
I currently live in Prague (most people say it's a beautiful city, but I actually like Los Angeles, or the Bay Area :). If there are enough interested people, we can hold a coding sprint here (plus of course some sightseeing and pubs). Anyone with a good commit history is welcome to stay at my apartment. :)
You earn $4500, some of which I suggest spending on travelling to conferences and workshops. Here are some tips: SciPy2008 (see also SciPy2007), EuroSciPy2008, Sage Days (you can read my impressions from SD6 and SD8). Also watch the numpy/scipy mailing lists for announcements of other meetings.

Read also the current status and motivation of SymPy and its relation to Sage. If you want to apply, all the necessary information is on our wiki page.

Nevertheless, if you decide SymPy is not for you, but you'd still like to do a GSoC project in a similar area, there are other good options too: one is SciPy/NumPy, the other is Sage. Unfortunately Sage was not accepted as a mentoring organization, but it has several good projects too, some of which you can do for example under the umbrella of the Python Software Foundation.

One of them is improving the Sage notebook. If you've never seen it, download Sage, start it (./sage), type "notebook()" and a nice Mathematica-like notebook will pop up in the browser. It allows collaborative editing à la Google Docs and many other things. If you'd like to work on it, reply to the email on sage-devel.

Thursday, March 6, 2008

Sage Days 8

Between February 29 and March 4, 2008 I attended Sage Days 8, hosted at the Enthought headquarters in Austin, Texas. This was my 5th time in the USA and it was a marvelous experience, as with all my visits to the States.

As usual, I had some adventures in Atlanta, which interested readers can find at the end of this post. Anyway, at the Austin airport I met Peter and his wife Crystal, Fernando, Benjamin, Jarrod, Eric and Clement. We went to have dinner and then Clement and I stayed at Peter's house:

You can see the neighbor's cat and Peter's dog Trinity behind the window. The next day we went to Enthought, which provided us with breakfast and lunch each day, and it was delicious. After breakfast, we gathered in the room and introduced ourselves. Enthought rents 3/4 of the 21st floor in the Bank of America building, so when I looked left I saw:

When I looked behind I saw:

and in front of me, I saw all the participants (I took photos of all participants together with names). As you can see, there were really good people in there, like Travis (creator of NumPy), William (main author of Sage), Eric (CEO of Enthought), Fernando (author of IPython), Jarrod (the release manager of SciPy), Michael (the release manager of Sage) etc. See also Fernando's welcome speech and the video of each of us introducing himself.

The views from the windows are terrific. I enjoyed working on each of the 4 sides of the skyscraper with completely different scenery, or when the sun is going down, that's also very cool.

We spent the whole Friday doing presentations, some of which you can find here. Then we went to Eric's house to have a big dinner together.

On Saturday, Sunday and Monday we were all hacking on many different things. I joined Fernando, Benjamin, Brian and Stefan on ipython1, Travis was implementing a new type (gmp integer) in NumPy, William wrote a manipulate command in Sage, Eric did the same in Traits, Gary and Michael implemented parallel testing of Sage, ...

On Tuesday we had final status reports and people left in the afternoon. In the evening Clement and I went to have dinner and then we visited some bars on 6th Street, having a beer in each.

On Wednesday I visited John and Roy from the Computational Fluid Dynamics Lab at the University of Texas, Austin, who wrote the libMesh library, which I have used extensively and also created a Debian package of. It was very instructive to see libMesh "from behind", and John and Roy are cool people (not to mention the Debian tradition of having good relations with upstream :). Then I visited some professors at the same campus, after which I went to the Capitol, and then I took the bus to the Barton Creek Square Mall to buy some iPods and jeans, so that I can say I have jeans from Texas. BTW, the iPod works great in Debian: I plugged it in and it just showed up on my Gnome desktop. It's true that naively dragging mp3 files onto it didn't make it play, but these instructions made it work.

On Thursday I fixed the remaining release blockers in SymPy and made a new release. In the evening, I am going to meet Aswin, who also uses SciPy and is a friend of Kumar, who is now maintaining the python-numpy and python-scipy Debian packages with me (Kumar also knows Prabhu, the author of Mayavi2 hosted at Enthought, so it's all connected).

Anyway, the whole workshop was an excellent experience for me. I learned a lot of new things, and being able to speak with people who wrote tools that I use almost every day is important. We also extensively discussed the future of all the projects (Sage, SciPy, NumPy, IPython, Cython, SymPy). See my summarizing email to the SymPy mailing list.

Another thing that I find very interesting is that Microsoft is financing the Windows port of Sage, which will make basically anything that uses Python/Cython/C/Fortran very easy to install on Windows (just an spkg package in Sage). I find it really cool that MS is not only supporting but even financing a truly opensource project.

Finally the promised adventure in Atlanta: we took off from the Prague airport on February 28th with a 2-hour delay (due to some paperwork, as we were told by the captain). As I had 3 hours in Atlanta for the connection to Austin and I had to go through immigration, it was clear that I'd miss it. But I was not surprised: the last time I was flying through Atlanta, they canceled my flight to LA completely. We arrived in Atlanta an hour and a half before my departure, then I waited for about an hour at immigration; it was incredibly slow. When I had around 20 min to departure, I had to ask the people standing in front of me if they would let me through; they were very nice and did. I left immigration 10 min before my departure, then I ran to get my luggage and myself through customs and screening, and it was 5 min to my departure when I ran down to the display with departure times. Then I sprinted like hell to terminal D, only to see the clerk doing some final paperwork with all the people already boarded and the jetway door shut. After a little persuading he let me in too; fortunately there was still one seat left, so I made it. You can imagine my pleasant surprise in Austin when I discovered that my luggage made it too, considering that I handed it to the Atlanta airport personnel exactly 10 min prior to departure.

Tuesday, February 26, 2008

XFS is 20x slower than ext3 (with default settings)

Is XFS that bad? Well, at least with default settings, XFS on Debian seems to be blown away by ext3 completely in terms of speed. I don't mind a 1.5x slowdown, maybe even 2x, but 20x is a show stopper. I am already using ext3 for any pbuilder builds, because extracting the base image takes 30s with XFS, compared to 3s with ext3. And I'll probably switch to ext3 completely, unless someone finds a way to fix this.

I recently got burned by this when running Sage on my computer, because it compiles a lot of Python files when started for the first time. Normally it should take roughly 15s, but instead it took 6 minutes on my comp and then it triggered a so far undiscovered bug in Sage, that I reported.

Michael Abshoff, the release manager of Sage, suggested that something is FUBAR (Fucked Up Beyond Any Recognition) on my shiny Debian amd64 sid system running on Intel Core Quad, so I said no way, because I really care about this machine, as I use it for larger finite elements calculations and other stuff (like compiling huge deb packages in parallel, like paraview).

So I offered a bet: I give him access to this computer, he finds the problem, and if it's a problem in my Debian configuration, I'll write on this blog that I am lame, while if it's a problem in Sage, he will write on his blog that he is lame. And I was smiling to myself, thinking how good I am and that I would have some fun too reading planet.sagemath.org with the top post from Michael saying that he is lame.

But then I remembered my old struggle with cowbuilder and XFS and I stopped smiling. See e.g. this wiki I created half a year ago. Something is FUBAR with XFS and Debian. I also asked on the Czech server Root, which is famous for having a lot of experts willing to share their knowledge, and it was quickly revealed that the fix is the "nobarrier" mount option of XFS (my post is here, but it's in Czech).

First, on that amd64 machine, the above problem was fixed after issuing this command:


mount -o remount,rw,nobarrier /dev/sda3 /home/

(notice the "nobarrier" option). You can read some background behind this on the lkml list. Unfortunately, I also have my laptop, and there I already use this "nobarrier" option, and it doesn't help at all. I just created a new ext3 partition and verified that on my laptop, ext3 is around 10x faster than XFS with nobarrier (that was supposed to fix this). I use the latest 2.6.24 kernel from unstable on both.
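A comparison like this can be reproduced with a small script. The following is only a minimal sketch, not the measurement used above (which came from extracting a pbuilder base image and from starting Sage): the function name and the default file count and size are made up for illustration. It creates many small files in a directory and calls fsync on each one, which is the kind of workload where write barriers are expected to hurt.

```python
import os
import tempfile
import time


def time_small_file_writes(directory, count=200, size=1024, sync=True):
    """Return the seconds needed to create `count` small files in `directory`.

    With sync=True every file is fsync'ed, forcing the data to disk.
    The scratch directory is removed again when the function returns.
    """
    payload = b"x" * size
    with tempfile.TemporaryDirectory(dir=directory) as workdir:
        start = time.perf_counter()
        for i in range(count):
            path = os.path.join(workdir, "f%05d" % i)
            with open(path, "wb") as handle:
                handle.write(payload)
                if sync:
                    handle.flush()
                    os.fsync(handle.fileno())
        return time.perf_counter() - start
```

Calling it once with a directory on the XFS partition and once with a directory on the ext3 partition (for example time_small_file_writes("/home/ondrej"), where the path is just a placeholder) gives two numbers that can be compared directly. The absolute values depend on the disk and kernel, so only the ratio is interesting.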

Time to move from XFS to ext3 on my laptop? Seems like it. I'll leave XFS on the other machine, because I know some other people have good experience with XFS and the "nobarrier" option seems to fix the problem there.

But as to the bet, yeah, I am lame and I should still learn a lot from Michael. :)

Thursday, January 3, 2008

SymPy/sympycore (pure Python) up to 5x faster than Maxima (future of Sage.calculus?)

According to this test, sympycore is from 2.5x to 5x faster than Maxima. This is an absolutely fantastic result and also a perfect certificate for Python in scientific computing, considering that we compare pure Python to LISP.

OK, this made us excited, so we dug deeper and ran more benchmarks. But first, let me make a few general remarks. I want a fast CAS (Computer Algebra System) in Python. A general CAS, that people use, that is useful, that is easily extensible (!), that is not missing anything, that is comparable to Mathematica and Maple -- and most importantly -- I want it now and I don't care about 30-year horizons (I won't be able to do any serious programming in 30 years anyway). All right. How to do that? Well, many people tried... And failed. The only opensource CAS that has any chance of becoming the opensource CAS, in my own opinion, is Sage. You can read more about my impressions from Sage here. I am actually only interested in mathematical physics, so basically Sage.calculus. Currently Sage uses Maxima, because Maxima is an old, proven, working system and it's reasonably fast and quite reliable, but written in LISP. Some people like LISP. I don't, and I find it extremely difficult to extend Maxima. Also, even though Maxima is in LISP, it uses its own language for interacting with the user (well, that's not the way). I like Python, so I want to use Python. Sage has written Python wrappers to Maxima, so Sage can do almost everything that Maxima can, plus many other things. Now. But Sage.calculus has issues.

First, I don't know how to extend the wrappers with some new things, see my post in the sage-devel for details, it's almost 2 months old with no reaction, which shows that it's a difficult issue (or nonsense:)).

And second, it's slow. For some examples that Sage users have found out, even SymPy, as it is now, is 7x faster than Sage and sympycore 23x faster and with the recent speed improvements 40x faster than Sage.

So let's improve Sage.calculus. How? Well, no one knows for sure, but I believe in my original idea of pure Python CAS (SymPy), possibly with some parts rewritten in C. Fortunately, quite a lot of us believe that this is the way.

What is this sympycore thing? In sympy, we wanted to have something now, instead of tomorrow, so we were adding a lot of features, not looking too much at speed. But then Pearu Peterson came and said, guys, we need speed too. So he rewrote the core (resulting in 10x to 100x speedup) and we moved to the new core. But first, the speed still isn't sufficient, and second, it destabilized SymPy a lot (there are still some problems with caching and assumptions half a year later). So with the next package of speed improvements, we decided to either port them to the current sympy, or wait until the new core stabilizes enough. So the new new core is called sympycore now. Currently it only has very basic arithmetic (and derivatives and simple integrals), but it's very fast. It's mainly done by Pearu. But for example the latest speed improvement using s-expressions was invented by Fredrik Johansson, another SymPy developer and the author of mpmath.

OK, let's go back to the benchmarks. The first thing we realized is that Pearu was using CLISP 2.41 (2006-10-13) and a Maxima compiled by hand in the above timings, but when I tried Maxima in Debian (which is compiled with GNU Common Lisp (GCL) 2.6.8), I got different results: Maxima did beat sympycore.

SymPyCore:


In [5]: %time e=((x+y+z)**100).expand()
CPU times: user 0.57 s, sys: 0.00 s, total: 0.57 s
Wall time: 0.57

In [6]: %time e=((x+y+z)**20 * (y+x)**19).expand()
CPU times: user 0.25 s, sys: 0.00 s, total: 0.25 s
Wall time: 0.25

Maxima:


(%i7) t0:elapsed_real_time ()$ expand ((x+y+z)^100)$ elapsed_real_time ()-t0;
(%o9)                                0.41
(%i16) t0:elapsed_real_time ()$ expand ((x + y+z)^20*(x+z)^19)$ elapsed_real_time ()-t0;
(%o18)                         0.080000000000005

So when expanding, Maxima is comparable to sympycore (0.41 vs 0.57), but for general arithmetics, Maxima is 3.5x faster. We also compared GiNaC (resp. swiginac):


>>> %time e=((x+y+z)**20 * (y+x)**19).expand()
CPU times: user 0.03 s, sys: 0.00 s, total: 0.03 s
Wall time: 0.03

Then we compared just the (x+y+z)**200:


sympycore:
>>> %time e=((x+y+z)**200).expand()
CPU times: user 1.80 s, sys: 0.06 s, total: 1.86 s
Wall time: 1.92
swiginac:
>>> %time e=((x+y+z)**200).expand()
CPU times: user 0.52 s, sys: 0.02 s, total: 0.53 s
maxima:
(%i41) t0:elapsed_real_time ()$ expand ((x + y+z)^200)$ elapsed_real_time ()-t0;
(%o43)                         2.220000000000027

Here GiNaC still wins and sympycore beats Maxima, but the timings really depend on the algorithm used. sympycore uses Miller's algorithm, which is the most efficient.
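To give an idea of why the algorithm matters so much, here is a minimal sketch of the J.C.P. Miller recurrence for powers of a polynomial, assuming that is the algorithm meant above. It is only the textbook univariate form, written for illustration; it is not taken from sympycore, which expands multivariate sums such as (x+y+z)**200, and the function name is made up. The point is that every coefficient of the result is obtained from previously computed ones, without multiplying the polynomial by itself n times.

```python
from fractions import Fraction


def miller_power(coeffs, n):
    """Coefficients of (a0 + a1*x + ... + am*x**m)**n, lowest degree first.

    Uses the J.C.P. Miller recurrence
        c[0] = a0**n
        c[k] = 1/(k*a0) * sum_{i=1..min(k,m)} (i*(n+1) - k) * a[i] * c[k-i]
    which requires a0 != 0 and a non-negative integer n.
    """
    a = [Fraction(c) for c in coeffs]
    if a[0] == 0:
        raise ValueError("the constant term must be nonzero")
    m = len(a) - 1
    result = [a[0] ** n]
    for k in range(1, m * n + 1):
        total = sum((i * (n + 1) - k) * a[i] * result[k - i]
                    for i in range(1, min(k, m) + 1))
        # Exact rational arithmetic, so the division introduces no rounding.
        result.append(total / (k * a[0]))
    return result


# (1 + x)**3 = 1 + 3x + 3x^2 + x^3
print(miller_power([1, 1], 3))
```

For a polynomial with m+1 terms this needs on the order of m operations per output coefficient, which is why it compares well against repeated multiplication for large exponents.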

So then we tried a fair comparison: compare expanding x * y where x and y are expanded powers (to make more terms):


 sympycore:
 >>> from sympy import *
 >>> x,y,z=map(Symbol,'xyz')
 >>> xx=((x+y+z)**20).expand()
 >>> yy=((x+y+z)**21).expand()
 >>> %time e=(xx*yy).expand()
 CPU times: user 2.21 s, sys: 0.10 s, total: 2.32 s
 Wall time: 2.31
 swiginac:
 >>> xx=((x+y+z)**20).expand()
 >>> yy=((x+y+z)**21).expand()
 >>> %time e=(xx*yy).expand()
 CPU times: user 0.30 s, sys: 0.00 s, total: 0.30 s
 Wall time: 0.30
 maxima:
 (%i44) xx:expand((x+y+z)^20)$
 (%i45) yy:expand((x+y+z)^21)$
 (%i46) t0:elapsed_real_time ()$ expand (xx*yy)$ elapsed_real_time ()-t0;
 (%o48)                         0.57999999999993

So, sympycore is roughly 7x slower than swiginac and about 4x slower than Maxima (2.31 s vs 0.58 s). We are still using pure Python, so that's very promising.
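What this last benchmark exercises is, at its heart, the multiplication of two large sums of monomials. The sketch below shows that operation in plain Python with dictionaries mapping exponent tuples to coefficients. It is an illustration under my own assumptions (the representation and the names poly_mul and poly_pow are made up), not how sympycore, swiginac or Maxima store expressions, so its timing says nothing about those systems.

```python
from collections import defaultdict
import time


def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent_tuple: coefficient}."""
    result = defaultdict(int)
    for exps_p, coeff_p in p.items():
        for exps_q, coeff_q in q.items():
            # Multiplying monomials adds their exponents.
            key = tuple(a + b for a, b in zip(exps_p, exps_q))
            result[key] += coeff_p * coeff_q
    return {k: v for k, v in result.items() if v != 0}


def poly_pow(p, n):
    """Raise a non-empty polynomial to a non-negative integer power."""
    nvars = len(next(iter(p)))
    result = {(0,) * nvars: 1}
    for _ in range(n):
        result = poly_mul(result, p)
    return result


# x + y + z, with exponents ordered as (x, y, z)
xyz = {(1, 0, 0): 1, (0, 1, 0): 1, (0, 0, 1): 1}

xx = poly_pow(xyz, 20)
yy = poly_pow(xyz, 21)

start = time.perf_counter()
e = poly_mul(xx, yy)
elapsed = time.perf_counter() - start
print("terms: %d, seconds: %.2f" % (len(e), elapsed))
```

The double loop makes the cost proportional to the product of the term counts of the two factors, which is why multiplying two already expanded powers is a tougher test than expanding a single power.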

When using sexpr functions directly then 3*(a*x+..) is 4-5x faster than Maxima in Debian/Ubuntu. So, the headline of this post is justified. :)

Conclusion

Let's build the car. Sage has the most features and it is the most complete car. It has issues, some wheels need to be improved (Sage.calculus). Let's change them then. Maybe SymPy could be the new wheel, maybe not, we'll see. SymPy is quite a reasonable car for calculus (it has plotting, it has exports to latex, a nice, simple but powerful command line with ipython and all those bells and whistles, and it can also be used as a regular Python library). But it also has issues, one wheel should be improved. That's the sympycore project.

All those smaller and smaller wheels show that this is indeed the way to go, but the very important thing is to put them back in the car. I.e. sympycore back to sympy and sympy back to Sage, and integrate them well. While also leaving them as separate modules, so that users who only need one particular wheel can use it.

Tuesday, January 1, 2008

R.<a,b,c> = QQ[] what is that?

While playing with Jaap's wish, I finally got fed up and decided to study what the funny syntax "R.<a,b,c> = QQ[]" really means. So I did:


sage: R.<a,b,c> = QQ[]
sage: preparse("R.<a,b,c> = QQ[]")
"R = QQ['a, b, c']; (a, b, c,) = R._first_ngens(Integer(3))"

Now everything is crystal clear! Let's study what QQ is:


sage: QQ?   
Type:  RationalField
Base Class: 
String Form: Rational Field
Namespace: Interactive
Docstring:
    
        The class class{RationalField} represents the field Q of
        rational numbers.

Cool, why not. So what is the _first_ngens method doing? Let's find out:


sage: R._first_ngens?
Type:  builtin_function_or_method
Base Class: 
String Form: 
Namespace: Interactive

Hm, not really useful. Let's push harder:


sage: R._first_ngens??
Type:  builtin_function_or_method
Base Class: 
String Form: 
Namespace: Interactive
Source:
    def _first_ngens(self, n):
        v = self.gens()
        return v[:n]

So the equivalent code is this:


sage: R.gens()
(a, b, c)
sage: R.gens()[:3]
(a, b, c)
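The same mechanics can be tried outside of Sage. This is a minimal sketch with a made-up ToyRing class that only remembers generator names; the real ring returns polynomial ring elements, and only the body of _first_ngens is copied from the source shown above.

```python
class ToyRing:
    """Stand-in for a polynomial ring that only remembers generator names."""

    def __init__(self, names):
        # Accept the same "a, b, c" string that the preparsed code passes in.
        self._names = tuple(n.strip() for n in names.split(","))

    def gens(self):
        # The real method returns ring elements; here the names are enough.
        return self._names

    def _first_ngens(self, n):
        # Same body as the Sage source shown above.
        v = self.gens()
        return v[:n]


R = ToyRing("a, b, c")
(a, b, c,) = R._first_ngens(3)
print(a, b, c)
```

The tuple unpacking on the left-hand side is what binds the Python names a, b and c, which is exactly the part that plain Python syntax cannot express in a single statement together with the ring construction.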

Cool, why not. So what is the R.gens() doing?


sage: R.gens? 
Type:  builtin_function_or_method
Base Class: 
String Form: 
Namespace: Interactive
Docstring:
    
            Return the tuple of variables in self.
    
            EXAMPLES:
                sage: P.<x,y,z> = QQ[]
                sage: P.gens()
                (x, y, z)
    
                sage: P = MPolynomialRing(QQ,10,'x')
                sage: P.gens()
                (x0, x1, x2, x3, x4, x5, x6, x7, x8, x9)
    
                sage: P.<SAGE,SINGULAR> = MPolynomialRing(QQ,2) # weird names
                sage: P.gens()
                (SAGE, SINGULAR)

Ah, that's the answer we want. :)
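To close the loop, here is a rough sketch of the kind of text rewrite that preparse performs for this one syntax. It is a toy regular expression written for illustration, not Sage's preparser (which handles far more cases), and the name toy_preparse is made up. It only reproduces the output string shown at the top of this post.

```python
import re

# Matches only the special form  NAME.<g1,g2,...> = BASE[]
_GENS_PATTERN = re.compile(r"^\s*(\w+)\.<([^>]+)>\s*=\s*(\w+)\[\]\s*$")


def toy_preparse(line):
    """Rewrite 'R.<a,b,c> = QQ[]' into plain Python; leave other lines alone."""
    match = _GENS_PATTERN.match(line)
    if match is None:
        return line
    name, gens, base = match.groups()
    names = [g.strip() for g in gens.split(",")]
    joined = ", ".join(names)
    return "%s = %s['%s']; (%s,) = %s._first_ngens(Integer(%d))" % (
        name, base, joined, joined, name, len(names))


print(toy_preparse("R.<a,b,c> = QQ[]"))
```

So the angle brackets are pure syntactic sugar: the generator names are passed to the ring constructor as a string, and the same names are then bound as Python variables from the first n generators.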

Ondřej Čertík