Sunday, May 10, 2009

My experience with running an opensource project

Nir Aides, the author of the excellent Winpdb debugger, sent me the following email on September 21, 2008. I asked him if I could copy his email and reply in the form of a blog post (so that other people can comment and join the discussion), and he agreed. It took me almost a year to reply, but I made it. :)

Hi Ondrej,

How are you?

I am about to publish a new free software project, a new simple PHP framework, and I am interested in your advice.

You started SymPy and were able to make other people join you and develop it with you.
How did you do it?
How did it happen?
Did you actively call for other people, or did they spontaneously show interest and join you?
Were the other major contributors your friends before you started the project?
Did you need to create or manage the project in a particular way to make it attractive to other people?
Are there things you are aware of that promote collaboration or demote it?

I was never successful in doing the same with Winpdb. While it became reasonably popular, no one has ever joined me to develop it, except for a notable tutorial contribution by Chris Lasher, which was developed independently.

Now with the new project, I am wondering what my chances are of making other people try it and take it on. On the one hand, it is a new and fresh code base in an interesting field; on the other hand, why would anyone bother to spend their energy on this new project when they have Symfony or Drupal?

What do you think?

BTW, Ohloh believes you have a median of 19,000 lines of changed code per month since the start of their log. Can this be true? Is this humanly possible? According to it, SymPy has over 1,000,000 lines of code? I can't understand these numbers. Winpdb has about 25,000 lines after 3 years of development. And from my experience, projects of 1,000,000 lines of code need about 20-50 full-time developers working for 2-5 years, which is about 40-250 man-years. And as if this is not enough, you are listed as an owner of a dozen other projects on Google Code and have enough time to become an awarded scientist. How is this possible?

http://www.ohloh.net/p/sympy/contributors/

BTW2, do you still use Winpdb? If you find yourself using it less, can you say what are the reasons, or what it would take to make it more useful?

BTW3, How is SymPy doing?

Cheers,
Nir



So my most honest answer to how to run a successful open source project is: I don't know.

Nevertheless, I have tried to summarize some of my ideas and experience, and some guidelines that I try to follow. Maybe they will be useful to you, Nir, or to anyone else.

First of all, there has to be an easily accessible public mailing list, a public bug tracker, a nice webpage, easy-to-find downloads, frequent releases (once a month is good, but in the worst case at least 4 times a year) and a set of guidelines to follow in order to contribute. That's a must: if the project doesn't have the above, it's almost impossible for it to become successful. However, that is just a start, just a playground. There are still many projects that have all of the above and yet totally fail to attract developers.

I think the most important principle is that I always think about how to involve other people in what I do. If I have a plan in my head for how to do something, e.g. how to move some things forward, I always break it into exact steps and put them in the issues or on our mailing list, so that each step can be done by someone who is completely new to SymPy. I try to look at things from other people's perspective and think: OK, I quite like this SymPy project and I'd like to get this done (for example a new release, or something fixed or implemented), but I have no idea how to start or what exactly needs to be done.

So when someone comes to our list and asks for something, I create a new issue for it and think about how I would fix it if I had the time. Then I write the necessary steps in the issue, invite the submitter to fix it, and offer to explain anything and guide him. Now there are two things that can happen. Either the submitter has the time and the will to go forward, in which case he starts wrestling with it, and whenever he has some code or a question, I need to find time to review it and offer some way out. Or the submitter is too busy, in which case the instructions simply rest in the issue, and the next time someone asks for the feature, the instructions are already there. I don't have an estimate of how frequent either case is.

When I am working on something myself, I try not to code privately. Instead, I put up issues first and write the needed steps in them, so that it's easy for other people to join in.

In general, the most precious thing for me is the fact that someone else sat down at his computer and wrote the patch. So I do everything possible to get new (or more) people interested in the development. Some people think that only super programmers can do a decent job and that it's useless to invest time in people who may have just started with Python. They are wrong. Among the SymPy developers (around 65 people in total have contributed patches at the time of writing this post), we have all kinds of people. We have high school students, a retired US army engineer, physicists, mathematicians, biologists, engineers, teachers, and hobbyists who just do it for fun. Unfortunately, we do not have many women (I think no patch that made it into SymPy was contributed by a woman, but I may be wrong), so if anyone has any ideas on how to get more women involved, let me know (I know we have several women fans, so that's a good start :). For some of our people, SymPy was the first open source project they ever contributed to, and some are new to Python.

Often the first patch that a new potential developer submits is not perfect, and usually it would be faster for me to write it myself than to help with it. However, my rule is to always help the submitter get it done. Sometimes he sends a second patch, or a third, and usually each one needs less and less work on my side. At that point it already pays off, because he is then able to fix things himself if he discovers a bug, and SymPy has just won one more contributor.

So I came to the conclusion that all that is needed is enthusiasm. You don't even have to know Python (you can learn all these things along the way), and you can still do useful things for us and really save us time.

To answer another question from Nir's email: SymPy has about 130,000 lines of code and roughly another 20,000 lines of tests, so I think those stats are wrong. I also think the number of changed lines of code is wrong; we usually have about 250 new patches per release (this depends on how often we release, among other things).

Yes, I am involved in a couple of other projects, e.g. Debian, Sage, IPython, SciPy, hpfem.org (and a couple more), basically everything that has to do with numerical simulation and Python, but my activity there varies. The most time-consuming thing in the last couple of years was definitely school. I was finishing my master's in Theoretical Physics in Prague, then moved to Reno, Nevada, and I have just finished my first semester of a PhD in Chemical Physics here. Sometimes it was just crazy. For example, I finished teaching at 7pm, and instead of going home to sleep, I stayed in my office and fixed 10 SymPy issues that were holding up a release. I finished at 1am, went home (by bike, since I don't have a car yet), slept a couple of hours and then did nothing but school again for a week. Other people reviewed the issues in the meantime, and then I made the release (instead of sleeping again). Last semester it was not unusual for me to get home at 1am every weekday and then sleep most of Saturday to catch up. On Sunday I did some laundry and shopping, and the rest of the time went to grading, homework for all my classes, and teaching, with no time for anything else (no friends, no girls, no rest, no hobby, no open source stuff, nothing). So sometimes one has to work pretty hard to get through it, but fortunately it's finally behind me. If all goes well, from now on I should be doing just research and have a real life too. Also, I am sorry I didn't manage to reply sooner. :)

To answer the other questions:
Were the other major contributors your friends before you started the project?
No, not a single major contributor was my friend before I started the project. Every single one of them became a developer through the procedure I described above, i.e. they first showed up on the list or in the issues, and maybe even their very first patch was not a high-quality one (and if I had been stupid and arrogant, or hadn't seen the big potential, I would have just ignored them). But when given a chance, they became extremely good developers, and SymPy would simply not be here without them.

Did you actively call for other people, or did they spontaneously show interest and join you?
I very much encourage everyone to contribute, but the initial interest must come from them, i.e. they at least have to show up on the mailing list or in the issues so that I know about them. But once I know they are interested in some issue, yes, I try to invite them to fix it with my help.

One observation I have made is that I always have to think in the spirit of "how to earn new money, not how to save the money I already have". Applied to SymPy, that means how to get new developers, how to develop great new things, etc. Even if I am super busy, as I was, I still have to think this way. Once I start thinking about how to conserve and preserve what we already have, I am done, finished, and that's the road to hell.

If I am open, positive and full of energy, I can see people joining me, and we can do great things together. It probably sounds obvious, but it was not obvious to me. For example, when I got busy, some people I had worked with started their own projects and began to compete instead of helping SymPy out. I felt betrayed after all the work I had invested, and I started to become protective. Then I realised that was wrong. I can never stop other people from doing what they want to do. If they want to have their own project, they will have it. If they don't want to help SymPy out, they won't (and, more importantly, there is nothing wrong with either). It's that simple, and being protective only makes things worse.

There is also the question of the license you use for the project. Basically, one should only choose between BSD (maybe also MIT or Apache), LGPL and GPL (there are also several versions of the GPL licenses). Unfortunately, there are people who will never contribute code under a permissive BSD license (because it doesn't protect their work enough), and there are other people who really want the code to be BSD (or another permissive license) so that they can sell it without consulting lawyers about what they are or aren't allowed to do, and so that they can combine it with any other code (open source or not). It also depends on whether one wants to combine (and distribute) the project together with other code. So choosing a license is also important. I believe that for SymPy BSD is the best, and for other projects (like Sage) GPL is the best, so one has to decide on a case-by-case basis. For Winpdb, I would make it BSD too, since that way more people can use it.

To conclude, SymPy is a little more than 2 years old, it has been a great ride so far, and more things are coming. This summer we have 5 Google Summer of Code students, people are starting to use it in their research, and we plan to use it in our codes in our group here in Reno too, so things look promising. I am really glad that we managed to build such a community. When I am busy, as I was last semester, other people help out with patches, reviews and other things, so the project doesn't stall, and now that I have got rid of my school duties, we can move things forward a lot.

So maybe you can get inspired by some of the ideas above. I am also interested in any discussion about this (feel free to post a comment below, send me an email, or just write to the SymPy mailing list about what you think).