Sunday, September 2, 2012

How to make pdflatex accept .eps images

Unfortunately, pdflatex does not support .eps images, for example the following code fails:
\usepackage{graphicx}
...
\includegraphics{qc.eps}
The fix is to put the following lines at the top of the document:
\usepackage{epstopdf}
\epstopdfsetup{suffix=}
\DeclareGraphicsRule{.eps}{pdf}{.pdf}{`epstopdf #1}
and compile with the -shell-escape flag:
pdflatex -shell-escape my_file.tex
Then the above code works. You need the epstopdf program that is run behind the scene.

Update: Apparently, it's enough to just include the following line at the top of the document:
\usepackage{epstopdf}
One still has to compile with the -shell-escape flag. This works for me in Ubuntu 12.04.

Thanks to: http://chi3x10.wordpress.com/2009/06/18/eps-and-pdflatex-no-more-converting-eps-to-pdf/

Monday, June 4, 2012

How to convert scanned images to pdf

From time to time I need to convert scanned documents to a pdf format.


Usage scenario 1: I scan part of a book (i.e. some article) on a school's scanner that sends me 10 big separate color pdf files (one pdf per page). I want to get one nice, small (black and white) pdf file with all the pages.


Usage scenario 2: I download a web form, print it, fill it in, sign it, scan it on my own scanner using Gimp and now I want to convert the image into a nice pdf file (either color or black & white) to send back over email.

Solution: I save the original files (be it pdf or png) into a folder and use git to track it. Then create a simple reduce script to convert it to the final format (view it as a pipeline). Often I need to tweak one or two parameters in the pipeline.

Here is a script for scenario 1:

And here for scenario 2:

There can be several unexpected surprises along the way. From my experience:

  • If I convert png directly to tiff, sometimes the resolution can be wrong. The solution is to always convert to ppm (color) or pbm (black and white) first, which is just a simple file format containing the raw pixels. This is the "starting" format (so first I need to convert the initial pdf or png into ppm/pbm) and then do anything else. That proved to be very robust.
  • The tiff2pdf utility proved to be the most robust way to convert an image to a pdf. All other ways that I have tried failed in one way or another (resolution, positioning, paper format and other things were wrong....). It can create multiple pages pdf files, set paper format (US Letter, A4, ...) and so on.
  • The linux convert utility is a robust tool for cropping images, converting color to black and white (using a threshold for example) and other things. As long as the image is first converted to ppm/pbm. In principle it can also produce pdf files, but that didn't work well for me.
  • I sometimes use the unpaper program in the pipeline for some automatic polishing of the images.
In general, I am happy with my solution. So far I was always able to get what I needed using this "pipeline" method.

Sunday, January 29, 2012

Discussion about global warming

I have read the text The Truth About Greenhouse Gases by William Happer, a physicist at Princeton. I liked it, so I posted it to my Google+. I was surprised by so many emotional responses. The post also made Michael Tobis to write a blog post with his opinion.

I was not satisfied about the overall tone of the discussion. I am really just interested in factual arguments. As such, I browsed through all the arguments against William's paper from the above discussion, and chose one well formulated question, that I think represents the most important objection: "In your paper you state at several places, that doubling the CO2 concentrations will only increase the temperature by 1 C. However, it is claimed (see for example Michael's post above) that the increase will be around 2.5 C. Which number is correct and where is it coming from?" I wrote to William and asked whether he would be willing to answer it. With his permission I am posting his answer here.

Answer: sensitivity.pdf
Images referenced in the answer: image001.png, image002.png.
Link referenced in the answer: http://www.thegwpf.org/best-of-blogs/4247-steve-mcintyre-closing-thoughts-on-best.html

Edit: I was told that Blogger makes it really hard to comment under the article. You can discuss it at my G+ post about this: https://plus.google.com/u/0/104039945248245758823/posts/PJeqx7GKtLg

Thursday, January 26, 2012

When double precision is not enough

I was doing some finite element (FE) calculation and I needed the sum of the lowest 7 eigenvalues of a symmetric matrix (that comes from the FE assembly) to converge to at least 1e-8 accuracy (so that I can check calculation done by some other solver of mine, that calculates the same but doesn't use FE). In reality I wanted the rounded value to 8 decimal digits to be correct, so I really needed 1e-9 accuracy (but it's ok if it is let's say 2e-9, but not ok if it is 9e-9). With my FE solver, I couldn't get it to converge more than to roughly 5e-7 no matter how hard I tried. Now what?

When doing the convergence, I take a good mesh and keep increasing "p" (the polynomial order) until it converges. For my particular problem, it is fully converged for about p=25 (the solver supports the order up to 64). Increasing "p" further will not increase the accuracy anymore, and the accuracy stays at the level 5e-7 for the sum of the lowest 7 eigenvalues. For optimal meshes, it converges at p=25, for not optimal meshes, it converges for higher "p", but in all cases, it doesn't get below 5e-7.

I know from experience, that for simpler problems, the FE solver can easily converge to 1e-10 or more using double precision. So I know it is doable, now the question is what the problem is: there
are a few possible reasons:

  • The FE quadrature is not accurate enough
  • The condition number of the matrix is high, thus LAPACK doesn't return very accurate eigenvalues
  • Bug in the assembly/solver (like single/double corruption in Fortran, or some other subtle bug)
When using the same solver for simpler potential, it converged nicely to 1e-10. So this suggests there is no bug in the assembly or solver itself. It is possible that the quadrature is not accurate enough, but again, if it converges for simple problem, it's probably not it. So it seems it is the ill conditioned matrix, that causes this. So I printed the residuals (that I simply calculated in Fortran using the matrix and the eigenvectors returned by LAPACK), and it only showed 1e-9. For simpler problems, it can go to 1e-14 easily. So that must be it. How do we fix it?

Obviously by making the matrix less ill conditioned, which is caused by the mesh for the problem (the ratio of the longest/shortest elements is 1e9) but for my problem I really needed such a mesh. So the other option is to increase the real number accuracy.

In Fortran all real variables are defined as real(dp), where dp is an integer defined at a single place in the project. There are several ways to define it, but it's value is 8 for gfortran and it means double precision. So I increased it to 16 (quadruple precision), recompiled. Now the whole program calculates in quadruple precision (more than 30 significant digits). I had to recompile LAPACK using the "-fdefault-real-8" gfortran option, that promotes all double precision numbers to quadruple precision, and I used the "d" versions (double precision, now promoted to quadruple) of LAPACK routines.

I rerun the calculation ---- and suddenly LAPACK residuals are around 1e-13, and the solver converges to 1e-10 easily (for the sum of the lowest 7 eigenvalues). Problem solved.

Turning my Fortran program to quadruple precision is as easy as changing one variable and recompiling. Turning LAPACK to quadruple precision is easy with a single gfortran flag (LAPACK uses the old f77 syntax for double precision, if it used real(dp), then I would simply change it as for my program). The whole calculation got at least 10x slower with quadruple. The reason is that gfortran runtime uses the libquadmath library, that simulates quadruple precision (as current CPUs only support double precision natively).

I actually discovered a few bugs in my program (typically some constants in older code didn't use the "dp" syntax, but had the double precision hardwired). Fortran warns about all such cases, when the real variables have incompatible precision.

It is amazing how easy it is to work with different precision in Fortran (literally just one change and recompile). How could this be done with C++? This wikipedia page suggests, that "long double" is only 80bit in most cases (quadruple is 128bit), but gcc offers __float128, so it seems I would have to manually change all "double" to "__float128" in the whole C++ program (this could be done with a single "sed" command).

Thursday, November 18, 2010

Google Code vs GitHub for hosting opensource projects

Cython is now considering options where to move the main (mercurial) repository, and Robert Bradshaw (one of the main Cython developers) has asked me about my experience with regards to Google Code and GitHub, since we use both with SymPy.

Google Code is older, and it was the first service that provided free (virtually unlimited) number of projects that you could easily and immediately setup. At that time (4 years ago?) that was something unheard of. However, the GitHub guys in the meantime not only made this available too, but also implemented features, that (as far as I know) no one offers at all, in particular hosting your own pages at your own domain (but at GitHub's servers, some examples are sympy.org and docs.sympy.org), commenting on git branches and pull requests before the code gets merged in (I am 100% convinced that this is the right approach, as opposed to comment on the code after it gets in), allow to easily fork the repository and it has simply more social features, that the Google Code doesn't have.

I believe that managing an opensource project is mainly a social activity, and GitHub's social features really make so many things easier. From this point of view, GitHub is clearly the best choice today.

I think there is only one (but potentially big) problem with GitHub, that its issue tracker is very bad, compared to the Google Code one. For that reason (and also because we already use it), we keep our issues at Google Code with SymPy.

The above are the main things to consider. Now there are some little things to keep in mind, that I will briefly touch below: Google Code doesn't support git and blocks access from Cuba and other countries, when you want to change the front page, you need to be an admin, while at GitHub I simply add push access to all sympy developers, so anyone just pushes a patch to this repository: https://github.com/sympy/sympy.github.com, and it automatically appears on our front page (sympy.org), with Google Code we had to write long pages (in our docs) about how to send patches, with GitHub we just say, send us a pull request, and point to: http://help.github.com/pull-requests/. In other words, GitHub takes care of teaching people how to use git and figure out how to send patches, and we can concentrate on reviewing the patches and pushing them in.

Wikipages at github are maintained in git, and they provide the webfrontend to it as opensource, so there is no vendor lock-in. Anyone with github account can modify our wiki pages, while the Google Code pages can only be modified by people that I add to the Google Code project, which forced us to install mediawiki on my linode server (hosted at linode.com, which by the way is an excellent VPS hosting service, that I have been using for couple of years already and I can fully recommend it), and I had to manage it all the time, and now we are moving our pages to the github wiki, so that I have one less thing to worry about.

So as you can see, I, as admin, have less things to worry about, as github manages everything for me now, while with Google Code, I had to manage lots of things on my linodes.

One other thing to consider is that GitHub is only for git, but they also provide svn and hg access (both push and pull, they translate the repository automatically between git and svn/hg), I never really used it much, so I don't know how stable this is. As I wrote before, I think that git is the best tool now for maintaining a project, and I think that github is now the best choice to host it (except the issue tracker, where Google Code is better).

Sunday, October 31, 2010

git has won

I switched to git from mercurial about two years ago. See here why I switched and here my experience after 4 months. Back then I was unsure, whether git will win, but I thought it has a bigger momentum. Well, I think that now it's quite clear that git has already won. Pretty much everybody that I collaborate with is using git now.

I use github everyday, and now thanks to github pull requests, I think it's the best collaboration platform out there (compared to Google Code, Sourceforge, Bitbucket or Launchpad).

I think it's partly because the github guys have a clear vision of what has to be done in order to make collaboration more easier and they do it, but more importantly that git branches is the way to go, as well as other git features, that are "right" from the beginning (branches, interactive rebase, and so on), while other VCS like bzr and mercurial simply either don't have them, or are getting them, but it's hard to get used to it (for example mercurial uses the "mercurial queues", and I think that is the totally wrong approach to things).

Anyway, this is just my own personal opinion. I'll be happy to discuss it in the comments, if you disagree.

Friday, August 13, 2010

Week Aug 9 - 13

On Monday I learned fwrap (excellent piece of software btw), there were a few minor technical issues, that I communicated with Kurt on the fwrap mailinglist and I also send him a simple patch, so that it works fine for my fortran code (functions returning the value itself, instead of a tuple of length 1).

Then I took my old fortran based shooting method solvers that I wrote couple years ago and wrapped them using fwrap and run couple simulations against my FE solver.

On Tuesday we had a lunch with all the advisors and students and the llnl director and a little presentation about what we did.

On Wednesday I run shooting method calculations for 50 states of silver, both for selfconsistent DFT potential and Z/r potential. I then also run the FE solver for the same DFT potential and compared results. There are lots of small technical issues, for example I had to use cubic splines to interpolate the potential, play with the mesh for the shooting method and so on.

However, the shooting method and FE agrees to every single printed digit, after making sure that the mesh is ok for both methods. For all potentials that I tried. That's very cool.

In the process of it, I also wrote a patch to SymPy to calculate exact energies for the Hydrogen atom, both from Schroedinger and Dirac equations. I still need to polish it a bit.

On Thursday I run couple more calculations and setup a poster and had a poster session, it was two hours, and I think around 7 people (not counting other students and people from our group) stopped by and talked with me about it, so I was very happy. Being able to solve radial Schroedinger and especially Dirac equations robustly is something that several people in the lab would really need.

Today I talked little bit (finally) about some Green functions in QM and QFT with a postdoc in the Quantum Simulations group, that I always wanted to, but didn't have time before, then packed my things and went back to Reno.

My plan for the next week(s) is to wrap up what I did and put it into articles. I already have enough material for some articles, so it has to be done. In parallel, I'd like to finish the FE Dirac solver, the coding is done, but now I need to play with adaptivity and also investigate if we are getting the spurious states, that other people are getting when using b-splines.