Tuesday, February 26, 2008

XFS is 20x slower than ext3 (with default settings)

Is XFS really that bad? At least with default settings, XFS on Debian seems to be completely blown away by ext3 in terms of speed. I don't mind a 1.5x slowdown, maybe even 2x, but 20x is a showstopper. I already use ext3 for all pbuilder builds, because extracting the base image takes 30s on XFS compared to 3s on ext3. And I'll probably switch to ext3 completely, unless someone finds a way to fix this.

I recently got burned by this when running Sage on my computer, because it compiles a lot of Python files the first time it is started. Normally this should take roughly 15s, but on my computer it took 6 minutes, and then it triggered a previously undiscovered bug in Sage, which I reported.

Michael Abshoff, the release manager of Sage, suggested that something was FUBAR (Fucked Up Beyond All Recognition) on my shiny Debian amd64 sid system running on an Intel Core Quad. I said no way, because I really care about this machine: I use it for larger finite element calculations and other stuff (like compiling huge Debian packages such as paraview in parallel).

So I offered a bet: I would give him access to this computer, he would find the problem, and if it was a problem in my Debian configuration, I would write on this blog that I am lame, while if it was a problem in Sage, he would write on his blog that he is lame. And I was smiling to myself, thinking how good I am and how much fun I would have reading planet.sagemath.org with Michael's top post saying that he is lame.

But then I remembered my old struggle with cowbuilder and XFS, and I stopped smiling. See e.g. this wiki page I created half a year ago. Something is FUBAR with XFS and Debian. I also asked on the Czech server Root, which is known for having a lot of experts willing to share their knowledge, and it was quickly revealed that the problem is XFS write barriers, which can be turned off with the "nobarrier" mount option (my post is here, but it's in Czech).

On the amd64 machine, the problem was fixed by issuing this command:

mount -o remount,rw,nobarrier /dev/sda3 /home/

(notice the "nobarrier" option). You can read some background on this on the lkml list. Unfortunately, I also have a laptop, where I already use the "nobarrier" option, and there it doesn't help at all. I just created a new ext3 partition and verified that on my laptop, ext3 is around 10x faster than XFS with nobarrier (which was supposed to fix this). I use the latest 2.6.24 kernel from unstable on both machines.
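A rough way to repeat this kind of comparison without pbuilder is to time a metadata-heavy job on each filesystem. The sketch below is an assumption, not the measurement used in this post: it creates and then deletes a few thousand small files in a scratch directory and reports the elapsed time. The function small_file_benchmark and the mount points in the usage line are hypothetical; results depend heavily on the disk, the kernel and the mount options.

```python
import os
import shutil
import sys
import tempfile
import time


def small_file_benchmark(target_dir, n_files=2000, size=4096):
    """Time creating and then deleting n_files small files under target_dir.

    This only approximates a metadata-heavy job such as unpacking a base
    tarball; it is a hypothetical helper, not the author's benchmark.
    """
    payload = b"x" * size
    # Work in a private scratch directory so nothing else on the mount is touched.
    work = tempfile.mkdtemp(prefix="fsbench-", dir=target_dir)
    try:
        start = time.perf_counter()
        for i in range(n_files):
            with open(os.path.join(work, f"f{i:06d}"), "wb") as fh:
                fh.write(payload)
        # os.sync() flushes every filesystem, so journal commits (and any
        # write barriers they issue) are part of the measured time.
        os.sync()
        for i in range(n_files):
            os.unlink(os.path.join(work, f"f{i:06d}"))
        os.sync()
        return time.perf_counter() - start
    finally:
        shutil.rmtree(work, ignore_errors=True)


if __name__ == "__main__":
    # Hypothetical usage: python3 fsbench.py /mnt/xfs /mnt/ext3
    for d in sys.argv[1:]:
        print(f"{d}: {small_file_benchmark(d):.2f} s")
```

Run it against an XFS and an ext3 mount on the same disk and compare the two numbers. Because os.sync() flushes all filesystems, not only the one being measured, it is best run on an otherwise idle machine.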

Time to move from XFS to ext3 on my laptop? Seems like it. I'll keep XFS on the other machine, because I know some other people have had good experiences with XFS, and the "nobarrier" option seems to fix the problem there.

But as for the bet, yeah, I am lame and I still have a lot to learn from Michael. :)

8 comments:

Niv said...

The default XFS mkfs options lagged behind for a while, but this has now been fixed. What version of xfsprogs were you running (at mkfs time)?

XFS will be slower than ext* on SOME workloads (even though your 20x claim seems busted), and much faster on others.

Enabling nobarrier on a laptop's SATA drive is asking for data loss!

Run xfs_info and check that you have:
* v2 inode.
* v2 log.
* v2 attr.
* 4 AGs.
* lazy superblock counters (this one should really speed things up).
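Niv's checklist can be checked mechanically. The following is a minimal sketch, assuming Python 3 and the xfs_info output format shown later in this thread. The functions parse_xfs_info and check_xfs_info are hypothetical helpers. The sketch reads the log version, attr version, AG count and lazy-count fields. The xfs_info output in this thread shows no inode-version field, so the "v2 inode" item is not checked. The "4 AGs" test follows Niv's list literally, even though a suitable AG count may depend on the disk.

```python
import re
import sys

# key=value pairs such as "agcount=16," or "lazy-count=0"
_KV = re.compile(r"([a-z][a-z-]*)=([^\s,]+)")


def parse_xfs_info(text):
    """Group xfs_info key=value fields by section (meta-data, data, log, ...)."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("$"):
            continue
        if line.startswith("="):
            # A line starting with "=" continues the previous section.
            body = line[1:]
        else:
            name, sep, body = line.partition("=")
            if not sep:
                continue
            current = name.strip()
            sections.setdefault(current, {})
        if current is None:
            continue
        for key, value in _KV.findall(body):
            sections[current][key] = value
    return sections


def check_xfs_info(text):
    """Return {check name: passed?} for the items on Niv's list we can see."""
    s = parse_xfs_info(text)
    meta = s.get("meta-data", {})
    log = s.get("log", {})
    return {
        "v2 log": log.get("version") == "2",
        "v2 attr": meta.get("attr") == "2",
        "4 AGs": meta.get("agcount") == "4",
        "lazy superblock counters": log.get("lazy-count") == "1",
    }


if __name__ == "__main__":
    # Hypothetical usage: xfs_info /home | python3 check_xfs.py
    for name, ok in check_xfs_info(sys.stdin.read()).items():
        print(("ok  " if ok else "NO  ") + name)
```

Fed the output Ondřej posts below, every check comes back as failing, which matches Niv's suspicion that the filesystem was made with old mkfs defaults.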

Ondřej Čertík said...

Using snapshot.debian.net, I think I was running either xfsprogs 2.8.18-1 or 2.9.0-1 on my laptop; I'm not really sure now.

I re-enabled barriers on my laptop and also moved the base system (except /home) to ext3, and pbuilder is a lot faster now.

20x is not an exaggeration; see this wiki page for timings.

Here is the output of:
$ xfs_info /dev/sda3
meta-data=/dev/sda3              isize=256    agcount=16, agsize=2182580 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=34921280, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=17051, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0


Honestly, I am kind of tired of fiddling with filesystem parameters: either it works by default, or it doesn't. ext3 seems to work for me.

Ondřej Čertík said...

I got a private email from Dave, because Blogger makes it very difficult to post comments. With his permission, I'm reposting it:

----------------------------------

Hi,

I could not post to Ondrej's blog since I have no account at Blogger.

Whatever is slowing things down probably has zilch to do with XFS. But try

async
noatime
nodiratime

which are generic across all Linux filesystems. For XFS we use this
fstab line:

UUID=xxxxxxxxxxx...x /mount point/ xfs noatime,nodiratime,async,barrier,rw,dev,suid,exec,auto 0 1

You can use
/lib/udev/vol_id --uuid /dev/hdX
to get the UUIDs.

You might also play with hdparm to tune the hardware. I suspect your
compilation setup: too much disk access by compilers is a design flaw.
Try running from tmpfs then, like

tmpfs /usr/local/myproject tmpfs rw,nosuid,noexec,noatime,size=100M,nr_inodes=8k 0 0

and rsync to/from permanent storage.

I will never move my users back from XFS to ext2/3. Ext3 ate data numerous
times, without excuses like a power failure. XFS has robustly survived
all kinds of power brownouts and blackouts. We've never lost data,
and I'm very happy. Performance is great. We use the lazy superblock
option but were happy even before that, without lazy superblocks.

Notice that ext4 is now attempting to implement extents, which XFS has
used for ages.

http://www.linux.com/feature/141404?theme=print
http://everything2.com/index.pl?node_id=1479435
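Whichever options you settle on, whether it is Dave's fstab line or the nobarrier remount from the post, it is worth checking what the kernel actually applied. A typo in fstab or a failed remount is easy to miss. The following is a minimal sketch, assuming Python 3 on Linux. The functions mount_options and missing_options are hypothetical helpers that read /proc/mounts. Kernels usually leave out options that are at their default value. So on an XFS mount with barriers enabled, that setting may show up only as the absence of nobarrier, not as an explicit barrier entry.

```python
import re
import sys


def _unescape(field):
    # /proc/mounts encodes space, tab, newline and backslash as
    # 3-digit octal escapes, e.g. a space in a path becomes \040.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mount_options(mount_point, mounts_text=None):
    """Return the set of options reported for mount_point, or None if not mounted."""
    if mounts_text is None:
        with open("/proc/mounts") as fh:
            mounts_text = fh.read()
    found = None
    for line in mounts_text.splitlines():
        # Fields: device, mount point, fs type, options, dump, pass.
        fields = line.split()
        if len(fields) >= 4 and _unescape(fields[1]) == mount_point:
            # Keep the last match: a later mount on the same path hides earlier ones.
            found = set(fields[3].split(","))
    return found


def missing_options(mount_point, wanted, mounts_text=None):
    """List the options from `wanted` that are not active on mount_point."""
    active = mount_options(mount_point, mounts_text)
    if active is None:
        raise ValueError(f"{mount_point} is not mounted")
    return sorted(set(wanted) - active)


if __name__ == "__main__":
    # Hypothetical usage: python3 checkmount.py /home noatime nodiratime
    print(missing_options(sys.argv[1], sys.argv[2:]))
```

An empty list means every requested option is active. Any names it prints were dropped or never applied.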
