Since this upgrade (which updated the gluster packages as well as the Ubuntu kernel package), KVM live migration fails in a most unusual manner. The live migration itself succeeds, but on the receiving machine the VM storage for the migrated guest becomes inaccessible. As a result, the guest OS can no longer read or write its filesystem, with of course fairly disastrous consequences for the guest.
So before a migration, everything is running smoothly. The two cluster nodes are 'cl0' and 'cl1', and we do the migration like this:
The migration itself works, but as soon as you do the migration, the file /gluster/guest.raw (which holds the filesystem for the guest) becomes completely inaccessible: trying to read it (e.g. with dd or md5sum) results in a 'permission denied' on the destination cluster node, whereas the file is still perfectly fine on the machine that the migration originated from.
As soon as the guest is stopped (virsh destroy), the file /gluster/guest.raw becomes readable again, and the guest can then be started again on either server without further issues. The problem does not affect any of the other files in /gluster/.
The problem seems to be in the gluster or fuse part, because once this error condition is triggered, /gluster/guest.raw cannot be read by any application on the destination server. The situation is 100% reproducible: every attempted live migration fails in this way.
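For anyone who wants to reproduce the check, the readability test described above can be sketched as a small shell helper. This is only an illustration: the function names check_readable and check_mount are made up for this example, and /gluster is assumed to be the mount point as in this report. It reads the first 4 KiB of each regular file with dd and reports which files fail.

```shell
#!/bin/sh
# Hypothetical helper (not part of gluster or libvirt): try to read the
# first 4 KiB of a file with dd and report whether the read succeeded.
check_readable() {
    file="$1"
    if dd if="$file" of=/dev/null bs=4096 count=1 2>/dev/null; then
        echo "readable: $file"
    else
        echo "NOT readable: $file"
        return 1
    fi
}

# Hypothetical helper: run check_readable on every regular file directly
# under a directory. The default /gluster is the mount point assumed here.
check_mount() {
    dir="${1:-/gluster}"
    rc=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip directories and unmatched globs
        check_readable "$f" || rc=1  # remember any failure, keep going
    done
    return $rc
}
```

Running check_mount /gluster on both cl0 and cl1 right after a migration should show guest.raw failing only on the destination node, if the behaviour matches what is described here.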
Some more details of the current setup, and logfiles:
Kernel: Ubuntu 3.8.0-35-generic (13.04, Raring)
Glusterfs: 3.4.1-ubuntu1~raring1
qemu: 1.4.0+dfsg-1expubuntu4
libvirt0: 1.0.2-0ubuntu11.13.04.4
Mountpoints:
Output from gluster volume info all:
Below are the logfiles, recorded at 'DEBUG' level while trying to migrate the guest 'kvmtest' from cl0 to cl1. The migration itself happens at 14:00:00.
gluster.log
glustershd.log
etc-glusterfs-glusterd.vol.log
export-brick0-sdb1.log
kvmtest.log (from /var/log/libvirt/qemu)