Snapshot test case (failed)

dreamcompute

#1

Hello!

I’ve booted an instance.

After making some modifications, I tried to take a snapshot.

The job apparently hung, and after I launched the snapshot job the instance also became unavailable (the connection was lost and it stopped answering pings).

Thanks!!


#2

That’s unexpected! We’ll take a look and try to reproduce.


#3

Hi Justin!

I’m repeating the case now:

  1. I created an instance named snap-test-case.

  2. I associated a public IP with it and started pinging it from my laptop.

  3. I launched a snapshot named snap-test-case-snap1.

  4. After that, the pings time out (the instance is not reachable from the internet). Five minutes later, the snapshot is still “queued” and the instance is still unreachable.

  5. The instance status is “Image Pending Upload”.
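The outage in steps 2 to 4 can be put into numbers from the ping results. A minimal sketch, assuming each ping has been recorded as a timestamp plus a reply/no-reply flag; `PingSample` and `longest_outage` are hypothetical helpers, not part of DreamCompute or OpenStack:

```python
from dataclasses import dataclass


@dataclass
class PingSample:
    t: float          # seconds since the test started (hypothetical log format)
    reachable: bool   # True if this ping got a reply


def longest_outage(samples: list[PingSample]) -> float:
    """Return the longest stretch, in seconds, during which pings failed.

    An outage runs from the first failed ping to the next successful one,
    or to the last sample if the instance never came back.
    """
    ordered = sorted(samples, key=lambda s: s.t)
    longest = 0.0
    start = None  # timestamp of the first failure in the current outage
    for s in ordered:
        if not s.reachable and start is None:
            start = s.t
        elif s.reachable and start is not None:
            longest = max(longest, s.t - start)
            start = None
    if start is not None:
        # Still unreachable at the end of the log.
        longest = max(longest, ordered[-1].t - start)
    return longest


# Example: pings start failing at t=2 s and recover at t=420 s.
log = [PingSample(0.0, True), PingSample(1.0, True), PingSample(2.0, False),
       PingSample(300.0, False), PingSample(420.0, True)]
print(longest_outage(log))  # 418.0
```

With one ping per second, the result is also roughly the number of consecutive lost pings.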

I’m leaving everything as it is so that you can debug it. Feel free to delete the instances once it’s fixed; I don’t currently have any important data on them.

I’m sending you additional data by email.


#4

Update:

I repeated the test and observed that creating a snapshot simply takes a very long time (around 5-10 minutes). During that time the instance appears to freeze (pings time out) and its status is “Image Pending Upload”. After those 5-10 minutes, the snapshot completes successfully and the instance unfreezes.

Best regards.

Juanjo


#5

Thanks for the updated information! Still digging into this.


#6

My apologies for the delay in getting back to you on this. It turns out that it is expected behavior in OpenStack for an instance to be unreachable during a snapshot. In fact, the documentation suggests shutting off your instance before taking a snapshot: http://docs.openstack.org/user-guide/content/cli_migrate_instances.html

While that was initially a surprise to me, I suppose it makes sense: a snapshot of a running instance may capture the disk in the middle of writes, so if you were to launch an instance from that snapshot later, it might not run properly.


#7

I think these are two different problems. Of course, if I take a snapshot without freezing the filesystem (via fsfreeze, xfs_freeze, or similar), the snapshot may be inconsistent and require an fsck (and it may lose data). That is the reason for the shutdown recommendation. But the snapshot process should, in any case, be “asynchronous”: you launch it, the instance is available again in less than a second, and the snapshot becomes available later, when possible (possibly after sitting in a queued state).
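The filesystem-freeze half of that argument can be sketched as a context manager wrapped around whatever starts the snapshot. This assumes a Linux guest with the util-linux `fsfreeze` command (run as root) and a hypothetical data mount at /data; `frozen` and `request_snapshot` are illustrative names. The freeze only stays short if the snapshot captures its point-in-time image quickly, which is exactly the asynchronous behaviour argued for above.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

# A runner executes one command; it is injectable so the sketch can be
# exercised without root privileges or a real filesystem.
Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


@contextmanager
def frozen(mountpoint: str, run: Runner = run_checked) -> Iterator[None]:
    """Block writes to `mountpoint` while the body of the `with` runs.

    Uses `fsfreeze --freeze` / `fsfreeze --unfreeze` from util-linux.
    The unfreeze happens in `finally`, so the filesystem is released
    even if requesting the snapshot fails.
    """
    run(["fsfreeze", "--freeze", mountpoint])
    try:
        yield
    finally:
        run(["fsfreeze", "--unfreeze", mountpoint])


# Hypothetical usage inside the guest (needs root):
#
#     with frozen("/data"):
#         request_snapshot()  # placeholder for however the snapshot is started
```

Keeping the unfreeze in `finally` matters more than it looks: a filesystem left frozen blocks every writer on it, which from the outside would look much like the hang described in this thread.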

This is the behaviour in AWS, for example. I understand the problem may be related to the underlying virtualization and/or the disk management software.

According to the OpenStack documentation, “live snapshots allows you to create new images from running instances […] These snapshots are simply disk-only snapshots. Snapshotting an instance can now be performed with no downtime (assuming QEMU 1.3+ and libvirt 1.0+ are used).”

http://docs.openstack.org/openstack-ops/content/snapshots.html
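For whoever operates the cloud, the quoted requirement boils down to a version check on the compute hosts. A minimal sketch, assuming the QEMU and libvirt version strings have already been collected somehow; the function names are hypothetical, and meeting these minimums is what the quoted page requires, not a guarantee that a given deployment actually takes live snapshots.

```python
# Minimum versions for live (no-downtime) snapshots, as quoted from the
# OpenStack documentation: QEMU 1.3+ and libvirt 1.0+.
MIN_QEMU = (1, 3, 0)
MIN_LIBVIRT = (1, 0, 0)


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse '1.5.3', '1.3' or '2.0.0-rc1' into a (major, minor, patch) tuple.

    Parsing stops at the first component with no leading digits; missing
    components are treated as 0.
    """
    parts: list[int] = []
    for piece in text.strip().split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def meets_live_snapshot_minimum(qemu_version: str, libvirt_version: str) -> bool:
    """True if both versions are at or above the documented minimums."""
    return (parse_version(qemu_version) >= MIN_QEMU
            and parse_version(libvirt_version) >= MIN_LIBVIRT)


print(meets_live_snapshot_minimum("1.5.0", "1.1.1"))   # True
print(meets_live_snapshot_minimum("1.2.0", "0.10.2"))  # False
```

Padding to three components keeps tuple comparison honest: without it, "1" would compare as less than (1, 0).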

Thanks!