2010-07-08

Possibly, the end of the snapshots story.

Some recent patches change the way the 'Delete all' snapshots operation does its job.


Which patches change this feature?




What are the changes?

Before these patches, committing more than one snapshot with the Delete all option of the Snapshot Manager required additional space to perform the operation. The amount of extra space required depended directly on the number of snapshots and their size.

The patches modify the Delete all snapshots operation to commit every snapshot of the chain directly to the Base Disk(s) of the virtual machine. With this new algorithm:

  • If the Base Disk is preallocated (thick provision), no extra space is required for the Delete all operation. The Base Disk does not grow because its full size is already allocated.
  • If the Base Disk is non-preallocated (thin provision), the Base Disk grows only as information from the snapshots is committed to it. Each thin provision disk may grow up to its maximum size, shown as Provisioned Size in the Virtual Machine settings for the disk.
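
As a rough illustration of the two cases above, here is a minimal Python sketch. The `BaseDisk` class, its field names and the sample sizes are made up for the example; they are not part of any VMware tool or API. It only encodes the rule described above: a thick disk cannot grow, and a thin disk can grow at most up to its Provisioned Size.

```python
from dataclasses import dataclass


@dataclass
class BaseDisk:
    """Hypothetical description of one base disk (not a VMware API object)."""
    name: str
    provisioned_gb: float  # Provisioned Size from the Virtual Machine settings
    allocated_gb: float    # space the base disk currently occupies on the Datastore
    thin: bool             # True for thin provision, False for thick provision


def max_growth_gb(disk: BaseDisk) -> float:
    """Upper bound on how much the base disk can grow during Delete all."""
    if not disk.thin:
        # A thick (preallocated) disk already occupies its full size.
        return 0.0
    # A thin disk can grow at most up to its Provisioned Size.
    return max(disk.provisioned_gb - disk.allocated_gb, 0.0)


# Sample values, invented for the example.
disks = [
    BaseDisk("disk1", provisioned_gb=40.0, allocated_gb=40.0, thin=False),
    BaseDisk("disk2", provisioned_gb=100.0, allocated_gb=35.0, thin=True),
]

for d in disks:
    print(d.name, max_growth_gb(d))
```

The result is a ceiling, not a prediction: a thin disk only grows by what actually gets committed from the snapshots.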

A quick way to know how much the Virtual Machine disk utilization can grow (whether you are using thin or mixed disk types) is to look at the Summary tab of the Virtual Machine.
In the right panel you will see "Provisioned Storage" and "Used Storage". The difference between them is how much the Base Disks can grow.

However, if the VM has disks in different Datastores, this amount is split between those Datastores.
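
To make the multi-Datastore caveat concrete, the sketch below does the same subtraction per disk and groups the result by Datastore. The Datastore names and sizes are invented, and the per-disk figures are assumed to be numbers you collect yourself; nothing here queries a real host.

```python
from collections import defaultdict

# One entry per disk: (datastore, provisioned_gb, used_gb). Made-up values.
disks = [
    ("datastore1", 40.0, 25.0),
    ("datastore1", 20.0, 20.0),
    ("datastore2", 100.0, 35.0),
]


def headroom_by_datastore(disks):
    """Sum (provisioned - used) for the disks that live on each Datastore."""
    growth = defaultdict(float)
    for datastore, provisioned_gb, used_gb in disks:
        growth[datastore] += provisioned_gb - used_gb
    return dict(growth)


per_datastore = headroom_by_datastore(disks)
total = sum(per_datastore.values())

print(per_datastore)  # possible growth on each Datastore
print(total)          # the single difference for the whole Virtual Machine
```

With these sample numbers the Virtual Machine as a whole can grow by 80 GB, but only 15 GB of that can land on datastore1. The free space of each Datastore has to be checked against its own share.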




Can I change the type of the virtual disks when they have snapshots?
You cannot directly change the type of a virtual disk that has snapshots. You can perform one of these operations to change the type of the disks:

  • With Storage VMotion (SVMotion) you can change the type of the disks, but having no snapshots is a requirement for SVMotion as well.
  • When you clone the virtual machine you can change the type of the disks on the destination virtual machine. However, all the disks will get the same type.
  • If you clone from the Service Console, you can select any type of disk for the destination. However, you cannot clone from the Service Console while the virtual machine is running.



What happens if I click 'Delete all' while the virtual machine is running?
If the virtual machine is running when you click Delete all in the interface or run vmware-cmd removesnapshots, an additional snapshot is created to accommodate the incoming I/O while all the other snapshots get committed to the Base Disk. How large this snapshot grows depends on the I/O activity, so it is best to reduce the I/O activity while the process runs.

That last snapshot is committed to the Base Disk at the very end of the process. After that, all the snapshot files are deleted.



Will each snapshot file get deleted as soon as it is committed to the Base Disk, releasing space on the Datastore?

No, all files are deleted together after completing the process.
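
The space behaviour of the whole operation can be sketched with a toy model. This is only back-of-the-envelope arithmetic built from the description in this post, not a measurement of what the host actually does. The function name, its parameters and the sample sizes are all hypothetical.

```python
def delete_all_space_gb(snapshot_sizes_gb, helper_growth_gb=0.0, base_growth_gb=0.0):
    """Toy space model for Delete all.

    snapshot_sizes_gb: sizes of the existing snapshot files
    helper_growth_gb:  growth of the extra snapshot created when the VM is running
    base_growth_gb:    growth of thin Base Disks (0 for thick disks)
    """
    existing = sum(snapshot_sizes_gb)
    return {
        # Snapshot files stay on the Datastore until the whole process completes.
        "held_until_end_gb": existing + helper_growth_gb,
        # Space consumed on top of what was already in use before starting.
        "extra_during_gb": helper_growth_gb + base_growth_gb,
        # Change in Datastore usage once every snapshot file has been deleted.
        "net_change_after_gb": base_growth_gb - existing,
    }


# Made-up example: three snapshots, a running VM, thin Base Disks.
result = delete_all_space_gb([10.0, 5.0, 2.5], helper_growth_gb=1.0, base_growth_gb=8.0)
print(result)
```

In this example 18.5 GB of snapshot files stay on the Datastore until the very end, even though most of that data has long been committed.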



These patches will invalidate the space requirements reported by SnapVMX and will make snapshot troubleshooting much easier (I'll update the guide and the script when I have some time). They are the result of a personal effort that has finally reached the end. After creating the snapshots troubleshooting guide and script, I decided that it was time to work on eliminating the problem rather than just resolving it.

Well, the patches are out there. My advice: install them. I have been waiting a long time to be able to announce them.




[Update June/2011] These patches were the result of a personal battle to eliminate the problem at its root. Initially I created a troubleshooting procedure and a helper script (SnapVMX) to minimize the time spent fixing the consequences of the problem. After a while, I changed my mind and focused on eliminating the problem itself. Time has shown it was the right thing to do. The long daily calls about problems caused by snapshots are now history.

3 comments:

  1. Fantastic!
    I just came home from a vacation to find a customer had made an error with snapshots and filled up the VMFS. Committing it/cleaning up has taken 36 hours so far.

    If this solves everything forever...
    Jubilation :-)

  2. how do I kill this removeallsnapshots task? It's killing my free space on my datastore. I don't care if the vm in question for removing all snapshot gets corrupted. I just don't want it to take down all my vms and host

  3. Restarting the management service on the host will crash the "removesnapshots".

    http://kb.vmware.com/kb/1003490

    The VM should keep running, but corruption in the snapshots chain may occur (it's not common).
