Thin or thick provisioning?

This is a good explanation of why not to use thin provisioning, especially in a small business where we cannot afford any VM redundancy.

Expected survival rate after one year:
Windows 2008 system installed on certified hardware: 98%
the same system on a thick provisioned, eager-zeroed VMDK running on ESXi: 97%
the same system on a thin provisioned VMDK running on ESXi: 60%
the same as before, plus automatic backup by Veeam or similar: 50%

I don't think those numbers are completely off, but I don't have statistics to back up such a claim.

So I would rather ask these questions:
– what happens when problems occur?
– how well are power failures or similar problems handled?
– how well does the system check for errors, and how well can it repair them itself?
– which problems can a user handle himself?
– which problems can be solved by VMware support?
– is there any documentation for troubleshooting?
– are there 3rd-party tools that can be used if a problem occurs?
– how severe do the problems have to be to result in a complete loss?
– does the filesystem itself offer any repair or self-healing features?

I have been doing remote support for these problems since about 2007, the last four years as a consultant for a VMware partner.
The experience I have gathered in that time can be summarized like this:

– even the smallest errors in a thin VMDK's mapping table render the VMDK unreadable (see the sketch after this list)
– even the smallest errors in a snapshot's grain table render the snapshot unreadable
– loss of the partition table of the VMFS volume has to be expected when the host suffers a power failure
– for small and medium VMware customers, calling VMware support for help with damaged thin VMDKs, snapshots or VMFS volumes is usually not worth the effort
– VMFS seems to have no redundant structures that could fix small problems after a reboot
– the heartbeat functions that enable cluster access cannot be reset by the user, which means ESXi often refuses to use or read a volume even when the reason for the lock no longer exists
– 99% of the vSphere admins I talk to in my job do not have the skills required to fix even the smallest problems with thin VMDKs or snapshots
– most of the admins I talk to somehow compare VMFS with the behaviour of NTFS; most of them are shocked when I tell them that there is no equivalent of chkdsk
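
To illustrate why a tiny metadata error has such drastic consequences: thin provisioned and snapshot VMDKs do not store their data linearly. Every guest read has to be resolved through mapping metadata; in the sparse formats this is a grain directory that points to grain tables, which in turn point to the actual data grains. The following Python sketch is a deliberately simplified model of that kind of indirection. It is not the real on-disk format, and the grain size and table size used here are only assumed defaults; it just shows that a single damaged table entry makes the data behind it unreachable.

    # Simplified model of the two-level mapping used by sparse/thin VMDKs.
    # This is NOT the real on-disk format; grain size, table size and error
    # handling are placeholders for illustration only.

    GRAIN_SIZE = 64 * 1024          # data granularity, 64 KB as an assumed default
    ENTRIES_PER_GRAIN_TABLE = 512   # grain table entries per table (assumed)

    class SparseDisk:
        def __init__(self):
            # grain_directory[i] -> grain table; grain_table[j] -> offset of a data grain
            self.grain_directory = {}   # int -> dict(int -> int)
            self.grains = {}            # data offset -> bytes

        def read(self, virtual_offset):
            """Resolve a guest read through directory -> table -> grain."""
            grain_no = virtual_offset // GRAIN_SIZE
            gd_index = grain_no // ENTRIES_PER_GRAIN_TABLE
            gt_index = grain_no % ENTRIES_PER_GRAIN_TABLE

            grain_table = self.grain_directory.get(gd_index)
            if grain_table is None:
                raise IOError("grain directory entry %d is damaged: a whole range "
                              "of grains becomes unreachable" % gd_index)
            data_offset = grain_table.get(gt_index)
            if data_offset is None:
                raise IOError("grain table entry %d is damaged: the data grain may "
                              "still exist on disk but cannot be located" % gt_index)
            return self.grains[data_offset]

The data grains themselves may be perfectly intact; once the pointers to them are damaged, the hypervisor has no other way to find them, which is why a few corrupted bytes of metadata can cost gigabytes of data.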

So IMHO these aspects all sum up to:
– thin provisioned VMDKs die without early warning
– the chance that a user can fix an error himself is almost non-existent
– trying repairs is a waste of time in most cases
– if mission-critical data has to be recovered within a predictable time frame, you have to expect an invoice from Kroll Ontrack starting at $5000 or more

If my customers ask for a recommendation on thin versus thick provisioning, I think there is only one safe answer:

To be on the safe side, thin provisioning should only be used when either:
– the VM is disposable, like a VMware View VM
– a solid and tested backup or replacement policy is in place, so that the loss of a thin VM just becomes a calculated loss of a few hours' worth of data

For thick provisioned VMs the story is very different.
A thick VMDK is essentially the raw disk data in a -flat.vmdk extent plus a small plain-text descriptor, so there is far less metadata that can break.
An admin can acquire the skills required to fix all problems that are caused by the VMDK layer and the VMFS filesystem; a sketch of the descriptor idea follows below.
So for such an admin, thick VMs behave almost the same as a Windows system running on physical hardware.
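
As a rough illustration of why thick recovery is so much more tractable: the data of a thick VM lives in the -flat.vmdk extent as a plain linear image, and what usually gets lost or damaged is only the small text descriptor next to it. The Python sketch below writes a minimal replacement descriptor; treat it as a schematic only. Every value in the template (CID, hardware version, adapter type, geometry) is an assumption that has to match the original VM, and VMware's knowledge base describes the officially supported way of recreating a descriptor with vmkfstools.

    import os

    # Build a minimal descriptor for a thick ("vmfs") disk whose -flat.vmdk
    # extent still exists. All values (CID, hardware version, adapter type,
    # geometry) are illustrative defaults and must match the original VM.
    def descriptor_text(sectors, flat_name, cylinders):
        return "\n".join([
            "# Disk DescriptorFile",
            "version=1",
            "CID=fffffffe",
            "parentCID=ffffffff",
            'createType="vmfs"',
            "",
            "# Extent description",
            'RW %d VMFS "%s"' % (sectors, flat_name),
            "",
            "# The Disk Data Base",
            "#DDB",
            "",
            'ddb.virtualHWVersion = "7"',
            'ddb.geometry.cylinders = "%d"' % cylinders,
            'ddb.geometry.heads = "255"',
            'ddb.geometry.sectors = "63"',
            'ddb.adapterType = "lsilogic"',
        ]) + "\n"

    def rebuild_descriptor(flat_path):
        """Write a replacement descriptor next to an existing -flat.vmdk extent."""
        size_bytes = os.path.getsize(flat_path)
        sectors = size_bytes // 512              # the descriptor counts 512-byte sectors
        cylinders = sectors // (255 * 63)        # simple LBA-style geometry
        flat_name = os.path.basename(flat_path)
        descriptor_path = flat_path.replace("-flat.vmdk", ".vmdk")
        with open(descriptor_path, "w") as f:
            f.write(descriptor_text(sectors, flat_name, cylinders))
        return descriptor_path

    # hypothetical usage:
    # rebuild_descriptor("/vmfs/volumes/datastore1/myvm/myvm-flat.vmdk")

The point is not the script itself but the contrast: for a thick disk the metadata that can get lost fits on one screen of text, while for a thin disk or snapshot it is a binary structure that hardly anybody can rebuild by hand.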

Author: continuum