This is a good explanation of why not to use thin provisioning, especially in small businesses that cannot afford any VM redundancy.
Expected survival rate after one year:
Windows 2008 system installed on certified hardware: 98%
same system on thick provisioned eagerzeroed vmdk running on ESXi: 97%
same system on thin provisioned vmdk running on ESXi: 60%
same as before plus automatic backup by Veeam or similar: 50%
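To get a feeling for what such percentages mean for a small shop with a handful of VMs, here is a minimal sketch in Python. The rates are just the rough estimates from the list above, not measured data, and the function names and the assumption that failures are independent of each other are mine, added only for illustration.

```python
# Rough one-year survival estimates copied from the list above.
# These are guesses, not statistics.
SURVIVAL_AFTER_ONE_YEAR = {
    "physical": 0.98,
    "thick_eagerzeroed": 0.97,
    "thin": 0.60,
    "thin_with_backup_tool": 0.50,
}


def expected_losses(vm_count, survival_rate):
    """Expected number of systems lost within a year.

    Assumes every system fails independently of the others (illustrative only).
    """
    return vm_count * (1.0 - survival_rate)


def chance_all_survive(vm_count, survival_rate):
    """Probability that every one of vm_count systems survives the year."""
    return survival_rate ** vm_count


if __name__ == "__main__":
    for kind, rate in SURVIVAL_AFTER_ONE_YEAR.items():
        print(kind, expected_losses(10, rate), chance_all_survive(10, rate))
```

Under those assumptions, ten thin provisioned VMs would mean about four losses per year, and the chance of getting through the year without a single loss is well under 1%, compared to roughly 74% for ten thick eagerzeroed VMs.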
I don't think those numbers are completely off, but I don't have statistics to back up such a claim.
So I would rather ask these questions:
– what happens when problems occur?
– how well are power failures or similar problems handled?
– how well does the system check for errors, and how well can it repair them itself?
– which problems can a user handle himself?
– which problems can be solved by VMware support?
– is there any documentation for troubleshooting?
– are there 3rd party tools that can be used if a problem occurs?
– how severe do the problems have to be to result in a complete loss?
– does the filesystem itself offer any repair or self-healing features?
I have been doing remote support for these problems since about 2007, the last 4 years as a consultant for a VMware partner.
The experience I gathered in that time can be summarized like this:
– the smallest error in a thin vmdk mapping table renders the vmdk unreadable
– the smallest error in a snapshot grain table renders the snapshot unreadable
– loss of the partition table for the VMFS volume has to be expected when the system has a power failure
– for small and medium VMware customers, calling VMware support for help with damaged thin vmdks, snapshots or VMFS volumes is usually not worth the effort
– VMFS seems to have no redundant functions to fix small problems after a reboot
– the heartbeat functions that enable cluster access cannot be reset by the user, which means that ESXi often refuses to use or read a volume even if the reason to do so no longer exists
– 99% of the vSphere admins I talk to in my job do not have the skills required to fix even the smallest problems with thin vmdks or snapshots
– most of the admins I talk to somehow compare VMFS with the behaviour of NTFS, and most of them are shocked when I tell them that there is no equivalent for chkdsk
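The first two points are easier to see with a toy model. The sketch below is a deliberately simplified illustration in Python and not the real on-disk layout of VMFS or of a vmdk; the block size, the function names and the sample data are all made up. It only shows the principle: a thick disk finds a block by plain offset arithmetic, while a thin disk depends on a mapping table, so one bad entry is enough to return wrong data or nothing at all.

```python
BLOCK_SIZE = 4  # bytes per block in this toy model (hypothetical value)


def read_thick(backing, logical_block):
    """Thick disk: block N always lives at offset N * BLOCK_SIZE, no metadata needed."""
    start = logical_block * BLOCK_SIZE
    return backing[start:start + BLOCK_SIZE]


def read_thin(backing, mapping, logical_block):
    """Thin disk: a mapping table records where each logical block was allocated."""
    physical = mapping[logical_block]
    if physical < 0 or (physical + 1) * BLOCK_SIZE > len(backing):
        raise ValueError(f"mapping entry {logical_block} points outside the file")
    start = physical * BLOCK_SIZE
    return backing[start:start + BLOCK_SIZE]


# The guest sees AAAA BBBB CCCC in both cases.
thick_backing = b"AAAABBBBCCCC"
# On the thin disk the blocks were allocated in the order the guest wrote them.
thin_backing = b"CCCCAAAABBBB"
mapping = [1, 2, 0]  # logical block -> physical block

if __name__ == "__main__":
    print(read_thick(thick_backing, 1))          # healthy thick read
    print(read_thin(thin_backing, mapping, 1))   # healthy thin read
    damaged = [1, 0, 0]                          # a single wrong table entry
    print(read_thin(thin_backing, damaged, 1))   # silently returns the wrong block
```

In the thick case there is simply no table that could be damaged. In the thin case the data blocks are all still in the file, but without a correct table nobody knows which block belongs where.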
So IMHO these aspects all add up to:
– thin provisioned vmdks die without early warning
– the chances that a user can fix an error himself are almost nonexistent
– trying repairs is a waste of time in most cases
– if mission-critical data has to be recovered within a predictable time frame, expect an invoice from Kroll Ontrack starting at $5000 or more
If my customers ask for recommendations on thin vs. thick provisioning, I think there is only one safe answer:
To be on the safe side, thin provisioning should only be used when either:
– the VM is disposable, like a View VM
– a solid and tested backup or replacement policy is active, so that the loss of a thin VM just becomes a calculated loss of a few hours' worth of data
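The two conditions above can be written down as a simple check. This is only a sketch in Python: the `VmProfile` fields and the function name are hypothetical, and comparing the backup interval with the tolerable data loss is my own way of expressing "a calculated loss of a few hours".

```python
from dataclasses import dataclass


@dataclass
class VmProfile:
    """Hypothetical description of a VM; all fields are assumptions for this example."""
    name: str
    disposable: bool                   # e.g. a View desktop that can simply be recreated
    backup_tested: bool                # a restore has actually been tried and worked
    backup_interval_hours: float       # how often a backup is taken
    acceptable_data_loss_hours: float  # how much lost work the business can tolerate


def thin_provisioning_acceptable(vm):
    """Return True only if one of the two safe-side conditions holds."""
    if vm.disposable:
        return True
    # A backup that was never restored does not count as tested.
    return vm.backup_tested and vm.backup_interval_hours <= vm.acceptable_data_loss_hours


if __name__ == "__main__":
    desktop = VmProfile("view-desktop", True, False, 0.0, 0.0)
    fileserver = VmProfile("fileserver", False, False, 24.0, 4.0)
    print(desktop.name, thin_provisioning_acceptable(desktop))
    print(fileserver.name, thin_provisioning_acceptable(fileserver))
```

Anything that fails this check should be thick provisioned.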
For thick provisioned VMs the story is very different.
A skilled admin can acquire the skills required to fix all problems that are caused by the vmdk layer and the VMFS filesystem.
So for a skilled admin, thick VMs behave almost the same as a Windows system running on physical hardware.