Category Archives: HA

Notes for FT Part-02


For Notes for FT Part-01 click here

What happens when primary VM is powered ON?

·         The entire state of the Primary VM is copied and the Secondary VM is created, placed on a separate compatible host, and powered on if it passes admission control.

·         The Fault Tolerance Status displayed on the virtual machine’s Summary tab in the vSphere Client is Protected.


What happens when primary VM is powered OFF?

·         The Secondary VM is immediately created and registered to a host in the cluster (it might be re-registered to a more appropriate host when it is powered on).


·         The Secondary VM is not powered on until after the Primary VM is powered on.

·         The Fault Tolerance Status displayed on the virtual machine’s Summary tab in the vSphere Client is Not Protected, VM not Running.


What happens to memory size when FT is enabled on VM?

When Fault Tolerance is turned on, vCenter Server removes the virtual machine’s memory limit and sets the memory reservation to the memory size of the virtual machine. While Fault Tolerance remains turned on, you cannot change the memory reservation, size, limit, or shares. When Fault Tolerance is turned off, any parameters that were changed are not reverted to their original values.
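The memory behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not a vCenter API: the MemorySettings class and the turn_on_ft / turn_off_ft helpers are hypothetical names made up for this note.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MemorySettings:
    """Hypothetical model of a VM's memory settings (not a vSphere API type)."""
    size_mb: int
    reservation_mb: int
    limit_mb: Optional[int]  # None stands for "no limit"


def turn_on_ft(settings: MemorySettings) -> MemorySettings:
    """Turning FT on removes the limit and reserves the full memory size."""
    return replace(settings, reservation_mb=settings.size_mb, limit_mb=None)


def turn_off_ft(settings: MemorySettings) -> MemorySettings:
    """Turning FT off does not revert anything, so the settings are unchanged."""
    return settings


before = MemorySettings(size_mb=4096, reservation_mb=1024, limit_mb=2048)
during = turn_on_ft(before)
after = turn_off_ft(during)
print(during)
print(after == during)
```

The point of the sketch is the last line: the settings after FT is turned off still equal the settings while FT was on, so the original reservation and limit have to be restored by hand.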


Is it possible to turn on FT on multiple VMs in a single click?

No. If you select more than one virtual machine, the Fault Tolerance menu is disabled. You must turn Fault Tolerance on for one virtual machine at a time.


Is it possible to disable FT from the Secondary VM?

You cannot disable Fault Tolerance from the Secondary VM.


When does vCenter disable FT?

vCenter Server disables Fault Tolerance after being unable to power on the Secondary VM.



Where can you find how many FT-enabled Primary and Secondary VMs are present on a host?

You can view this information by accessing the host’s Summary tab in the vSphere Client. The Fault Tolerance section of this screen displays the total number of Primary and Secondary VMs residing on the host and the number of those virtual machines that are powered on. If the host is ESX/ESXi 4.1 or greater, this section also displays the Fault Tolerance version the host is running. Otherwise, it lists the host build number.



Note: For two hosts to be compatible, they must have matching FT version numbers or matching host build numbers.


What is vLockstep interval?

The time interval (displayed in seconds) needed for the Secondary VM to match the current execution state of the Primary VM. Typically, this interval is less than one-half of one second. No state is lost during a failover, regardless of the vLockstep Interval value.


What is the impact of hardware power management features on FT-enabled VMs?

Platform power management features that do not adjust based on workload (for example, power capping and enforced low frequency modes to save power) can cause processor frequencies to vary greatly. If Primary and Secondary VMs operate at different processor frequencies, the Secondary VM might be restarted more frequently. If Secondary VMs are being restarted on a regular basis, disable all power management modes on the hosts running fault tolerant virtual machines or ensure that all hosts are running in the same power management modes.


What is the impact of a network-partitioned HA cluster on FT VMs?

In a partitioned vSphere HA cluster, the Primary VM (or its Secondary VM) could end up in a partition managed by a master host that is not responsible for the virtual machine. When a failover is needed, a Secondary VM is restarted only if the Primary VM was in a partition managed by the master host responsible for it.



What is the maximum number of FT-enabled VMs recommended on an ESXi host?

You should have no more than four fault tolerant virtual machines (primaries or secondaries) on any single host. The number of fault tolerant virtual machines that you can safely run on each host is based on the sizes and workloads of the ESXi host and virtual machines, all of which can vary.


What is the maximum number of virtual disks recommended on FT enabled VMs?

A maximum of 16 virtual disks per fault tolerant virtual machine is recommended.


How to ensure redundancy and maximum fault tolerance for FT enabled VMs?

To ensure redundancy and maximum Fault Tolerance protection, you should have a minimum of three hosts in the cluster. In a failover situation, this provides a host that can accommodate the new Secondary VM that is created.


What are the recommendations for placing FT-enabled VMs in a resource pool?

Ensure that a resource pool containing fault tolerant virtual machines has excess memory above the memory size of the virtual machines. The memory reservation of a fault tolerant virtual machine is set to the virtual machine’s memory size when Fault Tolerance is turned on. Without this excess in the resource pool, there might not be any memory available to use as overhead memory.
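A quick way to reason about this is to subtract the FT reservations from the pool's memory and see what is left for overhead. The sketch below is plain arithmetic; the function name and the sample sizes are made up for illustration.

```python
from typing import List


def pool_overhead_headroom_mb(pool_memory_mb: int, ft_vm_sizes_mb: List[int]) -> int:
    """Memory left in the pool after every FT VM reserves its full memory size."""
    return pool_memory_mb - sum(ft_vm_sizes_mb)


# Hypothetical pool with two FT VMs of 4 GB and 8 GB.
roomy = pool_overhead_headroom_mb(16384, [4096, 8192])
tight = pool_overhead_headroom_mb(12288, [4096, 8192])
print(roomy, tight)
```

A result of zero or less, as in the second call, means the pool is sized exactly to the VM memory and leaves nothing for overhead memory.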



vSphere HA Security

On what port do HA agents communicate with each other?

vSphere HA uses TCP and UDP port 8182 for agent-to-agent communication. The firewall ports open and close automatically to ensure they are open only when needed.


Where does vSphere HA place log files?

vSphere HA writes to syslog only by default, so logs are placed where syslog is configured to put them. The log file names for vSphere HA are prepended with fdm, fault domain manager, which is a service of vSphere HA.


Where does vSphere HA store its configuration files?

vSphere HA stores configuration information on the local storage or on ramdisk if there is no local datastore. These files are protected using file system permissions and they are accessible only to the root user.



How do the vSphere HA agent and vCenter Server communicate?

All communication between vCenter Server and the vSphere HA agent is done over SSL. Agent-to-agent communication also uses SSL except for election messages, which occur over UDP. Election messages are verified over SSL so that a rogue agent can prevent only the host on which the agent is running from being elected as a master host. In this case, a configuration issue for the cluster is issued so the user is aware of the problem.


Which account is used by vSphere HA?

vSphere HA logs onto the vSphere HA agents using a user account, vpxuser, created by vCenter Server. This account is the same account used by vCenter Server to manage the host. vCenter Server creates a random password for this account and changes the password periodically. The time period is set by the vCenter Server VirtualCenter.VimPasswordExpirationInDays setting. You can change the setting using the Advanced Settings control in the vSphere Client.



Notes for FT

In FT, when can both VMs be on the same host?

The Primary and Secondary VMs must be on separate hosts to avoid a single point of failure. However, both VMs can reside on the same host while they are both powered off.

What is the primary requirement for FT-enabled VMs to work in a DRS-enabled cluster?

EVC mode must be enabled for FT-enabled VMs to take advantage of DRS initial placement and migration recommendations.

If EVC is not enabled, how does that limit FT-enabled VMs with respect to DRS?

When vSphere Fault Tolerance is used for virtual machines in a cluster that has EVC disabled, the fault tolerant virtual machines are given a DRS automation level of “disabled”. In such a cluster, each Primary VM is powered on only on its registered host (that is, where it was originally created), its Secondary VM is automatically placed (that is, initial placement happens every time), but neither fault tolerant virtual machine is moved for load balancing purposes.

What is the maximum number of FT VMs that DRS places on any single host?

DRS does not place more than a fixed number of Primary or Secondary VMs on a host during initial placement or load balancing. This limit is controlled by the advanced option das.maxftvmsperhost. The default value for this option is 4. However, if you set this option to 0, DRS ignores this restriction.
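The rule above, including the special meaning of 0, can be written as a small predicate. This is a sketch of the decision only; the function name is hypothetical and it does not call any DRS interface.

```python
def drs_may_place_ft_vm(ft_vms_on_host: int, max_ft_vms_per_host: int = 4) -> bool:
    """Return True if one more Primary or Secondary VM may be placed on the host.

    max_ft_vms_per_host mirrors das.maxftvmsperhost: the default is 4 and
    a value of 0 means the restriction is ignored.
    """
    if max_ft_vms_per_host == 0:
        return True
    return ft_vms_on_host < max_ft_vms_per_host


print(drs_may_place_ft_vm(3))      # below the default limit
print(drs_may_place_ft_vm(4))      # default limit already reached
print(drs_may_place_ft_vm(10, 0))  # restriction disabled
```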

How do VM-VM affinity and VM-Host affinity rules impact FT-enabled VMs?

A VM-VM affinity rule applies only to the Primary VM. If a VM-VM affinity rule is set on the Primary VM, DRS attempts to correct any violations that occur after a failover. A VM-Host affinity rule applies to both the Primary and the Secondary VM.

Is vCenter required for FT to work?

No. An FT-enabled virtual machine fails over when its host fails. The failover of fault tolerant virtual machines is independent of vCenter Server, but you must use vCenter Server to set up your Fault Tolerance clusters.

Which tasks must be completed before enabling FT?

· Enable host certificate checking. This is enabled by default; check it only if you upgraded from vSphere 4.0 or earlier.

· Create a VMkernel port for FT

· Create a vSphere HA cluster, add hosts, and check compliance

What happens when you disable FT logging port when both primary and secondary VMs are running?

If you configure networking to support FT but subsequently disable the Fault Tolerance logging port, pairs of fault tolerant virtual machines that are already powered on remain powered on. However, if a failover situation occurs, when the Primary VM is replaced by its Secondary VM a new Secondary VM is not started, causing the new Primary VM to run in a Not Protected state.

What checks are done before FT is enabled on a VM?

  • vSphere HA is enabled on the cluster
  • SSL certificate checking is enabled
  • The ESX/ESXi host version is 4.x or later
  • The VM has only one vCPU
  • The VM has no snapshots
  • The VM is not a template
  • vSphere HA is not disabled for the VM
  • The VM does not have a 3D-enabled device
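The checklist above maps naturally onto a function that returns the list of failed checks. The sketch below is a hypothetical model for study purposes; the FtCandidate class and its field names are invented here and are not part of any vSphere SDK.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FtCandidate:
    """Hypothetical snapshot of the facts the checks look at."""
    cluster_ha_enabled: bool
    ssl_cert_checking_enabled: bool
    host_version: Tuple[int, int]  # (major, minor)
    vcpu_count: int
    snapshot_count: int
    is_template: bool
    vm_ha_disabled: bool
    has_3d_device: bool


def ft_precheck_failures(c: FtCandidate) -> List[str]:
    """Return a message for every check in the list that does not pass."""
    checks = [
        (c.cluster_ha_enabled, "vSphere HA is not enabled"),
        (c.ssl_cert_checking_enabled, "SSL certificate checking is not enabled"),
        (c.host_version >= (4, 0), "host version is older than 4.x"),
        (c.vcpu_count == 1, "VM has more than one vCPU"),
        (c.snapshot_count == 0, "VM has snapshots"),
        (not c.is_template, "VM is a template"),
        (not c.vm_ha_disabled, "vSphere HA is disabled for the VM"),
        (not c.has_3d_device, "VM has a 3D-enabled device"),
    ]
    return [message for passed, message in checks if not passed]


good = FtCandidate(True, True, (5, 0), 1, 0, False, False, False)
bad = FtCandidate(True, True, (5, 0), 2, 1, False, False, False)
print(ft_precheck_failures(good))
print(ft_precheck_failures(bad))
```

Returning all failures at once, rather than stopping at the first one, makes it easier to see everything that has to be fixed before FT can be turned on.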

What checks are done when an FT-enabled VM is powered on?

  • Hardware virtualization is enabled in the BIOS of the ESXi host
  • The host’s processor supports FT
  • The host on which the Secondary VM is to be created is FT compatible and has the same processor family as the host running the Primary VM
  • The guest OS is compatible with FT
  • The hardware is compatible with FT
  • The VM has no unsupported devices

Checking the Operational Status of the Cluster

Configuration issues and other errors can occur for your cluster or its hosts that adversely affect the proper operation of vSphere HA. You can monitor these errors by looking at the Cluster Operational Status screen, which is accessible in the vSphere Client from the vSphere HA section of the cluster’s Summary tab. Address issues listed here.


Most configuration issues have a matching event that is logged. All vSphere HA events include “vSphere HA” in the description. You can search for this term to find the corresponding events.


Notes from High Availability Guide

What happens when a host is joined to an HA cluster?

An agent is uploaded to the host and configured to communicate with other agents in the cluster. Each host in the cluster functions as a master host or a slave host.

When does election or re-election of the master host take place?

When HA agents are enabled on all hosts, all active hosts (those not in standby, maintenance mode, or a disconnected state) participate in electing a master host.

Re-election happens when the master host fails, is shut down, or is removed from the cluster.


There is only one master host in a cluster and all other hosts are slaves. The host with the greatest number of mounted datastores has an advantage in the election.
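As a rough mental model of the election, filter out hosts that cannot participate and prefer the one with the most datastores mounted. The sketch below is a simplification: the dictionary layout is made up, and breaking ties by host name is an assumption of this example, not something stated in these notes.

```python
from typing import Dict, List, Optional


def elect_master(hosts: List[Dict]) -> Optional[str]:
    """Pick a master among active hosts, favouring the most mounted datastores.

    Each host is a dict with hypothetical keys "name", "state" and "datastores".
    Ties are broken by name, which is an assumption made for this sketch.
    """
    candidates = [h for h in hosts if h["state"] == "active"]
    if not candidates:
        return None
    winner = max(candidates, key=lambda h: (h["datastores"], h["name"]))
    return winner["name"]


cluster = [
    {"name": "esx01", "state": "active", "datastores": 3},
    {"name": "esx02", "state": "maintenance", "datastores": 9},
    {"name": "esx03", "state": "active", "datastores": 5},
]
master = elect_master(cluster)
print(master)
```

Here esx02 has the most datastores but is in maintenance mode, so it does not take part and esx03 wins.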

What is the checklist for HA?

· All hosts must be licensed for HA (the Standard, Enterprise, and Enterprise Plus editions all support HA).

· A minimum of two hosts is needed.

· All hosts need to be configured with a static IP address, or you must ensure that DHCP provides the same address every time.

· All hosts must have access to the same virtual machine networks and datastores.

· All virtual machines must be located on shared datastores; otherwise they cannot be failed over in the case of a host failure.

· For VM monitoring to work, VMware Tools must be installed.

· Host certificate checking must be enabled.

· For datastore heartbeating, a minimum of two datastores is needed.

· vSphere HA supports both IPv4 and IPv6. A cluster that mixes the use of both of these protocol versions, however, is more likely to face a network partition.

Which setting is not available when you configure a cluster without hosts?

The Specify a Failover Host admission control policy is unavailable until there is a host that can be designated as the failover host.

Which settings are disabled when you move a host into an HA cluster?

The Virtual Machine Startup and Shutdown (automatic startup) feature is disabled for all virtual machines residing on hosts that are in (or moved into) a vSphere HA cluster. Automatic startup is not supported when used with vSphere HA.

What is the importance of Host Monitoring Status?

Host Monitoring Status enables HA to monitor HA agent heartbeats. Heartbeats are sent by the vSphere HA agent on each host in the cluster.

How to suspend HA?

You can suspend HA by deselecting Host Monitoring.


If host isolation response is not working what could be wrong?

Check if Host monitoring status is enabled. Host isolation responses require that Host Monitoring Status is enabled. If Host Monitoring Status is disabled, host isolation responses are also suspended. A host determines that it is isolated when it is unable to communicate with the agents running on the other hosts and it is unable to ping its isolation addresses. When this occurs, the host executes its isolation response.

The responses are:

Leave powered on (the default), Power off, and Shut down. You can customize this property for individual virtual machines.
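The conditions under which a host runs its isolation response can be summed up in one boolean expression. The function below is a sketch with hypothetical parameter names; it only restates the logic of the answer above.

```python
def should_run_isolation_response(
    host_monitoring_enabled: bool,
    can_reach_other_agents: bool,
    can_ping_isolation_address: bool,
) -> bool:
    """True only when the host is isolated and isolation responses are not suspended."""
    if not host_monitoring_enabled:
        # Isolation responses are suspended while Host Monitoring Status is disabled.
        return False
    return not can_reach_other_agents and not can_ping_isolation_address


print(should_run_isolation_response(True, False, False))   # isolated
print(should_run_isolation_response(True, False, True))    # isolation address still reachable
print(should_run_isolation_response(False, False, False))  # monitoring disabled
```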

What is the dependency of the isolation response on VMware Tools?

The dependency applies specifically to the Shut down isolation response. To use the Shut down VM setting (a graceful shutdown), you must install VMware Tools in the guest operating system of the virtual machine. Virtual machines that are in the process of shutting down take longer to fail over while the shutdown completes.

Virtual machines that have not shut down within 300 seconds, or within the time specified in seconds by the advanced attribute das.isolationshutdowntimeout, are powered off.
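The timeout behaviour can be pictured as a simple three-way decision. This is a sketch only; the function and the returned labels are invented for this note, and the 300-second default comes from the text above.

```python
def isolation_shutdown_action(
    seconds_since_shutdown_requested: int,
    guest_has_shut_down: bool,
    timeout_seconds: int = 300,  # stands in for das.isolationshutdowntimeout
) -> str:
    """Decide what happens to a VM that was asked to shut down gracefully."""
    if guest_has_shut_down:
        return "done"
    if seconds_since_shutdown_requested >= timeout_seconds:
        return "power_off"
    return "wait"


print(isolation_shutdown_action(120, False))
print(isolation_shutdown_action(300, False))
print(isolation_shutdown_action(45, True))
```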

What is the drawback of a disabled isolation response?

A disabled isolation response can lead to a split-brain situation when a host loses both management and storage network connectivity. In this case, the isolated host loses the disk locks and the virtual machines are failed over to another host, even though the original instances of the virtual machines remain running on the isolated host. When the host comes out of isolation, there are two copies of the virtual machines, although the copy on the originally isolated host does not have access to the vmdk files, so data corruption is prevented. In the vSphere Client, the virtual machines appear to flip back and forth between the two hosts.

To recover from this situation, ESXi generates a question on the virtual machine that has lost the disk locks when the host comes out of isolation and realizes that it cannot reacquire the disk locks. vSphere HA automatically answers this question, which allows the virtual machine instance that has lost the disk locks to power off.

How does VM restart priority work?


VM restart priority determines the relative order in which virtual machines are restarted after a host failure. Virtual machines with the highest priority are restarted first, continuing with those of lower priority until all virtual machines are restarted or no more cluster resources are available. If the number of host failures exceeds what admission control permits, the virtual machines with lower priority might not be restarted until more resources become available. Virtual machines are restarted on the failover host, if one is specified.
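The ordering can be illustrated with a small simulation. Everything here is a simplification for study purposes: the priority labels, the dictionary keys, and the use of memory as the only resource are assumptions of this sketch, not how vSphere HA actually accounts for capacity.

```python
from typing import Dict, List, Tuple

# Assumed priority labels; a lower number means the VM is restarted earlier.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def plan_restarts(vms: List[Dict], available_memory_mb: int) -> Tuple[List[str], List[str]]:
    """Restart VMs in priority order until the (single, pooled) resource runs out."""
    restarted, deferred = [], []
    for vm in sorted(vms, key=lambda v: PRIORITY_ORDER[v["priority"]]):
        if vm["memory_mb"] <= available_memory_mb:
            available_memory_mb -= vm["memory_mb"]
            restarted.append(vm["name"])
        else:
            deferred.append(vm["name"])
    return restarted, deferred


failed_vms = [
    {"name": "web", "priority": "low", "memory_mb": 2048},
    {"name": "db", "priority": "high", "memory_mb": 4096},
    {"name": "app", "priority": "medium", "memory_mb": 4096},
]
restarted, deferred = plan_restarts(failed_vms, available_memory_mb=8192)
print(restarted, deferred)
```

With 8 GB available, the high and medium priority VMs consume all of it and the low priority VM waits until more resources become available.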




Recommended restart priority


Cluster states of Failover Hosts Admission Control Policy


The Current Failover Hosts appear in the vSphere HA section of the cluster’s Summary tab in the vSphere Client. The status icon next to each host can be green, yellow, or red.

Green: The host is connected, not in maintenance mode, and has no vSphere HA errors. No powered-on virtual machines reside on the host.


Yellow: The host is connected, not in maintenance mode, and has no vSphere HA errors. However, powered-on virtual machines reside on the host.


Red: The host is disconnected, in maintenance mode, or has vSphere HA errors.
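The three icon colours reduce to two checks made in order. The helper below is a sketch with hypothetical parameter names that simply encodes the descriptions above.

```python
def failover_host_status(
    connected: bool,
    in_maintenance_mode: bool,
    has_ha_errors: bool,
    powered_on_vms: int,
) -> str:
    """Return the status colour of a failover host as described in the notes."""
    if not connected or in_maintenance_mode or has_ha_errors:
        return "red"
    if powered_on_vms > 0:
        return "yellow"
    return "green"


print(failover_host_status(True, False, False, 0))
print(failover_host_status(True, False, False, 3))
print(failover_host_status(True, True, False, 0))
```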


HA Failover and Admission controls


vSphere HA might not be able to fail over virtual machines because of resource constraints. This can occur for several reasons:

  • HA admission control is disabled and Distributed Power Management (DPM) is enabled. This can result in DPM consolidating virtual machines onto fewer hosts and placing the empty hosts in standby mode leaving insufficient powered-on capacity to perform a failover.


  • VM-Host affinity (required) rules might limit the hosts on which certain virtual machines can be placed. Required VM-Host affinity rules cannot be violated, so vSphere HA does not perform a failover if doing so would violate such a rule.


  • There might be sufficient aggregate resources, but they can be fragmented across multiple hosts so that they cannot be used by virtual machines for failover.
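The fragmentation case in the last bullet is easiest to see with numbers. The sketch below considers memory only and uses made-up figures; the function name is hypothetical.

```python
from typing import List


def fits_on_a_single_host(vm_memory_mb: int, free_memory_per_host_mb: List[int]) -> bool:
    """A VM can be restarted only if one host alone has enough free memory for it."""
    return any(free >= vm_memory_mb for free in free_memory_per_host_mb)


free_per_host = [2048, 2048, 2048]  # three surviving hosts, 2 GB free on each
vm_size = 4096                      # the VM that needs to be failed over

aggregate_is_enough = sum(free_per_host) >= vm_size
placeable = fits_on_a_single_host(vm_size, free_per_host)
print(aggregate_is_enough, placeable)
```

The cluster has 6 GB free in total, which is more than the VM needs, yet no single host can take it, so the failover cannot happen.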

Admission control

vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected. Three types of admission control are available.


Admission control imposes constraints on resource usage and any action that would violate these constraints is not permitted.

Examples of actions that could be disallowed include the following:

  • Powering on a virtual machine.
  • Migrating a virtual machine onto a host or into a cluster or resource pool.
  • Increasing the CPU or memory reservation of a virtual machine.

Only vSphere HA admission control can be disabled. If it is disabled, there is no guarantee of how many virtual machines can be restarted in the event of a failover.

When vSphere HA admission control is disabled, vSphere HA ensures that there are at least two powered-on hosts in the cluster even if DPM is enabled and can consolidate all virtual machines onto a single host. This is to ensure that failover is possible.



Datastore heartbeating




You can use the advanced attribute das.heartbeatdsperhost to change the number of heartbeat datastores selected by vCenter Server for each host. The default is two and the maximum valid value is five.

If you deploy a converged network environment, where storage and management network traffic travel over the same physical NICs, disable datastore heartbeating. It does not provide any benefit in this type of network.

vSphere HA creates a directory at the root of each datastore that is used for both datastore heartbeating and for persisting the set of protected virtual machines. The name of the directory is .vSphere-HA.


Do not delete or modify the files stored in this directory, because this can have an impact on operations. Because more than one cluster might use a datastore, subdirectories for this directory are created for each cluster. Root owns these directories and files and only root can read and write to them. With VMFS3, the maximum usage is approximately 2GB and the typical usage is approximately 3MB. With VMFS5, the maximum and typical usage are approximately 3MB.