Category Archives: Cluster

Part:08 Zerto Checkpoints–Automatic and Manual

Checkpoints are a very important part of Zerto’s technical innovation. They are a strong selling point of the product and a strong use case for business-critical applications. In fact, VMware vSphere Replication quite recently added a similar feature. These checkpoints provide point-in-time (PIT) copies of VMs, which allow VMs to be rolled back to any point you request. The term PIT is widely used with high-end, enterprise-scale storage arrays.

Automatic

Checkpoints are taken automatically every few seconds, or as early as possible. These checkpoints are crash consistent and are written to the journal by the ZVM. During recovery you pick a crash-consistent checkpoint in the journal and recover to that point.

Manual

You can also create checkpoints manually, which lets you control the exact time and date. Think of it as a snapshot that you take manually for a particular reason.

This gives you the option to recover VMs to either manually or automatically created checkpoints, allowing PIT recovery.
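As a rough illustration of what PIT recovery means, the sketch below picks the most recent checkpoint taken at or before a requested time. The `Checkpoint` class, the `pick_checkpoint` function and the sample timestamps are hypothetical and made up for this example; they are not part of any Zerto API.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Checkpoint:
    taken_at: datetime
    name: str = ""  # empty for automatic checkpoints, set for manual ones


def pick_checkpoint(journal: List[Checkpoint], target: datetime) -> Optional[Checkpoint]:
    """Return the latest checkpoint taken at or before `target`, or None."""
    ordered = sorted(journal, key=lambda cp: cp.taken_at)
    times = [cp.taken_at for cp in ordered]
    idx = bisect_right(times, target)
    return ordered[idx - 1] if idx else None


# Made-up sample journal: three automatic checkpoints and one manual one.
journal = [
    Checkpoint(datetime(2024, 1, 1, 22, 15, 3)),
    Checkpoint(datetime(2024, 1, 1, 22, 15, 8)),
    Checkpoint(datetime(2024, 1, 1, 22, 15, 10), name="before-patch"),
    Checkpoint(datetime(2024, 1, 1, 22, 15, 13)),
]
chosen = pick_checkpoint(journal, datetime(2024, 1, 1, 22, 15, 11))
print(chosen)
```

A request for a time earlier than the oldest checkpoint returns `None`, which mirrors the point that you cannot roll back further than the journal reaches.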

Let’s see how to add a manual checkpoint. On any tab of the GUI you will see the Checkpoint button.

image

Click it and give the checkpoint a name. Select the VPG for which you wish to take the checkpoint and press Save.

Below is the event “insert tagged CP” logged in the GUI; you should also see a similar event in the vCenter tasks.

SNAGHTML12015d79

Recovering an application using the VSS agent

Zerto provides a VSS agent. We discussed how to install the VSS agent in this post here. The VSS agent helps to back up application data in a consistent state. Application data consistency is of utmost importance when you want to protect and recover application data. All good backup products include VSS integration, though VSS works only on Microsoft Windows.

After you successfully install the VSS agent, a shortcut named “Add VSS Checkpoint” is created on the desktop. You can take a checkpoint using this shortcut, the command line, or a scheduled task. The checkpoint is sent to the ZVM, which then adds it to the journal. VSS ensures it is in an application-consistent state.

NB: Such checkpoints are initiated from within a VM but apply to the entire VPG. The other VMs in that VPG will have a crash-consistent checkpoint. When designing VPGs, consider how many applications need protection and whether they have to be part of the same VPG, because VSS can take an application-consistent checkpoint of only one VM.
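To make the note concrete, here is a minimal sketch that labels each VM in a VPG by the consistency level it gets from one VSS-initiated checkpoint. The function name and the VM names are hypothetical and only model the behaviour described above.

```python
from typing import Dict, List


def checkpoint_consistency(vpg_vms: List[str], vss_source_vm: str) -> Dict[str, str]:
    """Label each VM in the VPG for a checkpoint initiated by `vss_source_vm`."""
    if vss_source_vm not in vpg_vms:
        raise ValueError("the VM that initiates the checkpoint must belong to the VPG")
    return {
        vm: "application-consistent" if vm == vss_source_vm else "crash-consistent"
        for vm in vpg_vms
    }


# Hypothetical VPG with three VMs; the checkpoint is taken from inside sql01.
result = checkpoint_consistency(["sql01", "web01", "app01"], "sql01")
print(result)
```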

image

Figure: 01 When you double-click the shortcut, give the checkpoint a name

image

Figure: 02 confirms checkpoint is taken successfully

image

Figure: 03 Checkpoint event is successfully logged in vSphere client

image

Figure: 04 Checkpoint event is successfully logged in GUI

The figure below shows the checkpoints available to you when you initiate a failover (test or actual). I have marked the VSS checkpoint in yellow. Both checkpoints have names, which helps you identify the reason for each checkpoint.

image

Finally, a few things to keep in mind regarding checkpoints:

  1. Checkpoints have a performance impact, which is governed by how much data is in the VM’s memory. Be wary of this impact, especially during production hours.
  2. The VSS agent can take an application-consistent checkpoint of a single VM only, even if the VSS agent is installed on all VMs of a particular VPG. This is due to the write-order fidelity implemented by Zerto.
  3. Any change to a VPG that leads to re-synchronization of the VPG removes all checkpoints, and synchronization starts from zero again. If there are no checkpoints, there is no other way to get a PIT (point in time).

Let’s discuss a few things about checkpoints and journal history. Look at the slider here: its start position below is at 10:15:03 PM.

image

And the slider’s end position is at 10:56:28 PM.

image

This suggests you have checkpoints for the last 41 minutes, 25 seconds. By default, however, journal history is set to 4 hours, so ideally you should see a difference of up to 4 hours between the start and end positions. If the history falls short of that, e.g. if you have history for only the last 3 hours, it starts sending warnings, and if it has history for just the last 1 hour, it starts sending alerts.
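The arithmetic above can be checked with a short sketch. The two slider times come from the screenshots; the 3-hour and 1-hour cut-offs are simply the examples given in the text for a 4-hour journal and are assumptions here, not confirmed product thresholds. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed cut-offs, taken from the examples in the text for a 4-hour journal.
WARN_AT = timedelta(hours=3)
ALERT_AT = timedelta(hours=1)


def journal_span(start: str, end: str) -> timedelta:
    """Time covered between two same-day slider positions in HH:MM:SS (24-hour)."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


def history_status(span: timedelta) -> str:
    """Classify the available journal history against the assumed cut-offs."""
    if span <= ALERT_AT:
        return "alert"
    if span <= WARN_AT:
        return "warning"
    return "ok"


span = journal_span("22:15:03", "22:56:28")
print(span, history_status(span))
```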

The slider shows a maximum of 180 checkpoints spread over the most recent 24 hours stored in the journal. The older the checkpoints over this period the fewer checkpoints are shown, with at least two shown per hour. The majority of the checkpoints cover the most recent hour in the journal. To be even more specific use the Manual Select option.

image

The Manual Select option gives you more granular choices. If you observe carefully, a checkpoint is taken every 5 seconds, which suggests you can roll back with a granularity of as little as 5 seconds. In the figure above you can also see the checkpoint taken manually from the VM. This explains the value of this product: a few years ago such granular protection was possible only with costly enterprise-grade storage arrays.

If you wish to follow the entire Zerto series, go to the Landing Page

Migrate an Existing Virtual Adapter to a vSphere Distributed Switch & vice versa

Migrate an Existing Virtual Adapter to a vSphere Distributed Switch

You can migrate an existing virtual adapter from a vSphere standard switch to a vSphere distributed switch.
Procedure
1 Log in to the vSphere Client and select the Hosts and Clusters inventory view.

2 Select the host in the inventory pane.
3 On the host Configuration tab, click Networking.
4 Select the vSphere Distributed Switch view.

image_thumb2
5 Click Manage Virtual Adapters.

image_thumb6
6 Click Add.

SNAGHTML602a69a

7 Select Migrate existing virtual network adapters and click Next.

SNAGHTML6037eb0

8 Select one or more virtual network adapters to migrate.

9 For each selected adapter, choose a port group from the Select a port group drop-down menu.

SNAGHTML607ef3d

10 Click Next.

SNAGHTML60842aa

11 Click Finish.


Migrate an existing virtual adapter from a vSphere distributed switch to a vSphere standard switch.

Procedure
1 Log in to the vSphere Client and select the Hosts and Clusters inventory view.
2 Select the host in the inventory pane.
3 On the host Configuration tab, click Networking.
4 Select the vSphere Distributed Switch view.

image_thumb2
5 Click Manage Virtual Adapters.

image_thumb6

6 Select the virtual adapter to migrate, and click Migrate.

SNAGHTML5f2f91c_thumb3

7 Select the standard switch to migrate the adapter to and click Next.

SNAGHTML5fced2b_thumb3
8 Enter a Network Label and optionally a VLAN ID for the virtual adapter, and click Next.

SNAGHTML5fc7972_thumb2
9 Click Finish to migrate the virtual adapter and complete the wizard.

Notes for FT Part-02

For Notes for FT Part-01 click here

What happens when primary VM is powered ON?

·         The entire state of the Primary VM is copied and the Secondary VM is created, placed on a separate compatible host, and powered on if it passes admission control.

·         The Fault Tolerance Status displayed on the virtual machine’s Summary tab in the vSphere Client is Protected.

 

What happens when primary VM is powered OFF?

The Secondary VM is immediately created and registered to a host in the cluster (it might be re-registered to a more appropriate host when it is powered on.)

 

·         The Secondary VM is not powered on until after the Primary VM is powered on.

·         The Fault Tolerance Status displayed on the virtual machine’s Summary tab in the vSphere Client is Not Protected, VM not Running.

 

What happens to memory size when FT is enabled on VM?

When Fault Tolerance is turned on, vCenter Server removes the virtual machine’s memory limit and sets the memory reservation to the memory size of the virtual machine. While Fault Tolerance remains turned on, you cannot change the memory reservation, size, limit, or shares. When Fault Tolerance is turned off, any parameters that were changed are not reverted to their original values.
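A minimal sketch of the behaviour described above, using a hypothetical `MemoryConfig` model rather than any vSphere API: turning FT on pins the reservation to the memory size and removes the limit, and turning it off leaves those values as they are.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MemoryConfig:
    size_mb: int
    reservation_mb: int = 0
    limit_mb: Optional[int] = None  # None means no limit


def turn_on_ft(cfg: MemoryConfig) -> MemoryConfig:
    """Reservation becomes the full memory size; the limit is removed."""
    return replace(cfg, reservation_mb=cfg.size_mb, limit_mb=None)


def turn_off_ft(cfg: MemoryConfig) -> MemoryConfig:
    """Changed values are not reverted when FT is turned off."""
    return cfg


before = MemoryConfig(size_mb=4096, reservation_mb=1024, limit_mb=8192)
during = turn_on_ft(before)
after = turn_off_ft(during)
print(before, during, after, sep="\n")
```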

 

Is it possible to turn on FT on multiple VMs in a single click?

No. If you select more than one virtual machine, the Fault Tolerance menu is disabled. You must turn Fault Tolerance on for one virtual machine at a time.

 

Is it possible to disable FT from secondary VM?

You cannot disable Fault Tolerance from the Secondary VM.

 

When does vCenter disable FT?

vCenter Server disables Fault Tolerance after being unable to power on the Secondary VM.

 

 

Where can you find how many FT-enabled Primary and Secondary VMs are present on a host?

You can view this information by accessing the host’s Summary tab in the vSphere Client. The Fault Tolerance section of this screen displays the total number of Primary and Secondary VMs residing on the host and the number of those virtual machines that are powered on. If the host is ESX/ESXi 4.1 or greater, this section also displays the Fault Tolerance version the host is running. Otherwise, it lists the host build number.

 

 

Note: For two hosts to be compatible they must have matching FT version numbers or matching host build numbers.

 

What is vLockstep interval?

The time interval (displayed in seconds) needed for the Secondary VM to match the current execution state of the Primary VM. Typically, this interval is less than one-half of one second. No state is lost during a failover, regardless of the vLockstep Interval value.

 

What is the impact of hardware power management features on FT-enabled VMs?

Platform power management features that do not adjust based on workload (for example, power capping and enforced low frequency modes to save power) can cause processor frequencies to vary greatly. If Primary and Secondary VMs operate at different processor frequencies, the Secondary VM might be restarted more frequently. If Secondary VMs are being restarted on a regular basis, disable all power management modes on the hosts running fault tolerant virtual machines or ensure that all hosts are running in the same power management modes.

 

What is the impact of a network-partitioned HA cluster on FT VMs?

In a partitioned vSphere HA cluster, the Primary VM (or its Secondary VM) could end up in a partition managed by a master host that is not responsible for the virtual machine. When a failover is needed, a Secondary VM is restarted only if the Primary VM was in a partition managed by the master host responsible for it.

 

Recommendations

What is the maximum number of FT-enabled VMs recommended on an ESXi host?

You should have no more than four fault tolerant virtual machines (primaries or secondaries) on any single host. The number of fault tolerant virtual machines that you can safely run on each host is based on the sizes and workloads of the ESXi host and virtual machines, all of which can vary.

 

What is the maximum number of virtual disks recommended on FT-enabled VMs?

A maximum of 16 virtual disks per fault tolerant virtual machine is recommended.

 

How to ensure redundancy and maximum fault tolerance for FT enabled VMs?

To ensure redundancy and maximum Fault Tolerance protection, you should have a minimum of three hosts in the cluster. In a failover situation, this provides a host that can accommodate the new Secondary VM that is created.

 

What is recommended when placing FT-enabled VMs in a resource pool?

Ensure that a resource pool containing fault tolerant virtual machines has excess memory above the memory size of the virtual machines. The memory reservation of a fault tolerant virtual machine is set to the virtual machine’s memory size when Fault Tolerance is turned on. Without this excess in the resource pool, there might not be any memory available to use as overhead memory.

 

 

vSphere HA Security

On what port do HA agents communicate with each other?

vSphere HA uses TCP and UDP port 8182 for agent-to-agent communication. The firewall ports open and close automatically to ensure they are open only when needed.

 

Where does vSphere HA place its log files?

vSphere HA writes to syslog only by default, so logs are placed where syslog is configured to put them. The log file names for vSphere HA are prepended with fdm, fault domain manager, which is a service of vSphere HA.

 

Where does vSphere HA store its configuration files?

vSphere HA stores configuration information on the local storage or on ramdisk if there is no local datastore. These files are protected using file system permissions and they are accessible only to the root user.

 

 

How do the vSphere HA agents and vCenter Server communicate?

All communication between vCenter Server and the vSphere HA agent is done over SSL. Agent-to-agent communication also uses SSL except for election messages, which occur over UDP. Election messages are verified over SSL so that a rogue agent can prevent only the host on which the agent is running from being elected as a master host. In this case, a configuration issue for the cluster is issued so the user is aware of the problem.

 

Which account is used by vSphere HA?

vSphere HA logs onto the vSphere HA agents using a user account, vpxuser, created by vCenter Server. This account is the same account used by vCenter Server to manage the host. vCenter Server creates a random password for this account and changes the password periodically. The time period is set by the vCenter Server VirtualCenter.VimPasswordExpirationInDays setting. You can change the setting using the Advanced Settings control in the vSphere Client.

 



Notes for FT

In FT, when can both VMs be on the same host?

The Primary and Secondary VMs must be on separate hosts to avoid a single point of failure. However, both VMs can reside on the same host when they are both powered off.

What is the primary requirement for FT-enabled VMs to work in a DRS-enabled cluster?

EVC mode must be enabled for FT-enabled VMs to take advantage of DRS initial placement and migration recommendations.

If EVC is not enabled, how does that affect or limit FT-enabled VMs with respect to DRS?

When vSphere Fault Tolerance is used for virtual machines in a cluster that has EVC disabled, the fault tolerant virtual machines are given DRS automation levels of “disabled”. In such a cluster, each Primary VM is powered on only on its registered host (i.e. where it was originally created), its Secondary VM is automatically placed (i.e. initial placement happens every time), but neither fault tolerant virtual machine is moved for load balancing purposes.

What is the maximum number of FT VMs that DRS places on a single host?

DRS does not place more than a fixed number of Primary or Secondary VMs on a host during initial placement or load balancing. This limit is controlled by the advanced option das.maxftvmsperhost. The default value for this option is 4. However, if you set this option to 0, DRS ignores this restriction.
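The rule can be sketched as a small predicate. The function name is hypothetical; only the option name das.maxftvmsperhost, its default of 4 and the meaning of 0 come from the text.

```python
def can_place_ft_vm(ft_vms_on_host: int, max_ft_vms_per_host: int = 4) -> bool:
    """Return True if one more Primary or Secondary VM may be placed on the host.

    `max_ft_vms_per_host` mirrors das.maxftvmsperhost: 0 disables the restriction.
    """
    if max_ft_vms_per_host == 0:
        return True
    return ft_vms_on_host < max_ft_vms_per_host


print(can_place_ft_vm(3))      # host has room under the default limit of 4
print(can_place_ft_vm(4))      # host is already at the default limit
print(can_place_ft_vm(10, 0))  # restriction ignored when the option is 0
```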

How do VM-VM affinity and VM-Host affinity rules impact FT-enabled VMs?

A VM-VM affinity rule applies only to the Primary VM. If a VM-VM affinity rule is set on the Primary VM, DRS attempts to correct any violations that occur after a failover. A VM-Host affinity rule, by contrast, applies to both the Primary and Secondary VMs.

Is vCenter required for FT to work?

No. An FT-enabled virtual machine will fail over when its host fails. The failover of fault tolerant virtual machines is independent of vCenter Server, but you must use vCenter Server to set up your Fault Tolerance clusters.

Which tasks must be completed before enabling FT?

· Enable host certificate checking. This is enabled by default; check it only if you’ve upgraded from vSphere 4.0 or below.

· Create a VMkernel port for FT

· Create a vSphere HA cluster, add hosts and check compliance

What happens when you disable FT logging port when both primary and secondary VMs are running?

If you configure networking to support FT but subsequently disable the Fault Tolerance logging port, pairs of fault tolerant virtual machines that are already powered on remain powered on. However, if a failover situation occurs, when the Primary VM is replaced by its Secondary VM a new Secondary VM is not started, causing the new Primary VM to run in a Not Protected state.

What checks are done before FT is enabled on a VM?

  • vSphere HA is enabled
  • SSL certificate checking is enabled
  • The ESXi host version is 4.x or later
  • The VM has only one vCPU
  • The VM has no snapshots
  • The VM is not a template
  • vSphere HA is not disabled for the VM
  • The VM does not have a 3D-enabled device
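The checklist above can be expressed as a simple validation routine. This is only a sketch over a hypothetical `FtCandidate` record, not vCenter's actual validation code. The list mentions vSphere HA twice; as an assumption, the second mention is modelled here as a per-VM setting.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FtCandidate:
    ha_enabled: bool                 # vSphere HA enabled on the cluster
    cert_checking_enabled: bool      # SSL certificate checking
    host_version: Tuple[int, int]    # e.g. (5, 0)
    vcpu_count: int
    snapshot_count: int
    is_template: bool
    ha_disabled_for_vm: bool         # assumption: per-VM HA setting
    has_3d_device: bool


def ft_precheck_failures(c: FtCandidate) -> List[str]:
    """Return a list of failed checks; an empty list means FT can be turned on."""
    checks = [
        (c.ha_enabled, "vSphere HA is not enabled"),
        (c.cert_checking_enabled, "SSL certificate checking is not enabled"),
        (c.host_version >= (4, 0), "host version is older than ESXi 4.x"),
        (c.vcpu_count == 1, "VM has more than one vCPU"),
        (c.snapshot_count == 0, "VM has snapshots"),
        (not c.is_template, "VM is a template"),
        (not c.ha_disabled_for_vm, "vSphere HA is disabled for the VM"),
        (not c.has_3d_device, "VM has a 3D-enabled device"),
    ]
    return [message for ok, message in checks if not ok]


good = FtCandidate(True, True, (5, 0), 1, 0, False, False, False)
bad = FtCandidate(True, True, (5, 0), 2, 1, False, False, False)
print(ft_precheck_failures(good))
print(ft_precheck_failures(bad))
```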

What checks are done when an FT-enabled VM is powered on?

  • Hardware virtualization is enabled in the BIOS of the ESXi host
  • The host’s processor supports FT
  • The host on which the Secondary VM is to be created is FT compatible and is of the same processor family as the host running the Primary VM
  • The guest OS is compatible with FT
  • The hardware is compatible with FT
  • No unsupported devices are present

Checking the Operational Status of the Cluster

Configuration issues and other errors can occur for your cluster or its hosts that adversely affect the proper operation of vSphere HA. You can monitor these errors by looking at the Cluster Operational Status screen, which is accessible in the vSphere Client from the vSphere HA section of the cluster’s Summary tab. Address issues listed here.

clip_image001

Most configuration issues have a matching event that is logged. All vSphere HA events include “vSphere HA” in the description. You can search for this term to find the corresponding events.

image

Notes from High Availability Guide

What happens when a host is joined to an HA cluster?

An agent is uploaded to the host and configured to communicate with other agents in the cluster. Each host in the cluster functions as a master host or a slave host.

When does election or re-election of the master host take place?

When HA agents are enabled on all hosts, all active hosts (those not in standby, maintenance mode or disconnected state) participate in electing the master host.

Re-election happens when the master host fails, is shut down, or is removed from the cluster.


Note:

There is only one master host in a cluster and all other hosts are slaves. The host with the greatest number of datastores mounted has an advantage in the election.
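A toy model of the election rule in the note: the host with the most mounted datastores wins. The function and host names are hypothetical, and the tie-break on host name is an assumption added only to make the example deterministic; the text does not say how ties are resolved.

```python
from typing import Dict


def elect_master(datastores_mounted: Dict[str, int]) -> str:
    """Pick the active host with the most mounted datastores.

    Assumption: ties are broken by the lexically highest host name.
    """
    if not datastores_mounted:
        raise ValueError("no active hosts to elect from")
    return max(datastores_mounted, key=lambda host: (datastores_mounted[host], host))


# Hypothetical cluster: only active hosts take part in the election.
active_hosts = {"esx01": 3, "esx02": 5, "esx03": 4}
print(elect_master(active_hosts))
```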


What is the checklist for HA?

• All hosts must be licensed for HA (all editions, Standard, Enterprise and Enterprise Plus, support HA).

• A minimum of two hosts is needed.

• All hosts need to be configured with a static IP address, or DHCP must provide the same address every time.

• All hosts must have the same virtual machine networks and datastores.

• All virtual machines must be located on shared datastores, otherwise they cannot be failed over in the case of a host failure.

• For VM monitoring to work, VMware Tools must be installed.

• Host certificate checking must be enabled.

• For datastore heartbeating, a minimum of two datastores is needed.

clip_image001

• vSphere HA supports both IPv4 and IPv6. A cluster that mixes the use of both of these protocol versions, however, is likely to face a network partition.

Which setting is not available when you configure a cluster without hosts?

The Specify a Failover Host admission control policy is unavailable until there is a host that can be designated as the failover host.

Which settings are disabled when you move a host into an HA cluster?

The Virtual Machine Startup and Shutdown (automatic startup) feature is disabled for all virtual machines residing on hosts that are in (or moved into) a vSphere HA cluster. Automatic startup is not supported when used with vSphere HA.

What is the importance of Host monitoring status?

Host monitoring status enables HA to monitor HA agent heartbeats. Heartbeats are sent by the vSphere HA agent on each host in the cluster.

How do you suspend HA?

You can suspend HA by deselecting Host Monitoring.

If the host isolation response is not working, what could be wrong?

Check if Host monitoring status is enabled. Host isolation responses require that Host Monitoring Status is enabled. If Host Monitoring Status is disabled, host isolation responses are also suspended. A host determines that it is isolated when it is unable to communicate with the agents running on the other hosts and it is unable to ping its isolation addresses. When this occurs, the host executes its isolation response.

The responses are:

Leave powered on (the default), Power off, and Shut down. You can customize this property for individual virtual machines.

What is the dependency of the isolation response on VMware Tools?

This dependency applies specifically to the Shut down isolation response. To use the Shut down VM setting (graceful shutdown), you must install VMware Tools in the guest operating system of the virtual machine. Virtual machines that are in the process of shutting down will take longer to fail over while the shutdown completes.

Virtual machines that have not shut down within 300 seconds, or the time specified in the advanced attribute das.isolationshutdowntimeout, are powered off.
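The timeout behaviour can be sketched as a small decision function. The function name and return strings are hypothetical; the 300-second default and the das.isolationshutdowntimeout attribute come from the text.

```python
DEFAULT_ISOLATION_SHUTDOWN_TIMEOUT = 300  # seconds, per the text


def shutdown_step(
    elapsed_seconds: int,
    guest_powered_off: bool,
    timeout: int = DEFAULT_ISOLATION_SHUTDOWN_TIMEOUT,  # das.isolationshutdowntimeout
) -> str:
    """Decide what happens to a VM whose graceful shutdown was requested."""
    if guest_powered_off:
        return "shutdown complete"
    if elapsed_seconds >= timeout:
        return "power off"
    return "keep waiting"


print(shutdown_step(120, guest_powered_off=False))
print(shutdown_step(300, guest_powered_off=False))
print(shutdown_step(45, guest_powered_off=True))
```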

What is the drawback of a disabled isolation response?

A disabled isolation response may lead to a split-brain situation when a host loses both management and storage network connectivity. In this case, the isolated host loses the disk locks and the virtual machines are failed over to another host even though the original instances of the virtual machines remain running on the isolated host. When the host comes out of isolation, there will be two copies of the virtual machines, although the copy on the originally isolated host does not have access to the vmdk files and data corruption is prevented. In the vSphere Client, the virtual machines appear to be flipping back and forth between the two hosts. To recover from this situation, ESXi generates a question on the virtual machine that has lost the disk locks when the host comes out of isolation and realizes that it cannot reacquire the disk locks. vSphere HA automatically answers this question, which allows the virtual machine instance that has lost the disk locks to power off.

How does VM restart priority work?

VM restart priority determines the relative order in which virtual machines are restarted after a host failure. Virtual machines with the highest priority are restarted first, continuing to those with lower priority until all virtual machines are restarted or no more cluster resources are available. If the number of host failures exceeds what admission control permits, the virtual machines with lower priority might not be restarted until more resources become available. Virtual machines are restarted on the failover host, if one is specified.
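The ordering can be illustrated with a short sketch. It reduces "cluster resources" to a simple count of VMs that can still be restarted, which is a deliberate simplification; the function name, the VM names and the capacity figure are hypothetical.

```python
from typing import List, Tuple

# Lower rank restarts first.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def plan_restarts(vms: List[Tuple[str, str]], capacity: int) -> Tuple[List[str], List[str]]:
    """Split (name, priority) pairs into VMs restarted now and VMs left waiting.

    `capacity` stands in for available cluster resources (assumption).
    """
    ordered = sorted(vms, key=lambda vm: PRIORITY_RANK[vm[1]])
    names = [name for name, _ in ordered]
    capacity = max(capacity, 0)
    return names[:capacity], names[capacity:]


restarted, waiting = plan_restarts(
    [("web", "low"), ("db", "high"), ("app", "medium")], capacity=2
)
print(restarted, waiting)
```

With room for only two restarts, the low-priority VM is the one left waiting until more resources become available.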

 

SNAGHTMLaa0c23

 

Recommended restart priority

image

Cluster states of Failover Hosts Admission Control Policy

 

The Current Failover Hosts appear in the vSphere HA section of the cluster’s Summary tab in the vSphere Client. The status icon next to each host can be green, yellow, or red.

Green: The host is connected, not in maintenance mode, and has no vSphere HA errors. No powered-on virtual machines reside on the host.

image

Yellow: The host is connected, not in maintenance mode, and has no vSphere HA errors. However, powered-on virtual machines reside on the host.

image

Red: The host is disconnected, in maintenance mode, or has vSphere HA errors.

image
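The three states above map directly onto a small function. This is a sketch with a hypothetical function name and parameters, not something exposed by the vSphere Client.

```python
def failover_host_status(
    connected: bool,
    in_maintenance_mode: bool,
    has_ha_errors: bool,
    powered_on_vms: int,
) -> str:
    """Return the status colour of a designated failover host."""
    if not connected or in_maintenance_mode or has_ha_errors:
        return "red"
    if powered_on_vms > 0:
        return "yellow"
    return "green"


print(failover_host_status(True, False, False, 0))
print(failover_host_status(True, False, False, 3))
print(failover_host_status(False, False, False, 0))
```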