Acropolis Dynamic Scheduler [ADS]–AOS 5.0

In this post, my focus is on the features released as part of AOS 5.0, more specifically on AHV. Bear in mind that AHV was released only two years ago, and since then it has quickly been gaining feature parity with other hypervisors. Although AHV is based on KVM, it stands apart from other KVM distributions when it comes to ease of use, support, reliability, and performance. The focus of AHV is to make the Enterprise Cloud real. I believe Nutanix is changing the game by reframing the term private cloud as Enterprise Cloud, bringing AWS-like ease, flexibility, speed, and performance inside your datacenter.

With the upcoming AOS 5.0 release, AHV has been supplemented with various features, and the Acropolis Dynamic Scheduler (ADS) is one of them. It can be argued that ADS is similar to vSphere DRS, but my stand on it is a little different. Whether or not they are the same feature or function, Nutanix's focus is on solving contention issues rather than on load balancing. The vision for ADS is simple: “Resources are fully consumed without compromising end user performance.” As a pioneer in HCI, Nutanix has a tremendous advantage in providing QoS to VMs, because it can measure contention at both the compute and storage levels without depending on third-party vendors or tools, or on injecting drivers.

Highlights of Acropolis Dynamic Scheduler

  1. ADS is enabled by default. This makes total sense to me, as a Nutanix cluster needs a minimum of three nodes to function. I cannot imagine a use case where you would have to disable ADS permanently.
  2. Initial placement: when a VM is powered on, it is placed on the host with the fewest CPU and storage hotspots. This capability has been there since the early days.
  3. ADS checks for hotspots every 15 minutes.
  4. The following data is collected every 10 minutes as historical utilization, which forms the basis for intelligent placement decisions. This data is referred to as runtime metrics:
    1. CPU utilization of the host
    2. CPU utilization of the VM
    3. CPU utilization of Stargate
    4. CPU utilization of vDisk threads

The runtime metrics (stats) collected every 10 minutes are maintained by Arithmos in Prism and stored in the NoSQL Cassandra database.
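
To make history-driven initial placement more concrete, here is a minimal Python sketch. It assumes each host keeps a short rolling window of the 10-minute runtime-metric samples for host CPU and Stargate CPU, and it powers a new VM on wherever the recent averages show the least pressure. The `HostStats` class, the window of six samples, and the scoring rule (the larger of the two averages) are hypothetical illustrations, not the actual Acropolis placement algorithm.

```python
from dataclasses import dataclass, field
from statistics import mean

WINDOW = 6  # hypothetical: keep the last hour of 10-minute samples


def _avg(samples: list[float]) -> float:
    """Average of the samples, or 0.0 if there is no history yet."""
    return mean(samples) if samples else 0.0


@dataclass
class HostStats:
    """Hypothetical per-host history of runtime metrics (CPU %, oldest first)."""
    name: str
    host_cpu: list[float] = field(default_factory=list)
    stargate_cpu: list[float] = field(default_factory=list)

    def record(self, host_pct: float, stargate_pct: float) -> None:
        """Append one 10-minute sample and keep only the last WINDOW samples."""
        self.host_cpu = (self.host_cpu + [host_pct])[-WINDOW:]
        self.stargate_cpu = (self.stargate_cpu + [stargate_pct])[-WINDOW:]

    def pressure(self) -> float:
        """Hypothetical score: whichever is busier, compute or storage (Stargate)."""
        return max(_avg(self.host_cpu), _avg(self.stargate_cpu))


def initial_placement(hosts: list[HostStats]) -> str:
    """Return the name of the host with the least historical CPU/storage pressure."""
    return min(hosts, key=lambda h: h.pressure()).name


if __name__ == "__main__":
    a, b, c = HostStats("node-a"), HostStats("node-b"), HostStats("node-c")
    for pct in (70, 75, 80):
        a.record(pct, 30)  # busy on compute
    for pct in (40, 45, 50):
        b.record(pct, 60)  # busy on storage
    for pct in (50, 55, 60):
        c.record(pct, 35)
    print(initial_placement([a, b, c]))  # node-c
```

Taking the maximum rather than the sum means a host that is quiet on compute but saturated on storage is still treated as a poor candidate, in line with the idea of avoiding both CPU and storage hotspots.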

Threshold

An 85% CPU utilization threshold is configured at both the CVM (Stargate) level and the node level. If this limit is exceeded on either front, VMs on that particular node are live migrated to other hosts. In scenarios where a storage bottleneck is observed, Acropolis Block Services (ABS) will be migrated to different hosts. I understand this is a very simplified explanation.
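
The node-level trigger can be illustrated with a small Python sketch of how VMs might be chosen for live migration once a node crosses the 85% threshold. This is a hypothetical greedy heuristic, not Nutanix's actual algorithm: it moves the busiest VMs first, sends each one to the least-loaded host that stays at or below the threshold after the move, and assumes a VM adds the same CPU percentage on its destination as it consumed on the source. The function `plan_migrations` and its inputs are illustrative only.

```python
CPU_THRESHOLD_PCT = 85.0  # from the post: node-level CPU utilization threshold


def plan_migrations(
    host_load: dict[str, float], vms_on_hot: dict[str, float], hot: str
) -> list[tuple[str, str]]:
    """Hypothetical greedy plan to bring `hot` back to the threshold.

    host_load:  host -> current CPU utilization in percent
    vms_on_hot: VM on `hot` -> CPU percent it contributes to that host
    Simplifying assumption: a VM adds the same CPU percent on its destination.
    Returns a list of (vm, destination_host) live migrations.
    """
    load = dict(host_load)  # work on a copy so the caller's data is untouched
    moves: list[tuple[str, str]] = []
    # Busiest VMs first, so fewer live migrations are needed.
    for vm, cpu in sorted(vms_on_hot.items(), key=lambda kv: kv[1], reverse=True):
        if load[hot] <= CPU_THRESHOLD_PCT:
            break  # the hotspot is resolved
        # Only consider hosts that would not become hotspots themselves.
        targets = [h for h in load if h != hot and load[h] + cpu <= CPU_THRESHOLD_PCT]
        if not targets:
            continue  # no room for this VM anywhere; try a smaller one
        dest = min(targets, key=lambda h: load[h])
        load[hot] -= cpu
        load[dest] += cpu
        moves.append((vm, dest))
    return moves


print(plan_migrations({"n1": 95.0, "n2": 40.0, "n3": 60.0},
                      {"vm-a": 20.0, "vm-b": 5.0, "vm-c": 30.0}, "n1"))
# [('vm-c', 'n2')]
```

Skipping a VM that fits nowhere keeps the sketch from creating a new hotspot elsewhere, which is in the spirit of solving contention rather than simply balancing load. The CVM (Stargate) front and ABS migration are not modelled here.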

Affinity Rules

By and large, affinity rules are required if you would like to separate VMs onto different hosts or co-locate VMs on the same host. Such separation and co-location are needed to meet licensing requirements, or to make sure high availability is maintained at the application layer even if a host goes down. On the one hand, some applications perform at their highest efficiency when they run on the same host (affinity rule); on the other hand, some applications are deployed in redundant form to protect against a virtualization host failure (anti-affinity rule). Two types of affinity rules are available:

  1. VM-Host Affinity Rule (Must Rule)
  2. VM-VM Anti-Affinity Rule (Should Rule)

The VM-Host affinity rule is used to pin a VM to a specific host or group of hosts. This restriction is enforced as a must rule: a must rule is not violated under any circumstances or, to put it another way, a “must rule” will always be respected. The VM-VM anti-affinity rule differs from the VM-Host rule in that it is treated only as a should rule. The VM-VM rule is used to keep VMs apart on different hosts. A should rule is therefore only a “best effort” rule: if it cannot be maintained, an alert is generated. Alerts are stored in the Alert DB.
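
The difference between must and should rules can be sketched as a placement filter in Python. In this hypothetical sketch, the VM-Host rule is a hard filter (if no allowed host exists, the VM is not placed at all), while the VM-VM anti-affinity rule is only a preference: when it cannot be honoured, the VM is still placed and an alert is recorded in a plain list standing in for the Alert DB. The function `place_vm`, its parameters, and the least-loaded tie-breaker are assumptions for illustration, not the AHV API.

```python
from typing import Optional


def place_vm(
    vm: str,
    load: dict[str, float],          # host -> current CPU %
    must_hosts: Optional[set[str]],  # VM-Host must rule; None means no rule
    peers: set[str],                 # VMs in a VM-VM anti-affinity (should) rule with vm
    placements: dict[str, str],      # vm -> host, updated in place
    alert_db: list[str],             # stand-in for the Alert DB
) -> str:
    # Must rule: a hard filter that is never violated.
    candidates = [h for h in load if must_hosts is None or h in must_hosts]
    if not candidates:
        raise ValueError(f"no host satisfies the VM-Host must rule for {vm}")

    # Should rule: prefer hosts that do not already run an anti-affinity peer.
    occupied = {placements[p] for p in peers if p in placements}
    preferred = [h for h in candidates if h not in occupied]
    if preferred:
        pool = preferred
    else:
        # Best effort only: place anyway, but raise an alert.
        pool = candidates
        alert_db.append(f"{vm}: VM-VM anti-affinity rule could not be satisfied")

    host = min(pool, key=lambda h: load[h])  # hypothetical tie-breaker: least loaded
    placements[vm] = host
    return host


# Usage: db1 and db2 should be kept apart; db3 is pinned to n3 by a must rule.
cluster_load = {"n1": 20.0, "n2": 30.0, "n3": 10.0}
placed: dict[str, str] = {}
alerts: list[str] = []
print(place_vm("db1", cluster_load, None, set(), placed, alerts))      # n3
print(place_vm("db2", cluster_load, None, {"db1"}, placed, alerts))    # n1
print(place_vm("db3", cluster_load, {"n3"}, {"db1"}, placed, alerts))  # n3
print(len(alerts))                                                     # 1
```

For db3 the must rule wins over the should rule: the VM stays on n3 next to db1, and the broken anti-affinity preference surfaces only as an alert.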

Below is a high-level overview of how the Acropolis Dynamic Scheduler works. The figure below depicts the anomalies that are scanned for every 15 minutes; if any of these anomalies is detected, the scheduler kicks in with the appropriate remediation path shown.

Figure: Acropolis Dynamic Scheduler High-Level Overview
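
The scan-and-remediate cycle can be approximated by a small Python loop. This is a hypothetical sketch based only on the behaviour described in this post: every 15 minutes it takes a snapshot of per-node metrics, flags CPU contention when the node or its CVM (Stargate) is above 85%, treats a saturated Stargate as the storage-bottleneck signal (an assumption), and flags any reported anti-affinity (should rule) violation. Each anomaly is then mapped to a remediation path through a lookup table. The snapshot format, anomaly names, and the `REMEDIATION` table are illustrative, not ADS internals.

```python
import time
from typing import Callable

SCAN_INTERVAL_S = 15 * 60  # from the post: anomalies are scanned every 15 minutes
CPU_THRESHOLD_PCT = 85.0   # from the post: node and CVM (Stargate) threshold

# Hypothetical mapping from anomaly type to remediation path.
REMEDIATION = {
    "cpu_contention": "live-migrate VMs to other hosts",
    "storage_bottleneck": "migrate ABS to a different host",
    "anti_affinity_violation": "generate an alert (stored in Alert DB)",
}

Snapshot = dict[str, dict[str, float]]  # node -> metric name -> value


def detect(snapshot: Snapshot) -> list[tuple[str, str]]:
    """One scan: return (node, anomaly) pairs, in node-name order."""
    anomalies = []
    for node in sorted(snapshot):
        m = snapshot[node]
        if max(m["host_cpu"], m["stargate_cpu"]) > CPU_THRESHOLD_PCT:
            anomalies.append((node, "cpu_contention"))
        if m["stargate_cpu"] > CPU_THRESHOLD_PCT:
            # Assumption: a saturated Stargate is treated as the storage-bottleneck signal.
            anomalies.append((node, "storage_bottleneck"))
        if m.get("should_rule_violations", 0) > 0:
            anomalies.append((node, "anti_affinity_violation"))
    return anomalies


def plan(snapshot: Snapshot) -> list[tuple[str, str]]:
    """Map every detected anomaly to its remediation path."""
    return [(node, REMEDIATION[kind]) for node, kind in detect(snapshot)]


def run(get_snapshot: Callable[[], Snapshot], cycles: int,
        interval_s: float = SCAN_INTERVAL_S) -> None:
    """Scan `cycles` times, sleeping one interval between consecutive scans."""
    for i in range(cycles):
        if i:
            time.sleep(interval_s)
        for node, action in plan(get_snapshot()):
            print(f"{node}: {action}")


if __name__ == "__main__":
    snapshot = {
        "node-1": {"host_cpu": 91.0, "stargate_cpu": 40.0},
        "node-2": {"host_cpu": 55.0, "stargate_cpu": 88.0},
        "node-3": {"host_cpu": 30.0, "stargate_cpu": 20.0, "should_rule_violations": 1},
    }
    run(lambda: snapshot, cycles=1)  # single pass, so no sleep
```

Because remediation is looked up from a table, supporting another anomaly type means adding one detector branch and one table entry; the loop itself does not change.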

In summary, ADS is the best value add to AHV, aimed at fully utilizing resources without compromising end user performance. ADS is enabled by default, and no configuration parameters are required to tune it. In my opinion, ADS deployment is transparent (invisible). At the same time, more extensive and detailed blog posts on AOS 5.0 have been published by Andre Leibovici at http://myvirtualcloud.net/.


One more thing: you don’t need to worry about Enhanced vMotion Compatibility (EVC); it is simply taken care of.