Category Archives: VMware Horizon

[VMware] VDI Requirement Gathering

Oh! It has been a month since I wrote a single post. Ah! Blogging is my favorite activity. I love sharing my experience and learning. This is the one platform where I can express my thoughts on the technical front. I don’t know how many of you like it, but I don’t see that as the motivating factor. In all cases I would love to hear back from you.

Recently I did a VDI requirement gathering workshop with a customer. Based on various design meetings I have put together a questionnaire, which I would like to share with you. You will need a basic understanding of VDI, especially of the technology you are supporting. First, foremost and most important: why is the customer looking towards VDI? Don’t start with a blunt “Why” question, though. Rather, I suggest you put the question in a way your customer understands. It is worth noting that the first meeting will be with the IT manager or a CXO. They will understand if you ask them what the primary objective in exploring VDI options is.

What are the business goals/drivers for VDI?

Security, cost saving and desktop refresh are a few of the options that can help you drive the discussion. Without understanding each of the business drivers, your conversation will be more like a Q&A; it should be a discussion. If desktop refresh is one of the drivers, the immediate question is whether the existing desktops can be reused. Are the existing desktops end of life? If the existing desktops will be reused, it is very likely that users will use both the old desktop and the virtual one. That is an opportunity to ask where users will be saving their data. It also gives you the insight that you need some profile migration tool in place. Since we are here, ask whether users are using PST files and whether they store them in some central location. Here is a reference on this topic; that post also provides likely solutions.

What applications will be used via VDI desktops, and what is the nature of these applications?

This is the most important thing I learnt from Brian Suhr’s book: VDI is all about apps, not about desktops alone. How you present applications (apps) to the end user on iPads, tablets, phones and even cars is of utmost importance. The entire focus of your discussion should be around these applications. Who is using these applications, and what are they doing with them? Is there a common set of applications used across the organization? Are there heavy-graphics, high-I/O (AutoCAD, Visual Studio), memory-intensive, CPU-intensive (graphics) or audio-recording applications in use? Are these applications business specific, and can they tolerate downtime? This discussion will help you decide 1) whether you need multiple desktop pools and 2) whether you need any application virtualization feature. As can easily be guessed, the more variation there is in the application portfolio, the stronger the inclination to separate applications from the desktop pool. The most frequently used applications can be part of the standard image or can be ThinApp’ed. This is very well explained in Brian’s book. In each case you need the count of users who use each application. E.g. if there are only 5 Photoshop users and they use it for light graphics, you probably don’t need GRID cards. If they are heavy graphics users, then along with GRID cards you will very likely need to consider monitor size and resolution. You can see how one question leads to the answer to another. Now that you understand the nature of the applications, the most critical part is how licensing works. E.g. Office licenses need activation, and that needs a license management server (KMS).

Are there any users, other than desktop admins, who need to install applications on their local desktops?

Now this is one of the use cases for persistent desktops. If there are developers in your organization who need a pool of applications, they obviously need administrator access and much more. As can easily be guessed, you must know how many developers/users have this requirement. This will drive the DR strategy for persistent desktops. Along with that, you need to know how critical the nature of their work is. Here you can pause and ask how frequently applications are refreshed and how the refresh is done. This is a critical piece of information, as it impacts application virtualization and the effort needed to keep packages updated. E.g. if application A is refreshed every month (yes, there are applications which are refreshed every month) and you are proposing application virtualization for this set of applications, you need to consider how you are going to ensure these updates are integrated. This is an ongoing cost and may vary with the complexity of the application. Yes, I’m reading your mind: “App Volumes”. Yeah! But do you need to be an architect to propose it? Think again!

Are users working in shifts?

If yes, what is the anticipated number of concurrent users? This will help you decide the licenses for VDI and CALs. It will also help you decide the percentage of users who need floating desktops. E.g. if there are 300 users working in 3 shifts, i.e. 100 users per shift, you need just 100 concurrent user licenses; add a 5% allowance and procure 105 licenses. Floating desktops are a must here. CALs refers to end-user CALs for desktops, or for RDSH if you are offering RDSH-based desktops. This is also an appropriate place to find out whether the customer already has Terminal Services licenses.

What is the anticipated total number of users (if they are not shift users)?

This will help you identify the license requirement for AV, Office, desktops and other products that do not base their licensing on concurrent users. Consider the difference between buying 300 antivirus licenses and buying 100.
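The license arithmetic from the shift example can be sketched in a few lines of Python. This is only an illustration: the function names are mine, shifts are assumed to be evenly sized, and the 5% allowance is the figure used in the example above, not a vendor rule.

```python
def concurrent_licenses(total_users: int, shifts: int, allowance_pct: int = 5) -> int:
    """Concurrent-user licenses for evenly sized shifts, plus a headroom allowance."""
    # Peak concurrency is one shift's worth of users, rounded up if the split is uneven.
    per_shift = -(-total_users // shifts)
    # Integer ceiling of per_shift * (1 + allowance_pct / 100).
    return -(-per_shift * (100 + allowance_pct) // 100)


def named_licenses(total_users: int) -> int:
    """Products licensed per named user or device need one license per user."""
    return total_users


if __name__ == "__main__":
    print("Concurrent (VDI/CAL) licenses:", concurrent_licenses(300, 3))
    print("Named-user (e.g. AV) licenses:", named_licenses(300))
```

For 300 users in 3 shifts this gives 105 concurrent licenses against 300 named-user licenses, which is the gap the two questions above are meant to expose.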

From where are end users going to access desktops/applications?

This helps you understand how access has to be granted to end users, e.g. over WAN, LAN or Internet. If there are multiple sites, what is the required bandwidth between them? How are users going to access from the remote site (thin client, desktop, laptop)? Internet: they could be mobile users, working from home or working from the office. The number of users and the number of applications they need to access have a direct impact on the bandwidth and latency required.

Do you need access to the desktop from home?

Yes, this is about desktop access, not application access. If the answer is yes, there is a whole lot of security considerations. You need a View Security Server or an identity access appliance. An identity access appliance is suitable if there is sufficient VMware infrastructure in the DMZ. Do all users need access from home? Do users need two-factor authentication? If yes, remember that RSA tokens are licensed per user. Is access from home critical, or is it on a best-effort basis? That will drive your high availability design. Again, you will need to restrict VDI desktops to a specific VLAN.

Are users using Lync/audio/video?

Lync will have a direct impact on the selection of thin client. It must support the Lync plug-in. Zero clients definitely do not support it. Weigh factors like features, cost, design and performance.

Do you need USB device/scanner/smart card reader redirection?

This is often forgotten. Users need USB devices for various reasons, and the design must be able to accommodate this requirement. In the healthcare industry things become more critical, as staff need to move between rooms while attending patients. This requirement will have an indirect impact on your selection of endpoint device.

List the agents installed on the desktop

  • AV Agent
  • Backup Agent
  • SCCM/LANDesk Agent

Do you still need these agents in VDI desktops? Backup agent? Definitely not. You would no longer be taking desktop backups, would you?

The following questions will help you build the supporting infrastructure

  1. Do you have a Certificate Authority? If not, you either have to make one a prerequisite (read this post from Harsha) or assist them in building one.
  2. Do you have a load balancer in your existing solution? If not, you can either procure one on their behalf or put it on their prerequisite list. If they need an active-active VDI solution, the load balancer should be intelligent enough to divert traffic based on source IP.
  3. Do you have SRM? If the DR strategy is active-passive, SRM will assist in DR failover of the VDI components. Refer to this white paper for further details.
  4. Do you have terminal server licenses? If yes, you can explore the possibility of providing RDSH to the customer for select applications.
  5. Where are users storing data? On the local desktop/laptop? Then you must consider a file server in your design for user data, and probably for PST files as well.
  6. Do you have a DHCP server at the site? Is it redundant?
  7. Are there any non-corporate users accessing desktops, e.g. vendors or contractors? How are these users prohibited from accessing corporate data?
  8. How are users connecting to the network? This will have a direct impact on the user’s endpoint.

This is just the tip of the iceberg. If you follow this questionnaire, I’m sure you can build your own based on your experience. The biggest advantage of this questionnaire is that it allows you to build a requirement gathering document without much effort.


Availability & Recoverability Matrix of View Components

While working on one of my View design projects, I had to do lots of reading to ensure the Horizon View components are highly available. Eventually I felt it would be an excellent idea to create the matrix below to find each Single Point Of Failure (SPOF) and how to address it at each layer (service, OS, hypervisor, storage). Below I have listed the main components in View, their potential failure points and the ways they can be addressed.

So, we made a design decision to protect the Horizon View solution using Windows Server Failover Cluster (WSFC).

In most use cases, vSphere HA is sufficient to meet the availability needs of the solution. With vSphere HA, if an ESXi host reboots, the guest OS and its application are back in operation in less than 15 minutes. Please note the standard is less than 5 minutes, but remember the application also needs some time to start. I’m referring to a generic application, not one with multiple dependencies; e.g. vCenter has downstream dependencies on the database and DNS.

Let’s first find out the scenarios in which vCenter can fail. vCenter as a VM can fail if ESXi fails. vCenter as a guest OS can fail if the OS gets corrupted or is hit by a virus. vCenter as a service can fail, maybe because the database is unavailable; the causes could be many.

What are the various protection mechanisms for each type of failure?

  1. vCenter as a VM and as an OS is protected using vSphere HA. It will restart, but there is a 10-15 minute delay before vCenter is back in operation.
  2. vCenter services can be protected using the built-in watchdog and by clustering them with Microsoft failover cluster. In this case vCenter can return to operation in less than a minute. But the heart of vCenter, i.e. its database, still remains a SPOF.

Accordingly, each component is tabulated below


| Component | ESXi Failure | OS Failure | Application Failure | Impact on Availability | Special Configuration |
|---|---|---|---|---|---|
| vCenter Server | Reboots vCenter node | VM monitoring with VMware HA will attempt to restart OS | vCenter services are seamlessly failed over to second node. Near zero downtime | Minimal impact as services are offered by second node | 2-node MS Failover Cluster must be configured; need to configure anti-affinity rule |
| MS SQL Database | Reboots SQL Server node | VM monitoring with VMware HA will attempt to restart OS | MS SQL services are seamlessly failed over to second node. Near zero downtime | Minimal impact as services are offered by second node in MS Failover Cluster | 2-node MS Failover Cluster must be configured; need to enable anti-affinity rule |
| View Connection Server | Reboots View Connection Server | VM monitoring with VMware HA will attempt to restart OS | Load balancer removes this node from membership so that traffic is redirected to other nodes. Zero downtime | Zero impact as services are offered by other nodes behind the load balancer | Need to enable anti-affinity rule |
| View Composer | Reboots View Composer server | VM monitoring with VMware HA will attempt to restart OS | No in-built protection. 5-10 minutes downtime | VM is rebooted; provisioning and recompose operations are impacted during the outage window | |
| File Server | Reboots File Server node | VM monitoring with VMware HA will attempt to restart OS | File services are seamlessly failed over to second node. Near zero downtime | Minimal impact as services are offered by second node in WSFC cluster | 2-node MS Failover Cluster must be configured; need to enable anti-affinity rule |

Design Justification

  1. A number of components (as per the downstream dependency figure below) depend on the database (MS SQL in our case). Unavailability of the database can potentially bring the entire solution down.
  2. Even if the database is protected using vSphere HA, it might take 15 minutes to be operational again. After that, the vCenter service might take another 15 minutes to come up, as it depends on the vCenter database. Similarly, the View Composer service will need 15 minutes to be operational again. However, the vCenter and View Composer services can come online together if they are co-located on the same server.
  3. After that, the View Connection Server will be able to send commands to vCenter and View Composer.
  4. The figure below explains the delay from the point the database server is rebooted till the View Connection Server is ready to send any operation command to vCenter/View Composer. If WSFC is not used, the View solution might be unavailable for a minimum of 30 minutes.

    Horizon View Availability Impact with no WSFC
  5. With a WSFC cluster, if the database server fails, database failover happens in less than 5 minutes. The vCenter and View Composer services are not impacted, and therefore there is no impact on the View Connection Server.
  6. The vCenter database is the heart of vCenter and must be highly available. Since the View Connection Server depends on vCenter for various operations, unavailability of the vCenter database will impact vCenter, View Composer and the View Connection Server.
  7. WSFC will protect vCenter services against OS failure, ESXi failure and service failure.
  8. The View Composer database will be protected by WSFC. The View Composer service will not be protected by WSFC; vSphere HA for applications will protect it against OS and service failure. There is a potential downtime of 5-10 minutes.

    WSFC is complex to configure and manage. Considering the impact it can have on the overall availability of the solution, the effort of installing, configuring and managing WSFC is worth it. With WSFC, maintenance of all components can be done without any significant downtime, as all components are redundant.

    Downstream Dependency Map


    As technical architects we can easily conclude that not having WSFC can bring the entire solution down. If it were just vCenter, WSFC would not be needed, but as the dependency on vCenter services increases, your architecture must balance complexity with availability.
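The 30-minute figure in the design justification is simply the longest chain of restarts in the dependency map. A minimal Python sketch of that calculation follows; the service names and dictionaries are my own illustration, and the minute values are the rough estimates quoted above, not measurements.

```python
# Which services must be up before each service can start.
DEPENDS_ON = {
    "sql": [],
    "vcenter": ["sql"],
    "composer": ["sql"],
    "connection_server": ["vcenter", "composer"],
}

# Minutes each service needs once its dependencies are available (no WSFC).
# The Connection Server itself is not restarted here; it only waits on the others.
RESTART_NO_WSFC = {"sql": 15, "vcenter": 15, "composer": 15, "connection_server": 0}

# With WSFC the database fails over in under 5 minutes and the dependent
# services are not restarted, so they add nothing.
RESTART_WITH_WSFC = {"sql": 5, "vcenter": 0, "composer": 0, "connection_server": 0}


def ready_after(service: str, restart: dict) -> int:
    """Minutes until `service` is usable again after the database node goes down."""
    upstream = max((ready_after(dep, restart) for dep in DEPENDS_ON[service]), default=0)
    return upstream + restart[service]


if __name__ == "__main__":
    print("No WSFC:  ", ready_after("connection_server", RESTART_NO_WSFC), "minutes")
    print("With WSFC:", ready_after("connection_server", RESTART_WITH_WSFC), "minutes")
```

Because vCenter and View Composer wait on the database in parallel, the model takes the maximum of the two rather than their sum. That is why the estimate is 30 minutes and not 45.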

Max Number of Virtual Machines and Spare Virtual Machines in Desktop Pools

In the last post I discussed the impact of power policies. Well, there is one more parameter that ties in with power policies: spare virtual machines. The dependency is implicit and its impact is minimal; however, you must know about it. This setting is most relevant to floating, non-persistent desktop pools and has little to do with persistent desktop pools. But why not persistent? I will discuss that in the next post. OK, let me explain the various parameters you have to configure while creating a desktop pool.





Max number of machines

The maximum number of virtual machines that will be provisioned in the pool. It is the limit; once it is reached, no further desktops can be created. E.g. if you configure 10 as Max number of machines, no more than 10 virtual machines will be created. However, whether all 10 VMs are created upfront or on demand is configurable using the 3rd and 4th parameters explained below. A maximum of 10 users will be able to use the 10 desktops concurrently.


Number of Spare (powered on) machines

How many VMs should be available at any given point of time. If I configure 2, 2 VMs will always be powered on and available. What does this mean for user experience? 2 VMs are available for users to log in to without any delay. This delay can be described as

  1. Time required for VCS to send the command to VC to create the VM
  2. Time required for VC to create the linked clone VM
  3. Time required for VC to customize the VM
  4. Time required for VC to gracefully shut down the VM

This adds a huge delay (esp. tasks 2 & 3). If VCS is unable to allocate a desktop to the end user, he will get the error message described below. An open question is how many VMs should be configured as spare VMs.


Provision machines on demand

When the desktop pool is created, i.e. when you press that ‘Finish’ button, you can choose how many VMs you want to provision ‘right now’. If I configure the value as 5, 5 VMs will be created and provisioned immediately. For these 5 VMs the following tasks will be completed

  1. VC creates the linked clone VM
  2. VC customizes the VM
  3. VC gracefully shuts down the VM

When tasks 1 & 2 are completed, the 3rd task gracefully shuts down the VM. When the 1st user logs in, VCS sends a command to VC to power on the 3rd VM. What is the 3rd VM? Don’t fear, I’ll explain the entire process below; hopefully it will become clear.


Provision machines up-front

This is a no-brainer. It provisions all VMs upfront. E.g. if I select 10 as Max number of machines, all 10 will be provisioned immediately.

Let me explain the entire procedure using slides. The first slide below shows how the automated pool with floating assignment is configured. I have configured ‘10’ as the max size of the desktop pool, 5 to be provisioned on demand and 2 to be spare at any given time. Please note this is the configuration screen; I have yet to press the ‘Finish’ button.

  1. When I press the Finish button, the following tasks will be completed
    • 5 VMs will be created up front
    • These 5 VMs will be provisioned as linked clones
    • These 5 VMs will be customized using Sysprep/QuickPrep
    • Out of the 5 VMs, 3 will be gracefully shut down; the remaining two VMs stay powered on as spare VMs


    Below is a screen capture from vCenter when the entire process is completed.


    Now when the 3rd user logs in, below is the state in vCenter.


    Let me first explain what the term available desktop means. An available desktop in View is one for which all of the conditions below are true

  • View agent is running on the desktop
  • There is no session present on the desktop
  • Desktop is not allocated

Let’s assume all of the above conditions are true for the vd-1-po and vd-2-po desktops. At this stage, the Number of Spare (powered on) machines count is 2. Now user-1 logs in and is allocated vd-1-po. The spare count is reduced to 1; as a result VCS01 sends a command to VC to power on vd-3-po to maintain the spare VM count. The table below explains the entire process till the 4th user is logged in.


Sessions view in view administrator console


Please note that when the 4th user logs in, the 6th VM is created, provisioned, customized and powered on. After the 5th user has logged in, there will be a delay in getting desktops to users, and a user will get the error mentioned below till a VM is ready for allocation.


When the 8th user logs in, the 9th and 10th VMs are created, provisioned, customized and powered on.


Below is what is observed in vCenter



In the above screen we see that all 8 sessions are using desktops and 2 desktops are available as spare desktops.
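The walkthrough above can be approximated with a small Python model. It is a deliberately simplified sketch, not View’s actual algorithm: the class and field names are mine, provisioning and power-on are treated as instantaneous, and logoffs are ignored.

```python
from dataclasses import dataclass


@dataclass
class FloatingPool:
    max_machines: int = 10   # "Max number of machines"
    provisioned: int = 5     # VMs created when the pool is first built
    spares: int = 2          # "Number of Spare (powered on) machines"
    in_use: int = 0          # desktops with a user session
    powered_on: int = 0      # in-use desktops plus available spares

    def __post_init__(self) -> None:
        # After the initial provisioning only the spare VMs stay powered on.
        self.powered_on = min(self.spares, self.provisioned)

    def login(self) -> None:
        if self.powered_on - self.in_use == 0:
            raise RuntimeError("no desktop available")
        self.in_use += 1
        # Restore the spare count, but never go beyond the pool maximum.
        wanted = min(self.in_use + self.spares, self.max_machines)
        self.provisioned = max(self.provisioned, wanted)  # create VMs if needed
        self.powered_on = max(self.powered_on, wanted)    # power on to refill spares


if __name__ == "__main__":
    pool = FloatingPool()
    for user in range(1, 9):
        pool.login()
        spare = pool.powered_on - pool.in_use
        print(f"user {user}: provisioned={pool.provisioned} "
              f"powered_on={pool.powered_on} spare={spare}")
```

In this model the 6th VM appears when the 4th user logs in, and after the 8th login all 10 VMs exist with 2 of them spare. In a real pool the clone and customization time means a user can arrive before the next spare is ready, which is where the error message comes from.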


There are a lot of unanswered questions which I’m leaving for the next post, as this post is already long. In a future post I aim to address

  1. How many spare desktops we should configure
  2. Why persistent desktop pools are less impacted by it
  3. Which use cases support configuration of spare VMs

VCS: View Connection Server

VC: vCenter

Power Policies for Desktop Pools

I recently finished reading VCDX Boot Camp: Preparing for the VCDX Panel Defense. This is the second time I have read the book. It is a great refresher, and as a Technical Architect (which is my current role) it helps me a lot in gathering requirements and in validating the client’s requirements by focusing the discussion around them. It has helped me rewire my brain and made me think in terms of AMPRS*. Good read (5 stars). Angelo Luciani wrote a very good blog post on Chrome extensions. I came across the Instapaper Chrome extension and started using it. Very helpful. As a result I bookmarked some great posts. During my weekday reading, I read a very good blog post on risk management. Please read it if you are aspiring to be an architect. Risk management is so crucial in decision making, and yet we miss it so frequently. During my weekend reading, I came across another interesting post, on assumptions, by Simon Long. I was surprised by how we treat assumptions in our design documents. I have seen it in every decision document. An assumption must either be validated or be posted as a risk if it cannot be validated.

The point is that my blog will probably never post a “How to” anymore. It will always be “Why”. “Why” is an important question to ask yourself and the client. Hope I haven’t bored you with it. Let’s talk about the subject of this post.

I was looking at the power policies we configure while creating a desktop pool. I felt this is one of the most ignored, least discussed settings. When I went through it, I felt it needed to be revisited. I hope this post helps you decide why to select a particular power policy. There are four policies to choose from. These policies control what happens to the desktop when the user logs off from it.


Following are the 4 policies you can choose from

  1. Take No Power Action
  2. Ensure Virtual Machines are always Powered On
  3. Suspend
  4. Power off

Below I have tabulated all four settings in detail, with the behavior of each policy, its use cases and its impact


Power Policy

1.    Take No Power Action

Behavior of Policy

As the policy is aptly named “Take No Power Action”, View Connection Server (VCS) does not control the power state of the VM. Whatever power state the end user leaves the VM in, it stays that way; when the user logs off, the VM remains powered on. The exception is recompose: when the View administrator initiates a recompose operation, the VM is powered off, and it remains powered off even after the recompose operation has completed.

Case 01: A user is assigned a dedicated desktop. He logs off at the end of his day as he is going on vacation. A recompose operation occurs while he is on vacation, and as a result the VM is powered off. It remains powered off till he returns. He has to log a call with the helpdesk to find out why his VM is not reachable. Log in time + application launch time (e.g. AV, if it is agent based) is increased. Uptime is low and user experience is negatively impacted.

Case 02: A user shuts down his VM at the end of his day. The VM is powered off. The next day he logs in and the VM is powered on. Again, log in time + application launch time (e.g. AV, if it is agent based) is increased.

Use Cases:

  • Users who need shutdown privileges. (Shutdown privilege is never given to an end user without good justification.)
  • Dedicated desktop users might need shutdown privileges.
  • Desktop uptime is of little concern.

Impact:

  • Any agents that pull data from the desktop, e.g. system management, will be impacted.
  • Administrator overhead: the IT helpdesk might be overloaded with tickets to power on VMs after recompose operations.
  • The impact can be reduced by keeping a minimum number of VMs powered on, e.g. half the size of the pool.
  • Suitable for dedicated desktops when the percentage of such VMs is small, e.g. 10% of the overall population. Out of 100, if 10 VMs need to be powered on, it is little operational effort. If the count goes beyond 50, the effort needs to be estimated.


2.    Ensure machines are always powered on

Behavior of Policy

The VM always remains powered on irrespective of user action. E.g. if the user shuts down the VM, it is automatically restarted by vCenter. During a recompose operation the VM is automatically powered off, and after the recompose operation it is automatically powered on again.

Use Cases:

  • Highly effective for shift workers. With shifts, desktops are consumed 24 x 7.
  • The dedicated desktop count is high, e.g. above 50 desktops.

Impact:

  • As the VM always remains powered on, there is no boot storm, and log in time is reduced.
  • Administrative effort is reduced for dedicated desktop pools, as administrators would otherwise spend time identifying and powering on VMs.
  • Shutdown privilege is ineffective and must not be given.
  • For linked clone pools, there is little need to keep spare VMs powered on, as VMs are always available.
  • Potential resource wastage if all VMs are powered on without being used.



3.    Suspend

Behavior of Policy

The VM is suspended when the user logs off.

Use Cases:

  • Compute resources need to be saved, i.e. power needs to be optimally utilized, but login time must not increase.

Impact:

  • The IOPS needed to suspend and resume VMs will be high. The additional IOPS requirement needs to be considered.
  • Once the suspend file is created, it remains in the VM’s parent directory till the VM is powered off. The extra storage needs to be calculated.


4.    Power off

Behavior of Policy

The VM is powered off when the user logs off.

Use Cases:

  • Disk space is constrained. During the power off operation, the independent disk returns to its pristine state.
  • Desktops are consumed only during 9-6 operations.
  • Power must be optimally used.

Impact:

  • Boot storm + login storm every morning. The impact is directly proportional to the number of concurrent desktop logins. It can be mitigated if user login times are staggered.
  • After a recompose operation, the administrator has to manually power on all VMs. This can be mitigated using a PowerCLI script, though effort and skills are needed to develop the script.
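One way to keep the four policies straight is to write them down as a lookup table. The Python sketch below encodes only the behaviours described in this post; the event names (`logoff`, `user_shutdown`, `recompose`) and helper functions are my own labels, not View settings, and any combination the post does not describe returns `None` rather than a guess.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    TAKE_NO_POWER_ACTION = "Take No Power Action"
    ALWAYS_ON = "Ensure machines are always powered on"
    SUSPEND = "Suspend"
    POWER_OFF = "Power off"


# (policy, event) -> power state the VM ends up in, as described in the post.
STATE_AFTER = {
    (Policy.TAKE_NO_POWER_ACTION, "logoff"): "on",
    (Policy.TAKE_NO_POWER_ACTION, "user_shutdown"): "off",
    (Policy.TAKE_NO_POWER_ACTION, "recompose"): "off",
    (Policy.ALWAYS_ON, "logoff"): "on",
    (Policy.ALWAYS_ON, "user_shutdown"): "on",
    (Policy.ALWAYS_ON, "recompose"): "on",
    (Policy.SUSPEND, "logoff"): "suspended",
    (Policy.POWER_OFF, "logoff"): "off",
    (Policy.POWER_OFF, "recompose"): "off",
}


def state_after(policy: Policy, event: str) -> Optional[str]:
    """Resulting power state, or None if the post does not cover this case."""
    return STATE_AFTER.get((policy, event))


def needs_manual_power_on(policy: Policy) -> bool:
    """True when VMs stay powered off after a recompose under this policy."""
    return state_after(policy, "recompose") == "off"


if __name__ == "__main__":
    for policy in Policy:
        print(f"{policy.value}: manual power-on after recompose = "
              f"{needs_manual_power_on(policy)}")
```

Laid out this way, the recompose column stands out: it is the event that generates helpdesk tickets under “Take No Power Action” and manual work under “Power off”.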

*AMPRS :- Availability, Manageability, Performance, Recoverability and Security

View Connection Server Tags

When I first read about it, I thought I got it, but when I attempted a question on it I failed. Glad that I failed; at least I have learnt it thoroughly now. By default all desktop pools are available from all connection servers, i.e. a user can access a desktop as long as he is entitled to at least one desktop pool and is connected to any connection server. View Connection Server tags are the way to restrict a desktop pool to particular View Connection Servers. As part of a security requirement you can restrict particular desktop pools to particular connection servers.

In the figure below, end user-A is entitled to Desktop Pool 01. Connection server 01 is configured with the connection tag Gold. While creating the desktop pool we associate connection server 01’s Gold tag with it. When the end user selects connection server 01, he is able to reach Desktop Pool 01, as the Gold tag of the connection server matches the Gold tag on the desktop pool. But when he selects connection server 02/03, he is not able to connect to the desktop pool, as those connection servers do not have the tag associated with them. The connection server denies access.


Case 02: User-C is entitled to Desktop Pool 02, which has no tag. As per VMware’s tag matching rules, a pool without a tag is not restricted, so User-C is authenticated and presented Desktop Pool 02 whether he selects connection server 02/03 (where no tag is defined) or connection server 01. If you want connection server 01 to serve only Desktop Pool 01, Desktop Pool 02 needs a tag of its own that connection server 01 does not carry.

Simple Tip: Think of the connection server as a ticket collector who helps you find your seat (desktop). He looks at the ticket (tag) and tries to allocate the seat (desktop pool). If the ticket (tag) matches, the seat is granted. If the ticket (tag) does not match, you are not allowed in, i.e. access is denied. Similarly, when User-A tries to use connection server 02/03, there is no tag created on the connection server, i.e. there is no ticket collector, and therefore there is no one to guide him to Desktop Pool 01. But when User-C logs in through connection server 02/03, he is directly given a desktop in Desktop Pool 02, as there is no tag that needs to be validated by the ticket collector (connection server tag).
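The matching logic can be sketched as a small Python function. The function name, the `"Gold"` tag and the server and pool dictionaries are only for illustration. The untagged-pool rule follows VMware’s tag matching table as I understand it (a pool with no tag is reachable from any connection server), so treat that branch as an assumption to verify against your View release.

```python
def can_reach_pool(server_tags: set, pool_tags: set) -> bool:
    """Would this connection server offer the pool to an entitled user?"""
    if not pool_tags:
        # Assumption: an untagged pool is not restricted to any connection server.
        return True
    # A tagged pool is offered only by a connection server sharing one of its tags.
    return bool(server_tags & pool_tags)


if __name__ == "__main__":
    servers = {"CS01": {"Gold"}, "CS02": set(), "CS03": set()}
    pools = {"Pool01": {"Gold"}, "Pool02": set()}
    for server, server_tags in servers.items():
        for pool, pool_tags in pools.items():
            verdict = "allowed" if can_reach_pool(server_tags, pool_tags) else "denied"
            print(f"{server} -> {pool}: {verdict}")
```

Entitlement is a separate check that happens in addition to this one; the function only answers whether the tags let a given connection server present the pool at all.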

Connection Tag Internet Facing users

View Security Servers must be paired with connection servers. If the View Security Servers are behind a load balancer, you must ensure each security server is paired with a connection server where the tag is defined. In the example below I have created the tag on connection server 01 and connection server 03. If either security server 01 or 02 fails, users who are entitled to Desktop Pool 01 are still able to log in. That makes it a fully redundant solution.

Hope you find this useful. I can’t believe how many other things I learnt about connection servers and tags as I drafted this post. Keep writing & posting. Happy Monday