Virtualized migration
Although virtualization is not a new idea and its origins can be traced back to the mainframe era, only recently has it become ubiquitous. There are several types of virtualization, but let's focus on platform virtualization, and especially on its dynamic aspects (see the extended sections of this post). It is used in highly efficient data centers and everywhere else that efficient hardware utilization and high availability matter. Further on, I will highlight how live migration of virtual machines makes several real-world usage scenarios possible, and I will pose an intriguing question that came to my mind.
Some clarification
Platform virtualization tools (hypervisors), such as VMware Infrastructure, Citrix XenServer, the (Linux) Kernel-based Virtual Machine (KVM) or Microsoft Hyper-V, allow a virtualized operating system to run as a virtual machine (aka guest OS) on a physical-machine operating system (aka host OS). There is a host of interesting use cases (selected ones are described in the next section) where binding a given guest OS to a specific host OS is neither recommended nor efficient. The sequence of operations that must be performed behind the scenes in order to migrate a selected virtual machine from one physical machine to another, without the user noticing any downtime, is referred to as live migration (aka on-line migration or hot cloning). As opposed to off-line migration (i.e. moving previously shut down VMs to another physical machine), it can only be triggered manually, but the necessary operations are then performed fully automatically.
Usage scenarios
There are many scenarios in which live migration is useful. However, the three most obvious ones that came to my mind are the following: adaptive data center (aka dynamic data center), physical machine maintenance, and green computing. Let me shed some more light on each of them.
Adaptive data center is a vision of a data center that is capable of adapting its computing power (the number of active physical machines) to match users' current requirements. At times when users run more computation-intensive applications and the overall data center load peaks, additional machines are provisioned immediately to match the surge in demand. On the other hand, if the overall load is low, idle machines are deactivated. Apart from power consumption savings (see the third scenario), there might be other benefits, such as decreased costs of proprietary software licenses.
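To make the idea a bit more tangible, here is a minimal sketch of the sizing decision such a data center would have to make. It is only an illustration: the function name, the `headroom` safety margin and the `min_active` floor are my own assumptions, not taken from any particular product.

```python
import math


def machines_needed(load, capacity_per_machine, headroom=0.2, min_active=1):
    """Estimate how many physical machines should be active.

    load and capacity_per_machine share one (hypothetical) unit, e.g. CPU cores.
    headroom is the assumed fraction of spare capacity kept for sudden surges.
    min_active is the assumed number of machines that always stay powered on.
    """
    if capacity_per_machine <= 0:
        raise ValueError("capacity_per_machine must be positive")
    if load <= 0:
        return min_active
    # Scale the load by the safety margin, then round up to whole machines.
    required = math.ceil(load * (1 + headroom) / capacity_per_machine)
    return max(required, min_active)


if __name__ == "__main__":
    # Peak hours: many machines are needed; quiet hours: most can be deactivated.
    print("peak:", machines_needed(100, 16))
    print("night:", machines_needed(10, 16))
```

A real controller would also have to decide which machines to power on or off, which is where live migration comes in.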
Physical machine maintenance relates to data centers with hundreds of inexpensive physical machines. Presently, such centers perform any maintenance operations that require server downtime, such as an OS kernel upgrade, within precisely defined maintenance windows (usually at night or during weekends). If such a data center hosts banking services, then any downtime is unacceptable to customers who are accustomed to 24/7 banking. Such situations can be avoided, or at least significantly limited, if virtualization and live migration are used.
Green computing (green IT) is a refreshing idea for all those organizations that either own or operate huge data centers and have to cope with increasing power and cooling costs. If we look at the nodes of a big data center, we may observe that at any given moment some nodes are either inactive (i.e. no user jobs are being executed on them) or under-utilized. With the help of live migration, it is possible to configure a previously virtualized environment to consolidate the load across the available machines. In other words, currently running virtual machines are automatically redistributed among the nodes to ensure that as many nodes as possible are fully utilized. In most cases, some nodes will become inactive and can be powered off. As soon as the load increases, the previously deactivated machines are activated again.
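The consolidation step can be pictured as a bin-packing problem. The sketch below uses the first-fit decreasing heuristic, which is my own choice for illustration and not necessarily what any hypervisor does; the VM names, the load numbers and the single `node_capacity` value are made up.

```python
def consolidate(vm_loads, node_capacity):
    """Pack VMs onto as few nodes as possible (first-fit decreasing).

    vm_loads maps a VM name to its load; node_capacity is the load one node
    can carry. Both use the same hypothetical unit (e.g. percent of a node).
    Returns one list of VM names per node that has to stay powered on.
    """
    nodes = []  # each entry: {"free": remaining capacity, "vms": [names]}
    # Place the heaviest VMs first; they are the hardest to fit.
    for name, load in sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > node_capacity:
            raise ValueError(f"{name} does not fit on any node")
        for node in nodes:
            if node["free"] >= load:
                node["vms"].append(name)
                node["free"] -= load
                break
        else:
            # No active node has room, so one more node must stay on.
            nodes.append({"free": node_capacity - load, "vms": [name]})
    return [node["vms"] for node in nodes]


if __name__ == "__main__":
    loads = {"a": 50, "b": 30, "c": 20, "d": 60, "e": 40}
    placement = consolidate(loads, node_capacity=100)
    print(len(placement), "nodes needed:", placement)
```

Every node that does not appear in the result is a candidate for powering off; moving the VMs to their new nodes is the job of live migration.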
Some pitfalls
After some brief market research, I found that nearly all virtualization solutions seem to support live migration. However, only the KVM authors explicitly admit that live migration might fail and that resuming the previous state is the fall-back option.
Moreover, there are rigid requirements, such as minimal network bandwidth or a specific cluster configuration, that must be met in order to migrate quickly. Only one vendor writes about them in a precise manner (see Brian's Power Windows Blog post).
I am wondering whether it would be good to test live migration failures (test-to-fail). I am thinking about how to test this without pretending that my hardware (e.g. a hard disk drive) is broken or throttling my network link. Do you have any ideas?
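One possible starting point is to inject the fault in software rather than in hardware. The sketch below is a toy model only: `FakeVM`, `live_migrate`, `failing_transfer` and `MigrationError` are hypothetical names of mine and do not correspond to any hypervisor API. It merely shows the shape of a test-to-fail check, where the transfer is made to break halfway and the VM is expected to resume on its source host.

```python
class MigrationError(Exception):
    """Raised by the (simulated) transfer step when it cannot complete."""


class FakeVM:
    """Stand-in for a real virtual machine; holds only what the test needs."""

    def __init__(self, name, host):
        self.name = name
        self.host = host
        self.running = True


def live_migrate(vm, target, transfer):
    """Move vm to target; on failure, fall back to running on the source."""
    source = vm.host
    try:
        transfer(vm, target)  # the injected fault, if any, fires here
    except MigrationError:
        # Fall-back option: keep (or resume) the VM on the original host.
        vm.host = source
        vm.running = True
        return False
    vm.host = target
    return True


def failing_transfer(fail_after_chunks, total_chunks=10):
    """Build a transfer function that breaks after a given number of chunks."""

    def transfer(vm, target):
        for chunk in range(total_chunks):
            if chunk >= fail_after_chunks:
                raise MigrationError(f"link to {target} lost at chunk {chunk}")

    return transfer


if __name__ == "__main__":
    vm = FakeVM("web", "host-a")
    ok = live_migrate(vm, "host-b", failing_transfer(fail_after_chunks=3))
    print("migrated:", ok, "| host:", vm.host, "| running:", vm.running)
```

Against a real hypervisor, the interesting part would be finding an equivalent hook where the failure can be injected.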