Wednesday, July 28, 2010

Why virtualize servers? 10 reasons.

People who haven't experienced the benefits of virtualization often ask what the big deal is. A common counterargument to consolidation goes: instead of running 10 VMs, each serving a single app, why not run one multi-purpose server? Having only one system to maintain means less work. True? Not quite. There are good reasons for separating applications onto either physical or virtual machines. The first option is clearly too expensive in many ways (not only initial cost, but also power, cooling, space and maintenance), and VMs even have some advantages over real computers.

1. Isolation

I believe it is possible to have one machine serving HTTP, FTP, DNS, SMTP, POP, IMAP, 2 types of SQL database and 5 types of application server, in addition to doing its own resource monitoring and intrusion detection. I used to have boxes doing 80% of that, and obviously I could have just installed another app. But the whole setup is barely maintainable. Debugging a problem becomes extremely difficult when you have to filter out all the irrelevant information. Upgrades are a nightmare. There's no way to write a comprehensive SELinux policy. And worst of all, one misbehaving app using too many resources can bring the whole system to its knees. Yes, you can impose some limits through PAM, but not on all resources, and anyway, you can't predict anything, especially when customers are allowed to run their own applications.
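To illustrate both the point and the limitation: on most Linux distributions the PAM limits mentioned above live in /etc/security/limits.conf. A sketch (the "webapps" group is a hypothetical example):

```
# /etc/security/limits.conf -- cap what one app's users can consume
@webapps    hard    nproc     100      # max number of processes
@webapps    hard    nofile    1024     # max open file descriptors
@webapps    hard    as        524288   # address space limit, in KB
```

Notice what's missing: there's no way here to cap a sustained CPU share or disk I/O bandwidth, which is exactly the gap that lets one app starve the rest.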

2. Security

On a multi-purpose server, a security hole in one application exposes all the others. An attacker who gained access using an exploit in a web application can then read your mail or alter your DNS zones. All modern operating systems have some ways of preventing that: applications can be isolated by running them under different UIDs or in chroot jails. That has never been enough to stop intruders. After an intrusion, you usually have no option but to reinstall the machine from scratch, or at least restore a backup and hope it doesn't contain any rootkits.

A successful attack on a virtual server is much less devastating. If you followed the advice of having single-purpose machines, there are no other applications on that system, no tools a malicious user can turn to his advantage. If you want to investigate, suspend the VM and resume it on another machine without a network connection. Restoring the attacked machine is a matter of minutes (you DO have snapshots of your VMs, right?).

And that's assuming you didn't prevent the intrusion in the first place. Single-purpose machines are way easier to harden. You can check which binaries are run and which files are read, and disallow everything else with ACLs. If you know the machine is only going to run a web application, just filter all ports except HTTP.
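A sketch of that kind of lockdown, in iptables syntax (assuming the VM serves plain HTTP on port 80 and is administered over SSH; in practice you'd also restrict SSH by source address):

```shell
# Default-deny inbound policy for a single-purpose web VM (sketch).
iptables -P INPUT DROP
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT   # the only service offered
iptables -A INPUT -p tcp --dport 22 -j ACCEPT   # admin access
```

On a multi-purpose box, every extra service is another ACCEPT rule and another attack surface; here the whole policy fits in five lines.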

3. Dealing with software incompatibility

Sometimes it's impossible to run 2 apps on the same machine. They may rely on different versions of a library or some other component. Or they both need to use the same TCP port. Commercial apps on an open-source system tend to cause problems: they are less configurable, often expect to be installed in a specific place, and sometimes require a specific OS distribution. It's easiest to give them a separate system. There are workarounds, but they make your setup even messier, and harder to test and maintain.

4. Maintaining legacy systems

Another issue is the legacy servers running in many companies. Nobody dares to touch them, because the last guy who knew how they worked retired last century. They run an operating system you've never even heard of. Their hardware requirements are small, but where can you get replacement parts for that 486? These machines are prime candidates for virtualization.

5. Easier software upgrades

Now that you have single-purpose machines, upgrading has become much easier, but that's not the end of the good news. You don't have to upgrade your critical business server in place. Just create a new virtual machine, install the new version and test it thoroughly. When it's done, switch the virtual plug.
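With a libvirt-managed hypervisor (an assumption -- your tooling may differ), "switching the virtual plug" can be as simple as two commands. The domain names here are made up:

```shell
# Sketch: replace the old VM with the freshly tested one.
virsh shutdown app-old    # graceful stop of the old version
virsh start app-new       # bring up the tested replacement
# ...then point DNS or your load balancer at app-new's address.
```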

6. Easier hardware upgrades

Upgrading hardware has never been easier. The hypervisor exposes the same set of devices on every box (unless you configured it to give a VM direct access to specific hardware, which you shouldn't do without a good reason), which means no worries about device drivers. When you need more processing power, plug in another server and migrate some VMs from the old ones. Popular hypervisors such as Xen and VMware allow live migration with almost no downtime (usually below 100 ms).
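For example, with libvirt's virsh (one possible front end for Xen or KVM; the guest and host names are made up), a live migration is a single command:

```shell
# Sketch: move a running guest to a new physical box with minimal downtime.
virsh migrate --live webserver qemu+ssh://newhost.example.com/system
```

The guest keeps running while its memory is copied over; only the final switchover pauses it briefly.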

7. High availability

HA installations used to be found only in the richest companies, and even there only for the mission-critical servers. After all, you had to double all your hardware, also doubling your power, cooling and space expenses. Not any more.

With VMs, you can take snapshots. You can live-migrate them. You can suspend them, at which point they use virtually (pun intended) no resources, but can be quickly resumed. I think you get the picture now: all you need to run an HA installation is some more planning and configuration up front.

8. Dealing with increased load

The peak load on a server is usually orders of magnitude higher than the normal load. With a traditional server, you had to buy one that could sustain the peak traffic (meaning: expensive hardware, more power draw). Or buy one that's right for your day-to-day needs and hope nobody mentions your website on Slashdot or Digg. VMs, as usual, give you more choices. You can migrate them around your physical servers to balance the load. You can suspend some VMs to give more resources to the others. If your infrastructure is compatible with public clouds (e.g. Amazon EC2), you can rent some processing power for a negligible cost.
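The suspend trick, again sketched with virsh (assuming a hypothetical low-priority guest named "batch-jobs"):

```shell
# Sketch: shed low-priority load during a traffic spike.
virsh suspend batch-jobs   # pauses the guest, freeing its CPU share
virsh resume batch-jobs    # bring it back once the spike passes
```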

9. Backup and restore

Backup is always simple, whether in a virtual or physical environment. After some initial configuration, it happens automatically. It's restoring that's difficult.

Restoring a traditional server is a long, manual process. Hours of downtime, during which both IT and finance people develop ulcers. It's also hard to test, which means the test is often postponed indefinitely until a disaster strikes, at which point you find out the restore procedure fails. With VMs, you can save a complete snapshot (disk and RAM image) of a running server. Restoring is a matter of copying the image to the right place. Testing the restore? Just start another VM.
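With virsh, for instance (assuming a libvirt-managed guest named "mailserver"; disk images are backed up separately), saving and restoring the full runtime state is two commands:

```shell
# Sketch: save the complete runtime state of a guest, then bring it back.
virsh save mailserver /backup/mailserver.state  # writes RAM + device state, stops the guest
virsh restore /backup/mailserver.state          # resumes exactly where it left off
```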

10. Testing new software

In a typical corporate environment, all machines are already put to good use. If you need one for a test deployment, it takes a lot of patience, paperwork and begging to get it. If you need several machines to test a networked app, you're completely screwed. Not any more. Creating a new VM is a matter of minutes. You don't need high performance for a test, which means you can squeeze dozens of VMs onto one box.
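One way to get such a throwaway machine is virt-install, libvirt's provisioning tool (an assumption about your stack; names, sizes and paths below are examples):

```shell
# Sketch: spin up a small disposable test VM.
virt-install --name testbox --ram 512 \
  --disk path=/var/lib/libvirt/images/testbox.img,size=4 \
  --cdrom /iso/debian.iso --graphics vnc
```

Need five of them for a network test? Run it five times with different names, then delete them all when you're done.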