Wednesday, July 28, 2010
Why virtualize servers? 10 reasons.
1. Single-purpose servers
I believe it is possible to have one machine serving HTTP, FTP, DNS, SMTP, POP, IMAP, two flavours of SQL and five kinds of application server, while also doing its own resource monitoring and intrusion detection. I used to have boxes doing 80% of that, and obviously I could have just installed another app. But the whole setup is barely maintainable. Debugging a problem becomes extremely difficult when you have to filter out all the irrelevant information. Upgrades are a nightmare. There's no way of writing a comprehensive SELinux policy. And worst of all, one misbehaving app using too many resources can bring the whole system to its knees. Yes, you can impose some limits with PAM, but not on all resources, and you can't predict anything anyway, especially when customers are allowed to run their own applications.
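For what it's worth, those PAM limits look roughly like the fragment below (the `webapp` user name and the numbers are invented for the example). Note that `pam_limits` only covers a handful of resources, which is exactly the limitation complained about above:

```
# /etc/security/limits.conf -- enforced by pam_limits at login
# (user name and values are hypothetical examples)
webapp   hard   nproc    100      # max number of processes
webapp   hard   as       524288   # max address space, in KB
webapp   hard   nofile   1024     # max open files
```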
2. Improved security
On a multi-purpose server, a security hole in one application exposes all the others. An attacker who gained access through an exploit in a web application can then read your mail or alter your DNS zones. All modern operating systems have ways of mitigating that: applications can be isolated by running under different UIDs or inside chroot jails. That has never been enough to stop intruders. After an intrusion, you usually have no option but to reinstall the machine from scratch, or at least restore a backup and hope it doesn't contain any rootkits.
A successful attack on a virtual server is much less devastating. If you followed the advice of having single-purpose machines, there are no other applications on that system and no tools a malicious user can turn to his advantage. If you want to investigate, suspend the VM and resume it on another machine without a network connection. Restoring the attacked machine is a matter of minutes (you DO have snapshots of your VMs, right?).
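As a sketch of that workflow, assuming a libvirt-managed hypervisor; the domain name `webvm` and the snapshot name are hypothetical:

```shell
# Freeze the compromised guest for later forensics
virsh suspend webvm

# Save its full state (RAM and devices) to a file you can carry
# to an isolated analysis host and resume there off the network
virsh save webvm /forensics/webvm.state

# Meanwhile, roll the production copy back to a known-good snapshot
virsh snapshot-revert webvm clean-snapshot
```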
And that's assuming you didn't prevent the intrusion in the first place. Single-purpose machines are far easier to harden. You can check which binaries are run and which files are read, and disallow everything else with ACLs. If you know the machine is only going to run a web application, just filter out all the ports except HTTP.
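The port filtering can be as simple as a default-deny iptables policy; a minimal sketch (the exact rules will depend on your distribution's firewall tooling):

```shell
# Drop everything inbound by default
iptables -P INPUT DROP
# Allow loopback and replies to traffic we initiated
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# The only service this machine runs: HTTP
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
```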
3. Dealing with software incompatibility
Sometimes it's impossible to run two apps on the same machine. They may rely on different versions of a library or some other component, or they may both need the same TCP port. Commercial apps on an open-source system tend to cause problems: they are less configurable, often expect to be installed in a specific place, and sometimes require a specific OS distribution. It's easiest to give them a separate system. There are workarounds, but they make your setup even messier and harder to test and maintain.
4. Maintaining legacy systems
Another issue is the legacy servers running in many companies. Nobody dares to touch them, because the last guy who knew how they worked retired last century. They run an operating system you have never even heard of. Their hardware requirements are modest, but where can you get replacement parts for that 486? These machines are prime candidates for virtualization.
5. Easier software upgrades
Now that you have single-purpose machines, upgrading becomes much easier, but that's not the end of the good news. You don't have to upgrade your critical business server in place. Just create a new virtual machine, install the new version and test it thoroughly. When it's ready, switch the virtual plug.
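With libvirt-based setups, that clone-and-test cycle can be sketched like this (domain names are made up for the example):

```shell
# Clone the production guest, disk image included
virt-clone --original crm-prod --name crm-new --auto-clone

# Boot the clone, then upgrade and test it in isolation
virsh start crm-new

# When it checks out, switch the virtual plug
virsh shutdown crm-prod
```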
6. Easier hardware upgrades
Upgrading hardware has never been easier. The hypervisor exposes the same set of devices on every box (unless you configured it to give a VM direct access to specific hardware, which you shouldn't do without a good reason), which means no worries about device drivers. When you need more processing power, plug in another server and migrate some VMs off the old ones. Popular hypervisors such as Xen and VMware allow live migration with almost no downtime (usually below 100 ms).
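With libvirt on top of Xen or KVM, a live migration is a one-liner; the domain name `webvm` and the destination host are hypothetical:

```shell
# Move the running guest to another physical box while it stays online;
# memory pages are copied across as the VM keeps running, and it is
# paused only for the final sync (typically a fraction of a second)
virsh migrate --live webvm qemu+ssh://newhost.example.com/system
```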
7. High availability
HA installations used to be found only in the richest companies and even there only for the mission-critical servers. After all, you had to double all your hardware, also doubling your power, cooling and space expenses. Not any more.
With VMs, you can take snapshots. You can live-migrate them. You can suspend them, at which point they use virtually (pun intended) no resources, yet can be quickly resumed. I think you get the picture: all you need to run an HA installation is some extra planning and configuration up front.
8. Dealing with increased load
The peak load of a server is usually orders of magnitude higher than the standard load. With a traditional server, you had to buy one that could sustain the peak traffic (meaning expensive hardware and more power draw), or buy one sized for your day-to-day needs and hope nobody mentions your website on slashdot or digg. VMs, as usual, give you more choices. You can migrate them around your physical servers to balance the load. You can suspend some VMs to give more resources to the others. If your infrastructure is compatible with public clouds (e.g. Amazon EC2), you can rent some processing power for a negligible cost.
9. Backup and restore
Backup is always simple, whether in a virtual or physical environment. After some initial configuration, it happens automatically. It's restoring that's difficult.
Restoring a traditional server is a long, manual process: hours of downtime, during which both IT and finance people develop ulcers. It's also hard to test, which means the test is often postponed indefinitely until a disaster strikes, at which point you find out the restore procedure fails. With VMs, you can save a complete snapshot (disk and RAM image) of a running server. Restoring is a matter of copying the image to the right place. Testing the restore? Just start another VM.
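A minimal sketch of that with libvirt tools (the paths and domain name are invented for the example):

```shell
# Take a full state dump (RAM image; pair it with a copy of the
# disk image for a complete snapshot). The guest stops afterwards.
virsh save mailvm /backup/mailvm-2010-07-28.state

# Disaster recovery -- or a restore test on a spare box -- is a
# single command once the images are back in place
virsh restore /backup/mailvm-2010-07-28.state
```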
10. Testing new software
In a typical corporate environment, all machines are already put to good use. If you need one for a test deployment, it takes a lot of patience, paperwork and begging to get it. If you need several machines to test a networking app, you're completely screwed. Not any more. Creating a new VM is a matter of minutes. You don't need high performance for a test, which means you can squeeze dozens of VMs onto one box.
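Creating that throwaway test VM really is just one command; a hedged example with virt-install, where the name, sizes and ISO path are placeholders:

```shell
# A small guest is plenty for a test deployment
virt-install \
  --name testvm01 \
  --ram 512 \
  --disk size=5 \
  --cdrom /isos/debian-netinst.iso
```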