Monday, August 29, 2011

LVM Snapshot theory


LVM snapshots are meant to capture the filesystem in a frozen state. They are not meant to be a backup in and of themselves. They are, however, useful for obtaining backup images that are consistent because the frozen image cannot and will not change during the backup process. So while you won't use them directly to make long-term backups, they will be of great value in any backup process that you decide to use.
When LVM creates a snapshot, several things happen, all very quickly. The first is that a new logical volume is allocated. The true purpose of this volume is to provide an area where deltas (changes) to the filesystem are recorded. This allows the original volume to carry on without disrupting any existing read/write access. The downside is that the snapshot area has a finite size, so on a system with heavy write activity it can fill up rather quickly. For volumes with significant write activity, you will want to make the snapshot larger so there is enough space for all changes to be recorded.
If your snapshot overflows (fills up), both the original volume and the snapshot volume grind to a screeching halt. This is done to prevent filesystem corruption, although it has the effect of unceremoniously dropping any I/O (and therefore any programs) that were using the original volume. Should this happen, release the snapshot as soon as possible so you can get the original volume back online. Once the release is complete, you'll be able to remount the volume read/write and make its filesystem available again.
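As a rough illustration of creating a snapshot and keeping an eye on how full it is, here is a minimal Python sketch that only builds the command lines. The volume group vg0, the volume data, the snapshot name data_snap, the 2G size and the 80 percent threshold are all made-up values for the example, and the helper names are mine, not part of LVM. The commands themselves (lvcreate --snapshot, and lvs with the snap_percent field) need root to run.

```python
import subprocess


def snapshot_create_cmd(vg, origin, snap_name, size):
    """Return the lvcreate arguments for a snapshot of /dev/<vg>/<origin>."""
    return [
        "lvcreate",
        "--snapshot",        # create a snapshot rather than a plain volume
        "--size", size,      # space reserved for the snapshot area, e.g. "2G"
        "--name", snap_name,
        f"/dev/{vg}/{origin}",
    ]


def snapshot_usage_cmd(vg, snap_name):
    """Return the lvs arguments that report how full the snapshot is."""
    return ["lvs", "--noheadings", "-o", "snap_percent", f"{vg}/{snap_name}"]


def parse_percent(lvs_output):
    """Turn the single padded number printed by lvs into a float."""
    return float(lvs_output.strip())


def nearly_full(percent, threshold=80.0):
    """True once the snapshot has used at least `threshold` percent."""
    return percent >= threshold


def run(cmd):
    """Run one command (as root) and return its standard output."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

On a real system you would pass the result of snapshot_create_cmd("vg0", "data", "data_snap", "2G") to run(), then periodically feed the output of the lvs command through parse_percent() and nearly_full() while the backup is in progress.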
The second thing that happens is that LVM "swaps" the true purposes of the volumes in question. You would think that the newly allocated snapshot would be the place to look for any changes to the filesystem. After all, it's where all the writes are going, right? No, it's the other way around. Filesystems are mounted by LVM volume name, so swapping out the name from underneath the rest of the system would be a no-no (because the snapshot uses a different name). So the solution is simple: the original volume name continues to refer to the live (read/write) version of the volume you took the snapshot of, and the snapshot volume you create refers to the frozen (read-only) version of the volume you intend to back up. It's a little confusing at first, but it will make sense.
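To make the naming concrete: the backup reads from the snapshot's device name, never from the original's. The sketch below, again in Python and again only building command lines, assumes a snapshot called data_snap in volume group vg0, a mount point /mnt/snap and an archive path /backup/data.tar.gz, all of which are invented for the example. Some filesystems need extra mount options before a snapshot will mount; that is left out here.

```python
def backup_cmds(vg, snap_name, mount_point, archive):
    """Return the commands that archive the frozen view held by a snapshot."""
    snap_dev = f"/dev/{vg}/{snap_name}"   # the frozen image, not the live volume
    return [
        ["mount", "-o", "ro", snap_dev, mount_point],      # mount it read-only
        ["tar", "-czf", archive, "-C", mount_point, "."],  # archive its contents
        ["umount", mount_point],                           # let go of it again
    ]
```

The live filesystem stays mounted under its usual name the whole time; only the snapshot device is touched by these commands.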
All of this happens in less than 2 seconds. The rest of the system doesn't even notice. Unless, of course, you don't release the snapshot before it overflows...
When you release a snapshot, LVM does not have to copy anything back into the original volume. Writes made while the snapshot existed went to the original volume all along; what the snapshot area holds is the data LVM needs to keep presenting the frozen view. Releasing the snapshot discards that data and destroys the snapshot volume, and the original remains as it is. This is also why it is important not to "hold onto" snapshots: the snapshot area keeps filling for as long as the snapshot exists, and each changed block costs extra I/O to track.
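Releasing a snapshot comes down to removing the snapshot volume once nothing has it mounted. A small Python sketch of that step follows; the names vg0, data and data_snap are placeholders, and the check against the origin name is my own safety net rather than anything LVM requires.

```python
def release_cmd(vg, origin, snap_name):
    """Return the lvremove arguments that release a snapshot."""
    if snap_name == origin:
        # Guard against a typo that would remove the live volume instead.
        raise ValueError("refusing to remove the original volume")
    # --yes answers the confirmation prompt, so this suits unattended scripts.
    return ["lvremove", "--yes", f"/dev/{vg}/{snap_name}"]
```

Unmount the snapshot first if a backup had it mounted; the command needs root like the others.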
There is also an older form of snapshot, from the LVM version 1 days, that is read-only. This form of snapshot does not record changes to the filesystem. I do not recommend using it as a long-term backup strategy. You are still hosting the data on the same physical drive, which can fail, and a copy that dies along with the drive it lives on is no backup at all.
So, in a nutshell:
Snapshots are good for assisting backups
Snapshots are not, in and of themselves, a form of backup
Snapshots do not last forever
A full snapshot is a BAD thing
Snapshots need to be released at some point (they take up space and cost extra I/O for as long as they exist)
LVM is your friend, if you use it wisely.