Confusion over EBS Snapshots in Amazon

Question

Every EBS snapshot is a standalone snapshot: if you restore it onto a new volume, you get a volume identical to the original volume as it existed at the time of the snapshot.

However, snapshots are stored in S3 and the way they are stored (and the way you are billed for the storage of them) is incremental.

Amazon EBS snapshots are incremental backups, meaning that only the blocks on the device that have changed since your last snapshot will be saved. If you have a device with 100 GBs of data, but only 5 GBs of data has changed since your last snapshot, only the 5 additional GBs of snapshot data will be stored back to Amazon S3. Even though the snapshots are saved incrementally, when you delete a snapshot, only the data not needed for any other snapshot is removed. So regardless of which prior snapshots have been deleted, all active snapshots will contain all the information needed to restore the volume. In addition, the time to restore the volume is the same for all snapshots, offering the restore time of full backups with the space savings of incremental.

Source: http://aws.amazon.com/ebs/

So behind the scenes, each snapshot contains only the blocks that changed since the prior snapshot, but restoring a snapshot does not mean you have to put the incremental pieces back together. EBS does that for you automatically.

So, let's say you have a 100 GB EBS volume, and snapshots A, B, and C, taken in that order, and no other snapshots of the volume.

Snapshot A would be 100 GB in size (possibly less, since space you've never written to might be eliminated from the snapshot).

If 20 GB then changed and you took snapshot B, that snapshot would be 20 GB in size, but if you restored it, the resulting volume would contain the full 100 GB, because it has pointers back to the unchanged data from snapshot A.

Then another 10 GB changed, and you took snapshot C. That would be a 10 GB snapshot, with pointers back to B for the data that changed before it, and pointers back to A for the rest. Again, restoring this one would get you the full volume as it was when you took snapshot C.

Now, if you delete snapshot B, the blocks that changed in snapshot B but did not change again in snapshot C would roll forward into snapshot C, so that you could still restore the entire volume as of snapshot C, and snapshot C would become a 30 GB snapshot.

This is an oversimplification: some of the same blocks would likely have changed both from A to B and from B to C, making the final version of C somewhat smaller than 30 GB. But it does convey the general idea. Every snapshot stands alone for restoration purposes, but internally EBS stores only the differences from the prior snapshot, and you pay only for the amount of data each snapshot actually contains. Unfortunately, at this time, there's no way to find out via the API how large each snapshot actually is, because this information isn't exposed: the reported size is always the size of the volume.
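The A/B/C example above can be modeled with a small toy simulation. This is only a sketch of the idea, not how EBS is actually implemented: the `SnapshotStore` class, its method names, and the "one block = 1 GB" convention are all assumptions made for illustration.

```python
class SnapshotStore:
    """Toy model: each snapshot physically stores only the blocks that
    changed since the previous snapshot (hypothetical, not the EBS design)."""

    def __init__(self):
        self.order = []   # snapshot names, oldest first
        self.stored = {}  # name -> {block_index: data} held by that snapshot

    def take(self, name, volume):
        """Snapshot `volume` (dict of block_index -> data), keeping only changes."""
        previous = self.restore(self.order[-1]) if self.order else {}
        self.stored[name] = {i: d for i, d in volume.items() if previous.get(i) != d}
        self.order.append(name)

    def restore(self, name):
        """Rebuild the full volume by layering snapshots, oldest first, up to `name`."""
        volume = {}
        for snap in self.order[: self.order.index(name) + 1]:
            volume.update(self.stored[snap])
        return volume

    def delete(self, name):
        """Delete a snapshot, rolling blocks still needed into the next snapshot."""
        idx = self.order.index(name)
        blocks = self.stored.pop(name)
        self.order.pop(idx)
        if idx < len(self.order):
            successor = self.stored[self.order[idx]]
            for i, d in blocks.items():
                successor.setdefault(i, d)  # the successor's newer data wins


def run_example(c_blocks):
    """Take A, change blocks 0-19, take B, change `c_blocks`, take C, delete B.
    Returns (sizes before deleting B, sizes after), counted in blocks."""
    volume = {i: "a" for i in range(100)}  # 100 blocks, think 1 GB each
    store = SnapshotStore()
    store.take("A", volume)
    for i in range(20):
        volume[i] = "b"
    store.take("B", volume)
    for i in c_blocks:
        volume[i] = "c"
    store.take("C", volume)
    before = {n: len(store.stored[n]) for n in store.order}
    store.delete("B")
    after = {n: len(store.stored[n]) for n in store.order}
    assert store.restore("C") == volume  # C still restores the full volume
    return before, after


print(run_example(range(20, 30)))  # C changes 10 blocks that B did not touch
print(run_example(range(15, 25)))  # C re-changes 5 of the blocks B changed
```

In the first call C grows from 10 to 30 blocks once B is deleted. In the second call, where 5 of C's blocks overlap with B's, C ends up with 25 blocks, which is the "somewhat smaller than 30 GB" case.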

There is no way to automatically purge snapshots. For my systems, I have written a script that runs once a day and looks for volumes to snapshot, based on their tags. It then works out which volumes already have enough snapshots under my retention policy and deletes the surplus -- but it will only delete snapshots that it created itself, which it recognizes by the tags it applies to every snapshot it creates.
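A script along those lines might look roughly like the sketch below. It is not my actual script: the `Backup=true` volume tag, the `CreatedBy=snapshot-script` snapshot tag, and the "keep the N newest per volume" retention rule are all assumptions chosen for illustration. The `ec2` argument is expected to be a boto3 EC2 client; the sketch ignores pagination and error handling.

```python
from datetime import datetime, timezone

# Hypothetical tag that marks snapshots created by this script.
MANAGED_TAG = {"Key": "CreatedBy", "Value": "snapshot-script"}


def snapshots_to_delete(snapshots, keep):
    """Return the IDs of all but the `keep` newest snapshots of each volume."""
    by_volume = {}
    for snap in snapshots:
        by_volume.setdefault(snap["VolumeId"], []).append(snap)
    doomed = []
    for snaps in by_volume.values():
        snaps.sort(key=lambda s: s["StartTime"], reverse=True)  # newest first
        doomed.extend(s["SnapshotId"] for s in snaps[keep:])
    return doomed


def run(ec2, keep=7):
    """Snapshot every volume tagged Backup=true, then prune old snapshots
    that carry this script's own tag. Other snapshots are never touched."""
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "tag:Backup", "Values": ["true"]}]
    )["Volumes"]
    today = datetime.now(timezone.utc).date().isoformat()
    for vol in volumes:
        ec2.create_snapshot(
            VolumeId=vol["VolumeId"],
            Description="daily snapshot " + today,
            TagSpecifications=[{"ResourceType": "snapshot", "Tags": [MANAGED_TAG]}],
        )
    # Only look at snapshots this script tagged, so nothing else can be deleted.
    own = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "tag:" + MANAGED_TAG["Key"],
                  "Values": [MANAGED_TAG["Value"]]}],
    )["Snapshots"]
    for snapshot_id in snapshots_to_delete(own, keep):
        ec2.delete_snapshot(SnapshotId=snapshot_id)
```

With boto3 installed and credentials configured, this could be invoked from a daily cron job as `run(boto3.client("ec2"), keep=7)`. Filtering on the script's own tag before deleting is what keeps manually created snapshots safe.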