The role of snapshots and journaling in strong data protection

What are they for?

A key part of a Backup and Disaster Recovery solution is point in time recovery. This means you can recover data from certain periods in the past. This is crucial when recovering from a failure, like database corruption, a virus or user error. There are also data centre issues to consider, such as power loss or hardware failure.

Being able to recover quickly can mitigate otherwise serious consequences for a business.

The question any user should ask about point in time recovery is, how is it achieved? Enter journaling and snapshots.*

*We could write myriad articles about the different types of snapshot and journal technology, but for the sake of being concise we’ve kept it to the most important bits.

What’s the difference between the two?

Not unlike a person’s physical diary, journaling keeps track of changes over a period of time. When directory updates happen, they are written to a serial log on disk. If there is a system failure or crash, a journal file system means you can restore the data as it was pre-failure. It can also recover changes that were logged but not yet written when the crash happened, and store them in their originally intended destination.
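To make the idea concrete, here is a minimal sketch of a journal in Python. It is a toy, not how any particular file system or product does it: the `JournaledStore` class, its `set` and `recover` methods and the JSON-lines log format are all assumptions made for illustration. Each change is appended to a serial log on disk before it is applied, so the state can be rebuilt after a crash by replaying the log.

```python
import json
import os
import tempfile


class JournaledStore:
    """Toy key-value store (hypothetical) that logs each change before applying it."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}

    def set(self, key, value):
        # Write the change to the serial log first and force it to disk,
        # so it survives a crash that happens before the in-memory update.
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps({"key": key, "value": value}) + "\n")
            journal.flush()
            os.fsync(journal.fileno())
        self.data[key] = value

    @classmethod
    def recover(cls, journal_path):
        # Rebuild the state by replaying every logged change in order.
        store = cls(journal_path)
        if os.path.exists(journal_path):
            with open(journal_path, encoding="utf-8") as journal:
                for line in journal:
                    entry = json.loads(line)
                    store.data[entry["key"]] = entry["value"]
        return store


def demo():
    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    store = JournaledStore(path)
    store.set("report.docx", "v1")
    store.set("report.docx", "v2")
    del store  # simulate a crash: the in-memory state is gone
    return JournaledStore.recover(path).data
```

Calling `demo()` writes two changes, throws away the in-memory copy and then rebuilds it from the log alone.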

A snapshot is a ‘screengrab’ of data at a point in time. Snapshots are taken at Virtual Machine (VM) level or at Storage Area Network (SAN) level. In the event of a failure, the data can be restored from the most recent snapshot taken before the failure. This is useful when a user wants to restore data from a specific time.
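Picking the restore point is simple to sketch. The example below assumes snapshots are identified only by the time they were taken; the function name `latest_snapshot_before` and the sample schedule are made up for illustration and do not reflect any hypervisor or SAN API.

```python
from bisect import bisect_left
from datetime import datetime


def latest_snapshot_before(snapshot_times, failure_time):
    """Return the newest snapshot time strictly before failure_time, or None."""
    times = sorted(snapshot_times)
    # bisect_left finds the first snapshot taken at or after the failure,
    # so the entry just before it is the most recent usable snapshot.
    index = bisect_left(times, failure_time)
    return times[index - 1] if index > 0 else None


# Hypothetical schedule: one snapshot every six hours.
snapshots = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 6, 0),
    datetime(2024, 1, 1, 12, 0),
]
failure = datetime(2024, 1, 1, 9, 30)
restore_point = latest_snapshot_before(snapshots, failure)
```

With this schedule, a failure at 09:30 is restored from the 06:00 snapshot, so anything written in the three and a half hours in between is not in the restored copy.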

Snapshots

Snapshots have a key place in a resilient backup solution.

Short-term snapshots are great for dealing with user errors and some data corruption scenarios.

They are fast (often data can be restored in seconds) and tend to be space-efficient.

VM level snapshots are created in the hypervisor, which affects performance. So, it makes sense to have VM level snapshots running outside of working hours. But this can create scheduling conflicts between backup and replication schedules of the same VMs.

Storage level snapshots affect performance less, but they still require processing power in a storage controller. At scale, this can also degrade performance.

Plus, if the entity on which your snapshot is stored fails, then the snapshot is lost with it. For protection against backup media failure, there must be a separate, physical copy.

Overuse of snapshots at VM level can severely affect performance, particularly across a large number of VMs. When snapshots are used to keep multiple points in time, the impact is severe performance degradation in a failover scenario, resulting in a Recovery Time Objective of many hours, if not days, to return to full performance.

So, snapshot technology is an effective solution when used correctly.

Journaling

Replication software stores data in more than one site, or moves it from one site to another. So if one site fails, the data is still accessible. Journaling is a key part of the replication process.

To extend the snapshot analogy, journaling is also a bit like taking a photo. However, rather than a single ‘grab’, a photo is taken of every single change within the data: every file name change or movement, record update, creation of a new user profile and so on is captured.
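Because every change is captured, the data can be rebuilt as it stood at any moment, not only at the moments a snapshot happened to be taken. The sketch below assumes a journal held as a time-ordered list of `(timestamp, key, value)` entries, with `None` marking a deletion; the `state_at` function and the entry format are illustrative assumptions, not a description of any replication product.

```python
def state_at(journal, point_in_time):
    """Replay journal entries up to and including point_in_time.

    journal is assumed to be sorted by timestamp; a value of None
    records that the key was deleted.
    """
    state = {}
    for timestamp, key, value in journal:
        if timestamp > point_in_time:
            break  # later changes are ignored for this recovery point
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


# Hypothetical journal: timestamps are simple counters for readability.
journal = [
    (1, "a.txt", "draft"),
    (2, "b.txt", "notes"),
    (3, "a.txt", "final"),
    (4, "b.txt", None),
]
```

Recovering to time 2 gives both files in their early form, while recovering to time 4 gives only the final version of `a.txt`, because `b.txt` had been deleted by then.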

By using journal technology instead of VM level snapshots, protection of VMs is scalable to thousands of VMs. Snapshot based replication should be kept to, at most, a handful of VMs at other offices.

If you use a replication engine that uses journal based protection for point in time recovery, then all of these challenges are resolved: no snapshots are used on the replica VM, so performance is not degraded.

When selecting a backup or DR solution, it’s key to keep in mind that snapshot technology is most effective as a supplement to journaling, and that over-use of it degrades performance.