CDP (Continuous Data Protection)

Continuous data protection (CDP), also called continuous backup or real-time backup, refers to backup of computer data by automatically saving a copy of every change made to that data, essentially capturing every version of the data that the user saves. It allows the user or administrator to restore data to any point in time.

CDP runs as a service that captures changes to data to a separate storage location. There are multiple methods for capturing the continuous changes involving different technologies that serve different needs. CDP-based solutions can provide fine granularities of restorable objects ranging from crash-consistent images to logical objects such as files, mail boxes, messages, and database files and logs.

Traditional vs CDP

Continuous data protection is different from traditional backup in that it is not necessary to specify the point in time to recover from until ready to restore. Traditional backups only restore data from the time the backup was made. Continuous data protection has no backup schedules. When data is written to disk, it is also asynchronously written to a second location, usually another computer over the network. This introduces some overhead to disk-write operations but eliminates the need for scheduled backups.

Continuous vs Near-Continuous

Some solutions marketed as continuous data protection may only allow restores at fixed intervals such as one hour or 24 hours. Such schemes are not universally recognized as true continuous data protection, as they do not provide the ability to restore to any point in time. These solutions are often based on periodic snapshots, an example of which is CDP Server, disk-based backup software that periodically creates restore points using a snapshot and volume filter device driver to track disk changes.

There is debate in the industry as to whether the granularity of backup must be “every write” to be CDP, or whether a solution that captures the data every few seconds is good enough. The latter is sometimes called near continuous backup. The debate hinges on the use of the term continuous: whether only the backup process must be continuous, which is sufficient to achieve the benefits cited above, or whether the ability to restore from the backup also must be continuous. The Storage Networking Industry Association (SNIA) uses the “every write” definition.

Differences from RAID, Replication, Mirroring

Continuous data protection differs from RAID, replication, or mirroring in that these technologies only protect one copy of the data (the most recent). If data becomes corrupted in a way that is not immediately detected, these technologies simply protect the corrupted data with no way to restore an uncorrupted version.

Continuous data protection protects against some effects of data corruption by allowing restoration of a previous, uncorrupted version of the data. Transactions that took place between the corrupting event and the restoration are lost, however. They could be recovered through other means, such as journaling.

Backup Disk Size

In some situations, continuous data protection requires less space on backup media (usually disk) than traditional backup. Most continuous data protection solutions save byte or block-level differences rather than file-level differences. This means that if one byte of a 100 GB file is modified, only the changed byte or block is backed up. Traditional incremental and differential backups make copies of entire files.

Risks and Disadvantages

The protection afforded by continuous data protection is often heralded without consideration of the disadvantages and challenges that it can present. Specifically, the continuous bandwidth usage can adversely affect network performance, especially in operations where file sizes are large, such as multimedia and CAD design environments. To mitigate this risk, companies employ throttling techniques that prioritize network traffic to reduce the impact of backup on day-to-day operation.

Related Articles