5.5 TSM Server-side deduplication Overview

TSM Server-side deduplication process Overview

With this method, deduplication occurs after data is backed up to a storage pool that is set up for deduplication.
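  • More data travels over the LAN because the client sends all data to the server before any duplicates are removed.
  • With the server option DEDUPREQUIRESBACKUP YES, duplicate data is not removed from the primary storage pool until the pool has been backed up to a copy storage pool.
  • The client does none of the processing to remove duplicate data.
  • The client backs up all data to the server.
  • Server-side deduplication became available in version 6.1.0.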

Tivoli Storage Manager server-side data deduplication is a two-phase process. 

1) The server identifies the duplicate data in the storage pool.

2) Any of the following processes remove duplicate data:
  • Reclamation of volumes in the primary storage pool, copy storage pool, or active-data pool.
  • Backing up a primary storage pool to a copy storage pool.
  • Copying of active data in the primary storage pool to an active-data pool.
  • Migrating data from the primary storage pool to another primary storage pool.
  • Moving data from the primary storage pool to a different primary storage pool that is also set up for deduplication, moving data within the same copy storage pool, or moving data within the same active-data pool.

The server-side deduplication process consists of the following:
  • Data is sent from clients to the server.
  • The server creates extents and pointers to the hash index in the server database to relate files to extents.
  • The BACKUP STGPOOL operation copies data to a non-deduplicated copy storage pool.
  • Reclaim operations remove duplicate data extents from the primary storage pool. This operation frees unused space.
  • With this approach, deduplication is performed out-of-band, and at least one copy of non-deduplicated data exists.
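
The flow above can be sketched as a short sequence of administrative commands. This is a minimal illustration rather than a recommended configuration: the pool names DEDUPPOOL (a deduplicated primary pool) and COPYPOOL (a non-deduplicated copy pool) and the reclamation threshold are assumptions made for the example.

```
/* Hypothetical names: DEDUPPOOL = deduplicated primary pool, COPYPOOL = copy pool */
/* Require a copy storage pool backup before duplicate data can be removed */
SETOPT DEDUPREQUIRESBACKUP YES
/* Copy the primary pool to the non-deduplicated copy storage pool */
BACKUP STGPOOL DEDUPPOOL COPYPOOL
/* Reclaim volumes in the primary pool; this is where duplicate extents are removed */
RECLAIM STGPOOL DEDUPPOOL THRESHOLD=60
```

Running the backup before reclamation is what keeps at least one non-deduplicated copy of the data available.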

Planning for server-side deduplication

You can create a new storage pool for deduplication or you can upgrade an existing storage pool. In either case, Tivoli Storage Manager provides the option of running duplicate-identification processes automatically or manually.
  • Before setting up storage pools for deduplication, determine which client nodes have data that you want to deduplicate.
  • Determine which client nodes use this method. You might want to have some clients use server-side and others use client-side deduplication.
  • Decide whether you want to define a new storage pool exclusively for deduplication or update an existing storage pool.
  • The storage pool must be a sequential-access disk (FILE) pool. Deduplication occurs at the storage pool level. All data within a storage pool, except encrypted data, is deduplicated.
  • Decide how you want to control duplicate-identification processes: automatically or manually.
  • If you have a primary sequential-access disk storage pool and a copy sequential-access disk storage pool, and both pools are set up for deduplication, you might want to run duplicate-identification processes for the primary storage pool only. In this way, only the primary storage pool reads and deduplicates data.
  • When the data is moved to the copy storage pool that has deduplication enabled, the deduplication is preserved. No duplicate identification is required.
  • If you plan to use simultaneous write during migration, that data cannot go to a deduplication-enabled storage pool.
  • In that case, the server disables the AUTOCOPY option on the storage pool and issues a warning message.
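
As a rough sketch of the two choices described above, the commands below first define a new FILE device class and a deduplicated storage pool, and then show the alternative of enabling deduplication on an existing pool. All names (FILEDEDUP, DEDUPPOOL, EXISTINGPOOL), the directory path, and the numeric values are hypothetical and should be replaced with values that suit your environment.

```
/* Option 1: create a new sequential-access disk (FILE) pool for deduplication */
/* FILEDEDUP, DEDUPPOOL and /tsm/dedup are hypothetical example values */
DEFINE DEVCLASS FILEDEDUP DEVTYPE=FILE MAXCAPACITY=10G MOUNTLIMIT=20 DIRECTORY=/tsm/dedup
/* IDENTIFYPROCESS=2 starts two duplicate-identification processes automatically */
DEFINE STGPOOL DEDUPPOOL FILEDEDUP MAXSCRATCH=200 DEDUPLICATE=YES IDENTIFYPROCESS=2

/* Option 2: enable deduplication on an existing FILE pool (hypothetical name) */
/* IDENTIFYPROCESS=0 leaves duplicate identification under manual control */
UPDATE STGPOOL EXISTINGPOOL DEDUPLICATE=YES IDENTIFYPROCESS=0
```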

Controlling number of deduplication processes

The IDENTIFY DUPLICATES command starts or stops processes that identify duplicate data in a storage pool. When you define or update a storage pool for deduplication, you can specify 0 to 50 duplicate-identification processes to start automatically and run for a duration that you specify. 

You can also control deduplication processing manually to avoid resource impacts during server operations.
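
A minimal sketch of manual control follows, assuming a hypothetical storage pool named DEDUPPOOL. The process count and duration are illustrative only, and setting NUMPROCESS=0 is understood here to be the way to stop the running processes.

```
/* DEDUPPOOL is a hypothetical storage pool name */
/* Start 3 duplicate-identification processes and let them run for 120 minutes */
IDENTIFY DUPLICATES DEDUPPOOL NUMPROCESS=3 DURATION=120
/* Check on the running processes */
QUERY PROCESS
/* Stop all duplicate-identification processes for the pool */
IDENTIFY DUPLICATES DEDUPPOOL NUMPROCESS=0
```

Scheduling these commands outside the backup window is one way to keep duplicate identification from competing with other server operations.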

To get the best performance from data deduplication, run a larger number of duplicate-identification processes, within the limits of the available server resources.

You can use additional memory to optimize the frequent access of deduplicated extent information that is stored in the Tivoli Storage Manager database.

Effects of turning deduplication off or on 

You can turn deduplication on or off by updating the storage pool definition.
  • UPDATE STGPOOL pool_name DEDUPLICATE=NO
  • UPDATE STGPOOL pool_name DEDUPLICATE=YES
If you turn deduplication off for a storage pool:
  • New data that enters the storage pool is not deduplicated.
  • Deduplicated data, stored in the storage pool before you turned deduplication off, is not reassembled.
  • Deduplicated data continues to be removed because of normal reclamation and deletion. All information about deduplication for the storage pool is retained.
If you turn deduplication on again for the same storage pool:
  • Duplicate-identification processes resume.
  • Files that have already been processed are skipped.
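
As a hedged illustration of the on/off cycle, the sequence below turns deduplication off and back on for a hypothetical pool named DEDUPPOOL and queries the pool definition after each change. The detailed query output should include the pool's deduplication settings.

```
/* DEDUPPOOL is a hypothetical storage pool name */
/* Turn deduplication off; data that is already deduplicated is left as it is */
UPDATE STGPOOL DEDUPPOOL DEDUPLICATE=NO
QUERY STGPOOL DEDUPPOOL FORMAT=DETAILED
/* Turn deduplication back on; duplicate-identification processes resume */
UPDATE STGPOOL DEDUPPOOL DEDUPLICATE=YES
QUERY STGPOOL DEDUPPOOL FORMAT=DETAILED
```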

Tips for deduplication

To reduce the number of times a volume is opened and closed, multiple input FILE volumes in a deduplicated storage pool can remain open at the same time during a session. Use the NUMOPENVOLSALLOWED server option to specify how many FILE volumes in deduplicated storage pools can remain open. Set this option in the server options file or with the SETOPT command.

The following server processes read data from a deduplicated storage pool and can be affected by this setting:
  • Volume reclamation
  • MOVE DATA or MOVE NODEDATA
  • EXPORT
  • AUDIT VOLUME
  • Storage-pool restore operation
  • Volume restore operation
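
For example, the option can be changed on a running server with SETOPT. The value 20 below is purely illustrative and is not a tuning recommendation.

```
/* Allow up to 20 input FILE volumes to stay open per session (illustrative value) */
SETOPT NUMOPENVOLSALLOWED 20
```

The same setting can instead be placed in the server options file as a line of the form NUMOPENVOLSALLOWED 20.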

