15 tips to fix the tape Library and tape drive related issues

Follow these tips to monitor and manage tape resources. Use the methods explained below to fix tape drive and tape volume related issues on Windows and UNIX.

Use UNIX and Windows Tape utilities
Windows has a utility called ntutil which can be useful with faulty tape drives, as it helps you decide whether the problem lies with TSM or is external. The only difficulty is that there is no correlation between TSM device names and Windows device names, so you have to work out how to match them. For example, suppose that in TSM you call your tape drives DRIVE00 to DRIVE32, and DRIVE28 is faulty. First go to TSM and run the following command (the '*' just means you don't have to input the library name), then make a note of the drive serial number.
 Q DRIVE * DRIVE28 F=D

Now open a DOS command prompt and type ntutil. Take option '1 manual test' from the first menu, then take option '20 open' to open the default drive, usually \\.\tape0\. Then take option '63 get tape bus info', and you will see a list of tape drives called TAPE00 to TAPE32. However, TAPE28 does not correspond to TSM DRIVE28; that would be much too easy. Scan down the list looking at the drive serial numbers until you spot your drive, then make a note of its tape number. In this case it was Tape04. Now take option '1 set device special file' and reply to the prompt with Tape04.

Take the close option to close the previous tape session, then option '20 open' again, and you will open Tape04, otherwise known by TSM as DRIVE28. If the drive opens up at this point, it's probably OK.
Once the drive is open you can run various commands against it; you can see the list in the menu. '49 enquiry' and '58 device info' can be useful.
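
If you have a lot of drives, the serial number matching can be sketched in a few lines of Python. This is only an illustration. The serial numbers are made up, and typing the ntutil list into a dictionary by hand is my assumption, as ntutil does not produce this format itself.

```python
# Hypothetical data: serial numbers copied by hand from ntutil option 63,
# keyed by the Windows device name. All values here are made up.
windows_drives = {
    "Tape03": "0007801234",
    "Tape04": "0007805678",
    "Tape28": "0007809999",
}


def find_windows_device(tsm_serial, drives):
    """Return the Windows device whose serial matches the TSM drive, or None."""
    wanted = tsm_serial.strip().upper()
    for device, serial in drives.items():
        if serial.strip().upper() == wanted:
            return device
    return None


# Serial number noted from 'Q DRIVE * DRIVE28 F=D' (made up for this example).
print(find_windows_device("0007805678", windows_drives))
```

With the sample data above, TSM DRIVE28 maps to Tape04, which is the name you would then give to option '1 set device special file'.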


Enabling Persistent Binding
When a server is rebooted, the tape drive definitions can change, and this can make the tape paths in both servers and storage agents incorrect. You can prevent this from happening by using Persistent Binding.

In AIX, install the Atape driver. This allows you to rename the tapes in AIX to a standard that suits you, and these names will survive a server reboot.

On Windows, you can get persistent binding if you use QLogic device adaptors. In the QLogic software, bring up the Fibre Channel Port Configuration dialog box, right-click on Host Adapter, device or LUN in the HBA tree, then click on Configure in the drop-down menu. Select the BIND box, and that will bind each port to its target ID.
--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------

Library Sharing
How you define library sharing with TSM depends on whether it's a SCSI library or a 3494 library, and whether or not you have a SAN. Library Sharing for SCSI libraries requires that you define the library as follows:

 for the library manager
   DEFINE LIBRARY lib-name LIBTYPE=SCSI SHARED=YES DEVICE=drive-name 
 for the library client
   DEFINE LIBRARY lib-name LIBTYPE=SHARED PRIMARYLIBMANAGER=server-name

Library Sharing for 3494 libraries does not use the library manager/client configuration described above. It needs the '3494SHARED YES' server option instead. You still need to use separate categories for the different servers, otherwise you may end up with two servers having the same private/scratch volume in the library inventory. What 3494 library sharing brings is the ability to define all drives to all the servers sharing the 3494. The TSM server will detect whether a drive is available and will retry based on the retry options that were added in 4.1 (the DRIVEACQUIRERETRY and MPTIMEOUT options).


--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------

Moving Tapes offsite
There are two parts to volume movement: updating TSM so it knows what is happening with the volumes, and physically managing the automated library inventory by using checkin/checkout commands.

If you are planning to take tapes offsite to a vault, then the important step is to update the access mode of those volumes to 'OFFSITE'. This tells TSM that it can still do some data processing like Reclamation and Move Data commands, but it will NEVER request a mount of the actual volume (it uses the primary copy instead). Note that only copy pool volumes can be set to 'OFFSITE'. This is because TSM always expects to have its primary pool volumes available (i.e. mountable). The checkout operation for the offsite copypool volumes is a necessary extra step to get the tapes out of the library inventory.
--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------


Freeing up library slots
If your library becomes full, you may need to free up some slots. The trick is to eject older tapes that you are not likely to use for a while, and keep the active tapes in the library. The 'MOVE MEDIA' command gives us the ability to have a combination automatic/manual library. We can 'move' tapes outside of the library to a nearby 'location', but the tapes are still considered mountable. The distinction is whether they have the media state of 'MOUNTABLEINLIB' or 'MOUNTABLENOTINLIB', and this tells TSM whether to ask the robot to mount the volume or to issue a manual mount request. When processing a manual mount request, you must use the 'Checkin Libvol' command to update the library inventory and tell TSM that the tape is back in the robot (since that is ultimately how the tape gets mounted).

TSM will automatically set the volume's access mode to ReadOnly when it is moved out of the library, and back to ReadWrite when it is checked in again. This allows any read operations (e.g. a restore) to proceed and cause a manual mount request, while write operations will not attempt to access the volume.
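
The decision TSM makes from the media state can be pictured with a small Python sketch. This only models the behaviour described above, it is not TSM code, and the function name and returned strings are made up for the example.

```python
def mount_action(media_state):
    """Model how a mount is satisfied for a given MOVE MEDIA state."""
    state = media_state.upper()
    if state == "MOUNTABLEINLIB":
        # The tape is in a library slot, so the robot can fetch it.
        return "robot mount"
    if state == "MOUNTABLENOTINLIB":
        # The tape is on a nearby shelf: an operator has to bring it back
        # and check it in before the robot can mount it.
        return "manual mount request, then CHECKIN LIBVOL"
    raise ValueError("unexpected media state: " + media_state)


print(mount_action("MOUNTABLEINLIB"))
print(mount_action("MountableNotInLib"))
```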
--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------

Reclaiming offsite tapes
You don't have to bring your offsite tapes in to do reclamation. Set your copypool reclamation to a reasonable level, say 60%. TSM knows what files are still valid on offsite volumes that are to be reclaimed. It finds the copies of those files in the primary storage pool (which is still in the library); it moves a scratch tape to the copy pool and copies the files from the primary tape pool to the new copypool tape. The new copy tape is then marked to go offsite, and the old one marked for return.
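
As a rough illustration of how the threshold picks volumes, here is a short Python sketch. The volume names and percentages are made up, and treating a volume as eligible once its reclaimable space reaches the threshold is my assumption about the boundary, so check your server's behaviour before relying on it.

```python
# Hypothetical offsite copy pool volumes: name -> percentage of reclaimable space.
offsite_volumes = {"CP0001": 72.5, "CP0002": 35.0, "CP0003": 60.0}


def volumes_to_reclaim(volumes, threshold=60):
    """List the volumes whose reclaimable space has reached the threshold."""
    # Assumption: a volume qualifies at or above the threshold.
    return sorted(name for name, pct in volumes.items() if pct >= threshold)


print(volumes_to_reclaim(offsite_volumes))
```

Lowering the threshold makes more volumes qualify, which means more copying from the primary pool and more scratch tapes used for the replacement copies.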
--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------

Re-generating a library definition
Sometimes, especially with TSM servers hosted on Windows, it is necessary to delete and redefine a tape library. After you remove the tape paths, you have to delete the library itself to remove the TSM server library inventory. You do this with the 'delete library library_name' command. Once you re-define the library and paths, you need to re-generate the library inventory by running the following commands, which must be run in the sequence shown. The scratch checkin goes first because TSM will not check in a volume as scratch if it knows the volume holds data, so the volumes left over are then picked up by the private checkin.


 Check in the scratch volumes
   checkin libvol library_name search=yes checklabel=barcode status=scratch
 Check in the private volumes
   checkin libvol library_name search=yes checklabel=barcode status=private
--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------


Reclaiming tapes that are assigned to another TSM server
Imagine this scenario: you are using one TSM server as a Library Manager, called TSM1, with maybe 3 other instances sharing the library. You decommission one of those instances, say it's called TSM3, and reclaim the physical server, then discover that TSM3 still has a lot of tapes allocated in the Library Manager. The data is defunct, as it has all been moved to other TSM servers, and you want to reclaim those tapes as scratch. You can't change the tapes to scratch status from TSM1 as it does not own them. If you check them out and back in again, they are still owned by TSM3. You can't change the owner, so what do you do?

The problem is that TSM1 contains records about these tapes in its volhist file. For each tape, you need to delete the volhist record on TSM1 with the first command below, then set the tape to scratch with the second. Put your own volume names in and be absolutely sure to get the names right or you will delete the wrong tapes!

 DEL VOLHIST TODATE=TODAY TYPE=REMOTE VOLUME=volume-name FORCE=YES

 UPD LIBV library-name volume-name STAT=SCR

--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------

Checkin and Checkout
TSM needs to know where its tapes are if they are stored in tape libraries, and to keep it informed you use the CHECKIN and CHECKOUT commands. A tape library has a small compartment with a door, usually called an IO station. You place your new cartridges into the IO station, then run a checkin command. The syntax of the command varies slightly depending on what type of library you have. The command for a SCSI library is
 Checkin libv Library_name search=bulk checklabel=yes status=scr
 
This will read in all the tapes in the IO station, read the labels and define them to TSM as scratch. To check in a single named tape that contains required data, maybe something you are importing from a different system, try 
 Checkin libv Library_name Volume_name  checklabel=yes  
 
Both CHECKIN and CHECKOUT commands will issue a console message asking you to confirm you are ready. Reply to the message with 'REPLY nn' where nn is the message number. However, be aware that the robot will select the first tape from the IO station; it will not scan the IO station for your tape. I've had the Library Manager tell me the IO station was empty apart from a specific tape that I wanted, yet TSM kept selecting an incorrect tape on checkin. Eventually I tried a bulk checkin and then discovered there were four foreign tapes in the IO station.

If you have lots of tapes to input to TSM, you can power your library down, open the main door, then just put the tapes directly into empty library slots. You then power the library up and run an audit, then a bulk checkin to TSM like this
 CHECKIN LIBV library_name SEARCH=YES CHECKL=B
 
If you have some unallocated tapes in your library, maybe because they were checked out with the REMOVE=NO parameter, and you want them checked in as scratch, use the following command - it compares the TSM database with the library database and will just check in tapes that TSM did not know about and set them to scratch status.
 CHECKIN LIBVOL library_name STATUS=scratch SEARCH=YES CHECKL=B
 
Most of us have several TSM servers that share a physical library. This is done by partitioning the library into several logical libraries. Say you have two servers, TSM1 and TSM2, with virtual libraries VLIB1 and VLIB2 respectively. In this case it is quite easy to transfer data between TSM servers. You run an Export, note the tape used (xyz123), then check it out of VLIB1 like this
 CHECKOUT LIBV VLIB1 xyz123 REMOVE=YES
 
This will place the tape into the IO station. You then log into the Library Manager (the control software for the tape library) and re-assign the tape from VLIB1 to VLIB2. Finally you check it back in with 
 CHECKIN LIBV VLIB2 xyz123 STAT=PRI CHECKL=YES
 
If you find the tape is rejected with an invalid label, try a bulk checkin.
If a volume has been removed from the library but TSM has not been informed, you can clear it from TSM with the command below. Checklabel=no and remove=no mean that TSM will do no validation; it just removes the volume from the database.
 Checkout libvolume Library_name Volume_name  checklabel=no remove=no


--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------


Check Reuse delay settings
If a tape has no active files left on it but still does not become scratch, then the issue might be the REUSEDELAY parameter on the copy storage pool.

A tape is not necessarily released for scratch straight away. Imagine the worst has happened, you've had a disaster and have had to restore your database back 48 hours. If tapes have been reused in the past 48 hours, then the database will not be accurate, data will be missing. To prevent this, you have a parameter called REUsedelay. This specifies the number of days that must elapse after all files are deleted from a volume before the volume can be rewritten or returned to the scratch pool. The default value for this is 0, but it may have been set to 5, say, to avoid problems with database rollback. That's one reason why tapes do not get recycled quickly.
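
The effect of the delay is simple date arithmetic, sketched here in Python. The dates are made up for the example, and the function is only a model of the rule described above, not something TSM exposes.

```python
from datetime import date, timedelta


def earliest_reuse_date(emptied_on, reuse_delay_days):
    """Date from which an emptied volume may be rewritten or go back to scratch."""
    return emptied_on + timedelta(days=reuse_delay_days)


emptied = date(2024, 3, 1)  # hypothetical date the last file was deleted

print(earliest_reuse_date(emptied, 5))  # 2024-03-06
print(earliest_reuse_date(emptied, 0))  # 2024-03-01
```

So with a delay of 5, a tape that empties on 1 March is not released until 6 March, which is why empty tapes can sit around for days before they turn up as scratch.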
--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------

Tape Mount and Dismount issues
TSM will keep adding data to a 'filling' tape until it is full. However, it will sometimes mount a scratch tape even if there is a 'filling' tape available for that node. This is because TSM will not wait for a tape that is currently dismounting. The logic is that it is faster to ask for a new scratch tape than to wait while a filling tape is dismounted, stored, retrieved then remounted. There is no easy answer to this feature, except to juggle your KEEPMP and MOUNTRetention values to minimise the risk.
--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------

Maxscratch parameter
The name of this parameter can be a bit confusing, as it limits the total number of tapes that a storage pool can contain, not the total number of scratch tapes. If your tape pool processing starts failing with insufficient space errors, then one cause can be that the maxscratch limit has been reached. You may have plenty of scratch tapes in your library, but TSM will not use them for a pool that is already at its limit.
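
A tiny Python model makes the point. It follows the description above and is only a sketch; the function name and the numbers are made up.

```python
def can_use_new_scratch(pool_volumes, maxscratch, library_scratch):
    """Model whether a storage pool may take another scratch tape."""
    if library_scratch == 0:
        # Nothing left in the library to hand out.
        return False
    # The pool is capped by maxscratch, however many scratch tapes exist.
    return pool_volumes < maxscratch


# 200 scratch tapes in the library, but the pool is already at its limit of 50.
print(can_use_new_scratch(pool_volumes=50, maxscratch=50, library_scratch=200))
# Raising the limit to 60 lets the pool pick up scratch tapes again.
print(can_use_new_scratch(pool_volumes=50, maxscratch=60, library_scratch=200))
```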
--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------


How to find out how many scratch tapes are available?
The q vol command will only give you information about storage pool volumes, so it does not report on scratch volumes, as they are not associated with a storage pool. You need to use the following SQL:
select count(*) as Scratch_count   from libvolumes  where status='Scratch'
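
If you want to see what the query does without touching a live server, here is a small Python sketch that runs the same select against an in-memory SQLite table. SQLite is only a stand-in for the TSM database, and the table layout and volume names are made up for the example.

```python
import sqlite3

# Stand-in for the TSM libvolumes table, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE libvolumes (library_name TEXT, volume_name TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO libvolumes VALUES (?, ?, ?)",
    [
        ("LIB1", "A00001", "Scratch"),
        ("LIB1", "A00002", "Private"),
        ("LIB1", "A00003", "Scratch"),
    ],
)


def scratch_count(connection):
    """Run the query from the article and return the single count value."""
    row = connection.execute(
        "select count(*) as Scratch_count from libvolumes where status='Scratch'"
    ).fetchone()
    return row[0]


print(scratch_count(conn))  # 2
```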

TSM thinks a tape contains data, but it is empty
You have a CopyPool volume which is EMPTY and OFFSITE, but the tape does not change to scratch as normal. You cannot move the data off the tape because it is empty. You cannot delete the tape because TSM thinks it contains data, not even with the discard data option. The tape needs to be audited, but to do this it must be on-site. Recall the tape to your site and run an 'AUDit Volume VolName Fix=Yes'.

Altering the MOUNTABLE state
A volume is empty, but is not in scratch status because the volume STATE is mountablenotinlib. To change the STATE of the volume use the command
MOVE MED vol_name STG=pool_name WHERESTATE=MOUNTABLENOTINLIB

This will move the volume back into the scratch category.
