Tao Of Backup Wailing Wall Story
When is /dev/rStp0 not a tape drive?
I work technical support for a medical practice management software company. As an aid to analysis of the backup logs produced by our preferred backup software (one of the 'supertars'), I created a shell script that parses through the main log file, boiling down each backup into a summary like
Tu0330+!
with color-coding on terminal types that permit. The above tells me that the backup on Tuesday, March 30th was successful, that the Bit-Level Verification (reading the files back off tape, and compared against the copy on disk, to be certain they could be read back exactly as they were written), but that the 'header' file read from the beginning of the archive immediately prior to the write phase indicated that it was the same tape as had been used the same day.
I was running this script on general principles on a site, and the summary indicated that the tape had not been changed in a very long time. I asked the customer about this, and was assured that they rotated tapes religiously. I ultimately determined that somehow /dev/rStp0 had been deleted, and when the backup software wrote to it, rather than writing to the tape, was simply creating a tarball ON THE SAME HARD DRIVE.
The backup software correctly printed out each night that the data was being properly written to /dev/rStp0, and that it verified every bit of it matched up with what was on the hard drive. Everyone involved had every reason to believe the backups were fine. Had they suffered a hard drive crash, it could have been impossible to determine why their allegedly recent backups were months old.
---
I classify this as 'Testing' advisedly, because the description of 'Testing' I see here is to restore from a backup and see if it's any good. That could have been incredibly bad, because it could have destroyed the real data in the process, although it's more likely that it simply would show that the files could be restored just fine, and in really good time! Restoring to a different system would have safely disclosed the ugly truth that the files restored were ancient, but quite correct, if one were observant enough to examine the date stamps on critical data files.
'Coverage' is also relevant, because this system was for some reason excluding the /dev directory. Had it not, the backups would have grown exponentially until they filled the root filesystem and disclosed the truth within days instead of weeks or months.
Monty Harder
Kansas City, USA,
Wed 31-Mar-2004 11:57pm