Earlier this week, I managed to trash this server. Here is an edited email that I sent to Jeff (who came clean with his own goofup) about what happened.
Well… I was an idiot.
It started when I saw that /var was at 98%. I decided to go and get some space. I thought “hey, isn’t that last partition on the first disk free?” and I checked parted (which had it marked “lvm”, my first clue), but, being the stupid person that I am, I somehow became convinced (without checking) that it wasn’t used. So I ran “mkfs” on it.
Now, at this point, things were probably still recoverable. That partition was the last disk in an LVM concatenation and, most likely, didn’t have anything on it. If I had checked, I could have found that it was actually in use by LVM and removed the disk from the concatenation and, again, most likely, everything would have been fine.
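For what it's worth, the check I skipped could have been as small as the sketch below. It assumes the LVM2 on-disk format, where a label beginning with "LABELONE" sits in one of the first four 512-byte sectors of a physical volume (an older LVM1 setup would look different). The function name looks_like_lvm_pv is made up for illustration; on a real system, LVM's own pvs or pvdisplay tools are the more reliable way to ask the same question.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given device or file looks like an
# LVM2 physical volume.
# Assumption: LVM2 stores a label starting with "LABELONE" in one of the
# first four 512-byte sectors.
looks_like_lvm_pv() {
    # Read only the first four sectors and search them as raw bytes.
    dd if="$1" bs=512 count=4 2>/dev/null | grep -a -q 'LABELONE'
}
```

Something like `looks_like_lvm_pv "$dev" && echo "in use by LVM, do not mkfs"` (with dev set to the partition in question) before reaching for mkfs would have been enough of a speed bump.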
But, I didn’t check. I rebooted.
Of course, the logical volume where all the website and mail data is kept didn’t come up. Wonderful.
At first, I thought I might be able to use vgcfgrestore to get the data back. I tried and tried, but I couldn’t find any way to get it back with the last disk missing.
So, I resigned myself to restoring from backup. Problem: No backups since the 28th. So, five days of email and site changes are lost.
But I couldn’t get amrestore to work. I kept getting the following:
sym53c1010-66-1: SCSI parity error detected: SCR1=3 DBC=110071f1 SBCL=ae
st0: Error with sense data: Current st09:00: sense key Aborted Command
Additional sense indicates Initiator detected error message received
(which looks like it may be the result of cabling).
Still, I could read the tapes. That is, I could read the headers Amanda put on the tapes, so I knew that at least some of the data was getting to the tape. Luckily, I had used tar (instead of xfsdump) and Amanda tapes have headers that tell you exactly how to extract the data without using Amanda.
At this point, I decided to go to sleep and work on it in the morning.
The next morning, I took the tapes into work, where they have a DLT drive, and pulled the data off thanks to Amanda’s design, which makes recovery possible even without having Amanda installed.
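Roughly, the Amanda-free recovery looks like the sketch below. It assumes each dump file starts with a single 32 KB text header padded with NULs, followed by the raw tar stream. That block size is an assumption on my part, and the header itself spells out the exact command to use, so trust it over this. The function names amanda_header and amanda_extract are made up for illustration, and the input here is a dump file already positioned or copied off the tape.

```shell
#!/bin/sh
# Hypothetical helper: print the text header of an Amanda dump image.
# Assumption: the header is the first 32 KB block, padded with NUL bytes.
amanda_header() {
    dd if="$1" bs=32k count=1 2>/dev/null | tr -d '\000'
}

# Hypothetical helper: skip the header block and unpack the tar stream
# that follows it into the directory given as the second argument.
amanda_extract() {
    dd if="$1" bs=32k skip=1 2>/dev/null | (cd "$2" && tar -xpf -)
}
```

Reading the header first and only then extracting matters, because the header names the program and options the dump was made with.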
Lots of dumb moves on my part throughout this process. I feel like a total idiot.