
Friday, January 21, 2011

An Interesting Comedy of Errors

If one of the days this week had been a movie then the volume of freakish errors might have been funny... to anyone not in IT.

To me, being not only in IT but also the person the errors happened to, it was the farthest thing from funny.

For the sake of brevity I will keep the details to a minimum. I also hope my misfortune on the day in question brings some perspective on other things that may seem bad, both to myself and to others.
Also - I hope it amuses those who can find humor in it. I'm sure someday I will be able to find amusement in it.

The story begins with the knowledge that there is one, singular individual in my place of employment whose computer I cannot afford to have anything go wrong with. My first interactions with this individual were hostile (from them) and they firmly believe that we're doing just about everything wrong in the technology department here. They don't do this job, but they think we're doing things wrong anyway.

Earlier this week this user approached me because something was not working on their computer. As it was the end of the day I told them that I would happily fix the issue first thing in the morning the next time I was in their building.
The day of fixing arrived and I went to collect the computer but the user was in a meeting. I had to choose between interrupting the meeting and trying to catch the user when the meeting was over. I chose to wait as it seemed the less dangerous of the two potential mistakes.
I caught up with the user shortly after their meeting and collected the computer to work on it in the tech office.
I tried every trick I knew and I was unable to correct the problem at hand. This meant a re-image was necessary.
I backed up the user's data using a THOROUGHLY tested and valid backup procedure. Then, before kicking off the imaging process I tested to make sure that the backup had worked correctly. It had. All the data was there and it was valid.
I kicked off the imaging process. It ran, without incident, through to completion.
It is important to note that I have developed a new little application that makes the automated backup and restore process easier to use. To do so it, basically, adds a GUI layer to execute the known-good scripts. Part of the imaging process places this application on the laptop's admin's desktop. I deliberately chose NOT to use this application as I have not tested it in a production environment and I couldn't afford to have ANYTHING go wrong with this process.
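For anyone wondering what "testing that the backup worked" can look like in practice, here is a minimal sketch in Python. It is not the script I actually use; the function names (`file_digest`, `verify_backup`) and the approach of comparing SHA-256 checksums file by file are assumptions chosen purely for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return a sorted list of relative paths that are missing from,
    or differ in, the backup. An empty list means every source file
    has a byte-identical copy in the backup."""
    source = Path(source_dir)
    backup = Path(backup_dir)
    problems = []
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = backup / rel
        # A file is a problem if it is absent or its contents differ.
        if not dst_file.is_file() or file_digest(src_file) != file_digest(dst_file):
            problems.append(rel.as_posix())
    return sorted(problems)
```

Calling `verify_backup("/path/to/user/data", "/path/to/backup")` (both paths hypothetical) and getting back an empty list is the "it had worked" result. A check like this only proves the copy was good at the moment it was made; it says nothing about what happens to the backup disk afterwards.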

I ran the "tried and true" command line script that works perfectly EVERY TIME. I've run it on my own data. I've run it on MANY other users' data. Since I perfected it and put it into production I have had no errors from the script - only errors when the hard disk (either origin or destination) was failing.

The script started normally. It did the things it normally does. Then, halfway through the process, it errored out and the drive unmounted (not the other two partitions of the physical disk, just the one in question) and then re-mounted. It re-mounted EMPTY.

That's right: the data was GONE. Some settings (not enough to be useful) and half a file (which wouldn't open in the appropriate application) managed to be restored before it crashed. Nothing useful remained.

So I altered my day to finish setting up the workstation so the user could, at least, work on new stuff and set about the process of data recovery.

The first tool I used scanned for a hair over four hours.

In the midst of the recovery process (post four-hour scan, into active analysis of the recovered files) the wireless network of that particular building completely died. Totally and completely died. I was posting a tweet and that process stalled. I was loading the Google search page and the Google logo of the day actually halted mid-load because of the network dying. I had to stop what I was doing, determine where the problem was and try to rectify it (I did, with a simultaneous reboot of the entire wireless infrastructure).

I was then able to return to the task of file recovery. I recovered some files, but I doubt any were the files the user needed. I went to put them on a USB disk, as I had promised the user I would, and the USB disk I had handy for this purpose wouldn't mount. It wouldn't respond. It is dead. I ended up emailing the files instead.


The second tool I tried to use failed to work at all.

Today I am running the second tool from a different machine in an attempt to reclaim the lost partition. If I can reclaim the lost partition I can not only recover all (most?) of the files lost but also things like saved bookmarks, etc.


It is running now. It has found multiple old partitions. I will have to guess at which one I need to restore. Hopefully I choose wisely.


RESOLUTION - due to oversight this was added nearly four years later - I was able to restore the data.

2 comments:

  1. That sucks! Nothing amusing here, and I am in IT as well. My guess? Not sure what was wrong, but did they have any infected files?

  2. None that I am aware of.
    I'm assuming that I experienced hardware failure.
