Who can you trust your data with? WD? SG? Even Windows is no good.

It’s gotten to the point where I don’t know who I can trust with my data.  A while back I ran into the famous Seagate firmware bug in their Barracuda disks, but managed to rescue the disk.  Then recently I put together a new computer and threw in 2 brand-new 3TB drives, one a Western Digital Caviar Green and the other a Seagate Barracuda 3TB (wait… didn’t I just have a problem with a Barracuda?).  Both disks were on sale, so I thought what the heck, I will just use these two and be set for a while.

Yeah, I wish.

After about 3 months, both drives failed almost simultaneously!  I started to hear some scratchy noises, and then all of a sudden the very last partition on each disk showed up as RAW!  (I had partitioned each disk into five 500GB partitions plus one more with whatever space was left.)  Paranoid, I ran chkdsk to see if the disks were okay, and to my surprise, chkdsk reported hundreds of GB of bad sectors on both disks!  Imagine that!  Even if I opened up a disk and drew rings on the platters with a pencil, it would take me a while to ruin hundreds of GB worth of sectors, yet they just showed up like that.

So I used the advanced replacement programs from both brands, so that I could copy whatever data was left on the old drives to the new ones before wiping the old ones with DBAN and sending them back to the companies.  Simple as that, aside from grieving for the lost data.

Again, I wish.

Today I received the replacement WD 3TB disk.  I stuck it into the Windows 7 64-bit machine, but the Computer Management console only recognized it as having some 746GB, even though the BIOS showed the disk as 3TB.  I googled around a bit, and it seems many people have suffered the same problem; some of the suggested solutions include updating the driver and installing Intel Rapid Storage Technology.  I tried to update the driver, but Windows said the best driver was already installed, and my box is an AMD box anyway!  I even tried ASRock’s 3TB+ Unlock utility, but still to no avail.  Of course, there are other “solutions” such as getting a PCI/PCI-e disk controller card, but my PCI slots are already occupied.
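
One plausible explanation for the odd 746GB figure, which I have not confirmed on this box: somewhere between the controller and Windows, the disk’s sector count gets truncated to 32 bits (the same limit behind the well-known 2.2TB barrier). The sketch below assumes 512-byte sectors and the IDEMA-standard LBA count for a 3,000GB drive; I did not read either value off this particular disk.

```python
# Sketch: what a 3TB disk looks like if its sector count is cut down to 32 bits.
# Assumptions (not measured on my disk): 512-byte logical sectors and the
# IDEMA-standard LBA count for a 3,000GB drive.

SECTOR_BYTES = 512
GIB = 1024 ** 3  # Windows shows "GB" in units of 2**30 bytes


def idema_lba_count(capacity_gb: int) -> int:
    """IDEMA LBA1-03 sector count for drives above 50GB with 512-byte sectors."""
    return 97_696_368 + 1_953_504 * (capacity_gb - 50)


def truncated_capacity_gib(sectors: int) -> float:
    """Capacity seen by code that keeps only the low 32 bits of the sector count."""
    wrapped = sectors % 2 ** 32  # same as masking with 0xFFFFFFFF
    return wrapped * SECTOR_BYTES / GIB


sectors = idema_lba_count(3000)
print(sectors)                                    # 5860533168
print(round(truncated_capacity_gib(sectors), 2))  # 746.52
```

If that guess is right, anything past the first 2^32 sectors simply "wraps around", which would explain why the BIOS and Windows disagree so wildly.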

So how the heck was the previous (now failed) 3TB disk recognized as 3TB just fine, while this one would not work?  Then it occurred to me: for the previous disk, I stuck it into a Macally SATA enclosure and partitioned and formatted it on, get this, a 32-bit Windows Vista box!  So I took the disk out, stuck it into the enclosure, and booted up my 32-bit Vista box.  Lo and behold, it showed the disk as having 2794.39GB!  (Why not 3TB, you ask?  Because when manufacturers talk about TB, they mean 1000 x 1000 x 1000 x 1000 bytes, or 10 to the 12th power, while Windows counts in multiples of 1024.  So if you divide 3,000,000,000,000, which is what the manufacturer calls 3TB, by 1024 x 1024 x 1024, you get something close to 2793.97GB.)
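
Here is the same conversion as a quick Python check. It uses the nominal 3,000,000,000,000 bytes, so it lands slightly below the 2794.39GB Vista reported, which suggests the real disk holds a bit more than the nominal figure.

```python
# Decimal "TB" on the box versus the binary "GB" that Windows displays.
DECIMAL_TB = 1000 ** 4  # what the drive manufacturer calls 1TB
BINARY_GB = 1024 ** 3   # what Windows calls 1GB


def marketed_tb_to_windows_gb(tb: float) -> float:
    """Convert a marketed capacity in TB to the "GB" figure Windows reports."""
    return tb * DECIMAL_TB / BINARY_GB


print(f"{marketed_tb_to_windows_gb(3):.2f}")  # 2793.97
```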

Well, I am not claiming that 32-bit Windows Vista is able to recognize the 3TB disk.  In fact, I think all the credit should go to the wonderful Macally SATA enclosure.  I could have tried the enclosure on the Windows 7 64-bit machine, but I didn’t, because formatting the disk takes a long time and I really need to scrape the data off the failed drive.

But the more interesting twist is that, if my memory serves me right, while the Caviar Green 3TB drive was partitioned using the enclosure (because at that time I had not yet received the new AMD box), the Seagate 3TB Barracuda was actually partitioned and formatted inside the box itself.  So I am not sure what the deal was: whether Windows was indeed having trouble with the WD 3TB but worked well with the SG 3TB, or whether having the WD 3TB in the box somehow made Windows recognize other 3TB disks.  With the Seagate replacement still on the way, I cannot test it right now.

But one thing is certain.  I don’t trust either WD or SG disks any more.  And based on my experience, Hitachi disks are not that great either.  So I am not sure which disks I can trust now, since SSDs are still very low in capacity.  Maybe I should just buy a bunch of pencils and paper and start writing down the bits.


Case note: resurrecting a bricked Seagate Barracuda 7200.11 disk

A few weeks back, my 1TB Seagate Barracuda 7200.11 disk suddenly bricked.  It happened after a reboot.  The computer got stuck at POST trying to detect the disk, and the disk activity LED stayed on steadily, but it just would not recognize the disk.  With the SATA power cable connected I could feel the disk spinning inside, which told me it was not a mechanical problem, yet the computer would not recognize it no matter what I did (including putting it into an enclosure and plugging it into another computer).  It looked very disturbing at first, but after some research it turned out to be a common problem caused by a firmware bug, and there is a fix for it.  The bug manifests in 2 symptoms: the LBA=0 problem and the BSY problem.  With the former, the disk is recognized by the computer but shows a capacity of 0; with the latter, the computer doesn’t detect the disk at all, which is the case I encountered.
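
To make the two symptoms concrete, here is a rough decision rule in Python. It is my own simplification: a disk that isn’t detected can have plenty of other causes, so treat a BSY result as “consistent with” the firmware bug, not proof of it.

```python
from enum import Enum


class Symptom(Enum):
    HEALTHY = "looks normal"
    LBA0 = "LBA=0"  # detected, but reports a capacity of 0
    BSY = "BSY"     # not detected at all


def classify(detected: bool, capacity_bytes: int) -> Symptom:
    """Map what the BIOS/OS shows onto the two 7200.11 firmware-bug symptoms.

    A rule of thumb only, not an official diagnostic.
    """
    if not detected:
        return Symptom.BSY
    if capacity_bytes == 0:
        return Symptom.LBA0
    return Symptom.HEALTHY


print(classify(detected=False, capacity_bytes=0))  # Symptom.BSY (my case)
```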

The general steps for fixing the BSY problem are roughly the following:

1. Rig a cable, typically with USB on one end to plug into a computer and 3 wires on the other end to plug into the jumper pins (TX, RX, and GND) on the disk.

2. Loosen the PCB (Printed Circuit Board) of the disk with a Torx 6 screwdriver.

3. Insert a non-conductive layer between the PCB and the chip on the disk so they no longer make contact.

4. Power on the disk.

5. Use a terminal program on the computer to send a couple of low-level commands to the disk to spin it down.

6. While the power is still on, remove the non-conductive layer so the PCB and the chip make contact.

7. Use the terminal program to send a few more commands to the disk to spin it up, and erase the S.M.A.R.T. data.

8. Power-cycle the disk.

9. Use the terminal program to re-create the partition data.

HDD-Parts sells a repair kit for $49.99, but I was hoping for a more affordable method.  There are several YouTube videos showing how it can be done.  Quite a few of them suggest modifying a Nokia CA-42 cable by cutting off the phone connector end and crimping 3 RS232 connector pins onto the wires.  I spent $13 on the cable, $9 on a crimper, and $5 on an RS232 DB9 female connector.  While it looked easy in the videos, it didn’t work too well for me.  One problem was that all the instructions out there used HyperTerminal as the terminal program, but Microsoft stopped bundling it with Windows Vista and later.  This was not a big deal, because PuTTY works equally well.  Another problem was that the RS232 DB9 pins I got were way too big to fit snugly on the disk’s jumper pins, and they easily fell off.  What I really needed were “jumper headers”, as I found out later, but the local Radio Shack was disappointing (the guy who worked there swore they didn’t carry any DB9 crimpers until I grabbed one off the shelf and asked him what it was).  But the bigger problem was that Windows (Vista and 7) kept using the Prolific USB-to-Serial driver for the cable.  The name of the driver sounded convincing, but I just could not get the communication going.  Worse, the videos also instructed viewers to crack open the USB connector so they could tell which wire is which (TX, RX, GND), but the wires were so thin that after a few moves they broke off from the solder, and I had to toss the cable into the trash since I didn’t feel like buying a soldering kit.

Later someone told me to check out the driver on the mini-CD that came with the Nokia CA-42 cable.  Honestly, I didn’t even notice the mini-CD until the cable was in the trash can, so I couldn’t verify whether that driver worked any better.

Luckily, googling a bit more pointed me to the MSFN forum thread on how to fix the problem, and since it is a forum, it is interactive, which means there are people who can help when you run into weird situations (many, many thanks to jaclaz!!).  The forum also pointed me to a fix kit on eBay for $19.99.  The eBay item listing recommends using the VCP (Virtual COM Port) driver for the kit, but that note was buried under a ton of pictures and I literally missed it until jaclaz pointed it out to me (thanks again!!).

The instructions on the forum are great except for one part.  Instead of steps 3 ~ 6 listed above, they suggest that readers practice removing and re-attaching the PCB while the power is on.  That runs a high risk of short-circuiting the PCB, because it is very easy to drop these tiny metal screws, and if one falls on the wrong spot, the PCB becomes an FCB, a fried circuit board.  So DON’T do that.  Take steps 3 ~ 6 instead, which are much safer.  The instructions also use HyperTerminal as the example terminal program, but I will show you how to use PuTTY instead.

Disconnect the bricked disk from any power source.  Use the Torx 6 screwdriver to remove the PCB from the disk.  You will see a small chip on the disk.  Cut a strip of anti-static bag or, as the videos suggest, a strip of some plastic card (don’t use a paper-based card, because it can tear easily when you try to pull it out later), cover the chip with it, and leave enough of a tab sticking out on the right side of the disk for you to grab and pull later.  Re-attach the PCB, but don’t tighten the screws too much, especially not on the right side where the non-conductive strip is jammed in, so that you can still pull the strip out.  You will have 1 screw left over, because it goes into the middle of the chip and the chip is now covered; don’t lose that screw.

The repair kit was easy to use: I plugged the USB connector into my computer and the 3 wires onto the disk jumper pins (make sure you connect GND to GND, the kit’s TX to the disk’s RX, and the kit’s RX to the disk’s TX).  Windows had trouble finding the driver on its own, but I only needed to point it to the VCP driver I had downloaded.  Once the driver is properly installed, log into Windows as an administrative user, go to Control Panel, then Device Manager, and locate the serial port device.  Right-click on it, go into Properties, and change its baud rate from the default 9600 to 38400.  Also note the COM port it is using; on mine, it is COM4.
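
Swapping TX and RX is the easiest mistake to make here, so here is a tiny sketch that checks a wiring map against the crossover rule above. The pin names and the dict layout are my own illustration, not anything that comes with the kit.

```python
from typing import Dict, List

# Correct hookup: adapter (kit) signal -> disk jumper pin. TX and RX must cross.
CORRECT_WIRING = {"TX": "RX", "RX": "TX", "GND": "GND"}


def check_wiring(wiring: Dict[str, str]) -> List[str]:
    """Return a list of problems with an adapter-to-disk wiring map (empty if OK)."""
    problems = []
    for adapter_pin, expected_disk_pin in CORRECT_WIRING.items():
        actual = wiring.get(adapter_pin)
        if actual is None:
            problems.append(f"adapter {adapter_pin} is not connected")
        elif actual != expected_disk_pin:
            problems.append(
                f"adapter {adapter_pin} goes to disk {actual}, expected {expected_disk_pin}"
            )
    return problems


print(check_wiring({"TX": "RX", "RX": "TX", "GND": "GND"}))  # []
print(check_wiring({"TX": "TX", "RX": "RX", "GND": "GND"}))  # two problems: not crossed
```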

I found it easy to do this with a desktop computer.  An eSATA cable from a laptop doesn’t seem to provide enough current to even power on the disk.  Remove the side panel from the tower case so you have direct access to its SATA power cable.  Plug the SATA power cable into the bricked disk.  You should be able to feel the disk spinning by slightly lifting it with your hands: there is a certain vibration, and if you try to turn the disk you can feel a drag due to gyroscopic resistance.

Run PuTTY.  In the configuration dialog box, make sure to select the radio button that says “Serial” [1].  Enter the COM port noted earlier in “Serial line” [2], and 38400 in “Speed” [3].  I highly recommend saving this session by entering a meaningful name in “Saved Sessions” [4] and clicking the “Save” button [5].  After you have done all that, click on “Serial” under “Connection” in the Category tree on the left side [6].  It shows some options for the serial connection.  Change “Flow control” to “None” [7], and click on “Session” on the left side again [8].  Then click the “Save” button one more time to save the session for later use [5].
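
If you would rather not click through the dialog every time, PuTTY also accepts serial settings on its command line through its -serial and -sercfg options. The sketch below only builds that command line. It assumes PuTTY’s default 8 data bits, no parity and 1 stop bit, which the steps above leave untouched, and COM4 is simply the port from my machine.

```python
from typing import List


def putty_serial_args(com_port: str, baud: int = 38400) -> List[str]:
    """Build a PuTTY command line matching the dialog settings above.

    -sercfg takes speed,data bits,parity,stop bits,flow control; the trailing
    "N" means flow control None.
    """
    return ["putty.exe", "-serial", com_port, "-sercfg", f"{baud},8,n,1,N"]


args = putty_serial_args("COM4")
print(" ".join(args))  # putty.exe -serial COM4 -sercfg 38400,8,n,1,N

# On a Windows machine with PuTTY on the PATH, subprocess.run(args) from the
# standard library would open the session directly.
```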

Click the “Open” button to open the connection.  You will see a blank window.  If everything was done correctly, pressing “Ctrl-Z” will show you the prompt:

F3 T>

Type the command below followed by <Enter> to go to level 2

F3 T>/2 <Enter>

And your prompt should now change to

F3 2>
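
If you log the session, a small helper like this one tells you which level you are at, which is handy for double-checking that /2 actually took effect before you send Z. It is my own sketch; the regular expression is based only on the prompts shown in this post.

```python
import re
from typing import Optional

# Prompts seen in this post: "F3 T>", "F3 2>" and "F3 1>".
# Anything typed after the ">" is ignored.
PROMPT_RE = re.compile(r"F3 ([0-9A-Z])>")


def prompt_level(line: str) -> Optional[str]:
    """Return the level shown in an F3 prompt ("T", "2", "1", ...), or None."""
    match = PROMPT_RE.match(line.strip())
    return match.group(1) if match else None


print(prompt_level("F3 T>"))   # T
print(prompt_level("F3 2>Z"))  # 2
```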

Now you need to spin down the disk by typing the command below, but wait for several seconds before hitting <Enter>

F3 2>Z <wait for several seconds before hitting Enter>

And you should see

F3 2>Z <wait for several seconds before hitting Enter>

Spin Down Complete
Elapsed Time 0.135 msecs <the time may vary here>
F3 2>

People (including me) have reported that if you hit <Enter> too soon after the Z command, you may see error codes such as:

F3 2>Z <Enter immediately after typing Z>

 LED:000000CE FAddr:00280569
 LED:000000CE FAddr:00280569

One guy even reported that it is enough to just type Z without hitting <Enter>; he simply backspaced to erase the Z after feeling the disk spin down.  I didn’t try that.
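
I don’t know what the LED and FAddr values mean, but if you capture your session output, a small parser like this one can flag them so a premature <Enter> doesn’t go unnoticed. It is a sketch based only on the error lines quoted above.

```python
import re
from typing import Optional, Tuple

# Based solely on the lines quoted above, e.g. " LED:000000CE FAddr:00280569".
LED_RE = re.compile(r"LED:([0-9A-Fa-f]{8})\s+FAddr:([0-9A-Fa-f]{8})")


def parse_led_error(line: str) -> Optional[Tuple[int, int]]:
    """Return (led, faddr) as integers if the line is an LED error line, else None."""
    match = LED_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


print(parse_led_error(" LED:000000CE FAddr:00280569"))  # (206, 2622825)
print(parse_led_error("Spin Down Complete"))            # None
```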

If you successfully spun down the disk, you are ready for the most important part: keep the power on to the disk and pull out the non-conductive strip you sandwiched between the PCB and the chip earlier.  Then tighten all the screws that are already in (you still have the 1 lying around; don’t worry about that screw for now).  Be careful at this step, because you don’t want to accidentally skid the tip of your screwdriver across the PCB and fry it.  Tightening the screws ensures the PCB provides enough current to the disk motor so it will spin up correctly.

Use the following command to spin up the disk

F3 2>U <Enter>

and if everything was correct, you should see something like

F3 2>U <Enter>

Spin Up Complete
 Elapsed Time 7.093 secs <the time may vary here>
 F3 2>

I was lucky enough to run into a problem at this stage, because I didn’t tighten the screws on my first attempt and the motor wasn’t able to draw enough current.  It gave me the following output:

F3 2>U <Enter>

 Error 1009 DETSEC 00006008
 Spin Error
 Elapsed Time 31.324 secs
 R/W Status 2 R/W Error 84150180

If you encounter that, make sure you have tightened the screws and try again.
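
The difference between a good and a bad spin-up is easy to miss in a busy terminal, which is exactly what happened to me. This sketch, built only from the outputs quoted in this post, reports whether a Z or U command succeeded and converts the elapsed time to seconds (note that the spin-down time above is printed in msecs and the spin-up time in secs).

```python
import re
from typing import Optional, Tuple

ELAPSED_RE = re.compile(r"Elapsed Time ([0-9.]+) (msecs|secs)")


def spin_result(output: str) -> Tuple[bool, Optional[float]]:
    """Return (succeeded, elapsed_seconds) for the output of a Z or U command."""
    succeeded = (
        ("Spin Down Complete" in output or "Spin Up Complete" in output)
        and "Spin Error" not in output
    )
    elapsed = None
    match = ELAPSED_RE.search(output)
    if match:
        value = float(match.group(1))
        # "msecs" readings are milliseconds; report everything in seconds.
        elapsed = value / 1000 if match.group(2) == "msecs" else value
    return succeeded, elapsed


print(spin_result("Spin Up Complete\n Elapsed Time 7.093 secs"))  # (True, 7.093)
print(spin_result(" Spin Error\n Elapsed Time 31.324 secs"))      # (False, 31.324)
```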

Once the disk has spun up successfully, change to level 1 using the following command:

F3 2>/1 <Enter>

and your prompt should change to

F3 1>

Now reset the S.M.A.R.T. data using the following command:

F3 1>N1 <Enter>

If everything was done correctly, it will not output anything and will only show you another prompt.  However, because of the loose screws, the motor didn’t spin up correctly during my first attempt, yet I failed to notice the error messages from the U command.  So when I continued with the N1 command, I got the following output:

F3 1>N1 <Enter>

Unable to load Diag Overlay

If you see this, STOP.  Power off the disk, re-sandwich the non-conductive strip, and START FROM THE BEGINNING.  I was bold enough to go on even after seeing the error message, and I will tell you what happened in just a bit.
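
Had I been checking the whole session instead of eyeballing it, I would have stopped right after the failed U. Assuming you save the session output to a text file (PuTTY has a session logging option for this), the sketch below returns the first line containing one of the failure messages quoted in this post. The marker list is my own and surely not exhaustive.

```python
from typing import List, Optional, Tuple

# Failure messages quoted in this post; my own list, certainly not exhaustive.
STOP_MARKERS = ("LED:", "Spin Error", "Unable to load Diag Overlay")


def first_failure(log_lines: List[str]) -> Optional[Tuple[int, str]]:
    """Return (line_number, text) of the first line that signals a failure, or None."""
    for number, line in enumerate(log_lines, start=1):
        if any(marker in line for marker in STOP_MARKERS):
            return number, line.strip()
    return None


# An abbreviated version of my first attempt, using lines quoted above.
session = [
    "F3 2>U",
    " Error 1009 DETSEC 00006008",
    " Spin Error",
    "F3 2>/1",
    "F3 1>N1",
    "Unable to load Diag Overlay",
]
print(first_failure(session))  # (3, 'Spin Error')
```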

If erasing S.M.A.R.T. was successful, power off the disk as well.  Wait for the disk to stop completely (several seconds), then power it back on.  You need to reconnect a terminal session to the disk and press Ctrl-Z.  Now run the last command to re-create the partition data (there are 5 commas between the second “2” and the final “22”):

F3 T>m0,2,2,,,,,22 <Enter>

This command takes a while to execute.  If everything went right, you will eventually see output like the following:

Max Wr Retries = 00, Max Rd Retries = 00, Max ECC T-Level = 14, Max Certify Rewrite Retries = 00C8

User Partition Format 5% complete, Zone 00, Pass 00, LBA 00004339, ErrCode 00000080, Elapsed Time 0 mins 05 secs

User Partition Format Successful - Elapsed Time 0 mins 05 secs
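
Counting those 5 commas by eye is error-prone, so here is one way to generate the command string from its fields. I don’t know what each field means; the list simply reproduces the command used above, with empty strings for the blank parameters.

```python
from typing import List

# Fields of the partition-regeneration command, copied verbatim from above.
# Empty strings are the blank parameters that produce the run of 5 commas.
M_FIELDS = ["0", "2", "2", "", "", "", "", "22"]


def build_m_command(fields: List[str]) -> str:
    """Join the fields with commas and prefix the "m" command letter."""
    return "m" + ",".join(fields)


command = build_m_command(M_FIELDS)
print(command)                                # m0,2,2,,,,,22
print(command[len("m0,2,2"):-2].count(","))   # 5
```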

Now you have your disk back.  Power off the disk.  Disconnect the COM cable.  Put the last screw back in, make sure all screws are tightened, copy all the data from this disk to another disk, and apply the latest firmware from Seagate (do this last, because updating firmware is risky too; you only want to do it after you have copied the data to another disk).

Because I wasn’t paying attention during my first attempt, I tried to re-create the partition data even though the motor spin-up hadn’t succeeded, and the disk started to give out a horrible “click click click” noise; I thought my disk was doomed for sure.  It turned out to be okay, but I wouldn’t recommend taking such risks.

In any case, if a step fails, you will more than likely need to power off the disk and start from the beginning.  If you are uncertain about something, don’t rush in headfirst.  Go to MSFN and ask jaclaz and all the good folks there first.  We want you to be a happy bunny in the basket.
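
The “start from the beginning” rule can be written as a simple retry loop. This is only a model of the procedure, not something that talks to the disk: each step is a function returning True or False (in practice, that is you reading the terminal), and the attempt limit is my own addition.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_procedure(steps: List[Step], max_attempts: int = 3) -> bool:
    """Run steps in order; if any step fails, start over from the first one.

    Returns True once every step succeeds in a single pass, or False after
    max_attempts failed passes.
    """
    for attempt in range(1, max_attempts + 1):
        for name, perform in steps:
            if not perform():
                print(f"attempt {attempt}: {name!r} failed; power off and start over")
                break
        else:  # no break: every step succeeded
            return True
    return False


# Example: a spin-up that fails once (loose screws) and works on the retry.
outcomes = iter([False, True])
steps = [
    ("insert strip and power on", lambda: True),
    ("spin down (Z)", lambda: True),
    ("remove strip, tighten screws, spin up (U)", lambda: next(outcomes)),
    ("reset S.M.A.R.T. (N1)", lambda: True),
]
print(run_procedure(steps))  # prints the failure message, then True
```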

Looking back, I debated with myself whether I should have bought the $49.99 repair kit from HDD-parts.com, because I ended up paying something close to that price anyway ($13 + $9 + $5 + $19.99 = $46.99).  But I decided it was a good thing I didn’t, because the $49.99 repair kit is only half the story: I might not have found the MSFN forum had it not been for the failed attempt with the CA-42 cable, I would still have run into the problems I encountered later, and with no one to help me in those situations, I might have mistakenly concluded the disk was not rescuable and given up on it.  Again, many, many thanks to jaclaz and the other folks!!
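
For the record, here are the totals (prices in cents, so floating-point rounding doesn’t get in the way):

```python
# What I actually spent versus the $49.99 HDD-Parts kit, in cents.
diy_costs = {"CA-42 cable": 1300, "crimper": 900, "DB9 connector": 500, "eBay fix kit": 1999}
hdd_parts_kit = 4999

diy_total = sum(diy_costs.values())
print(f"DIY total: ${diy_total / 100:.2f}")                      # DIY total: $46.99
print(f"Difference: ${(hdd_parts_kit - diy_total) / 100:.2f}")   # Difference: $3.00
```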