The other day, I had a system drive failure in one of my servers. It was a software-mirrored RAID setup (RAID1), where one of the disks had been acting funky for the last few months, with occasional clicks of death.
During the New Year's vacation, it started to really misbehave, causing a complete system lockup. Fortunately it was only two days until I was home. Once home, I dug out two old drives which had previously been used on a RAID controller. I disconnected all other drives and RAID controllers, connected those two drives, and booted the computer with a Gentoo installation disk. The Gentoo installation disk is a great tool both for installing a new Gentoo system and for recovering an existing one.
After I had set up those two disks with mirroring and partitions and installed the boot loader, I copied the old system over. I restarted the computer and everything was OK. I shut down the computer to remove the old drives with data. And then I started noticing a pattern. Occasionally the POST screen would halt at “Auto detecting SATA … IDE Hard Drive”. At first I couldn’t figure out what it was, but it only happened after I had shut down the computer. If I booted the computer with a Gentoo installation disk, it would load libahci. That driver would identify the problematic system drives and spin them up. On subsequent warm reboots the BIOS/POST would recognize the drives. Only after a cold boot (complete shutdown) would the drives not be recognized.
And then I started thinking about the drives' previous stay on the RAID controller. And this is important. While they were attached to the RAID controller, I had configured the disks to do a staggered spin-up, so they wouldn’t draw too much power from the PSU (power supply) at once. It turns out this sets a mode in the disk drive itself, and if you put such a disk in a computer whose BIOS doesn't support it, you’ll get problems.
I learnt the hard way that this sets a special parameter in the disk, if the drive's firmware supports it, to power up in standby mode. This means the disk stays in standby when it gets power and does not spin up until something explicitly tells it to.
Luckily Linux comes with a very handy tool called hdparm. After some looking in the man page, I found this:
-s Enable/disable the power-on in standby feature, if supported by the drive. VERY DANGEROUS. Do not use unless you are absolutely certain that both the system BIOS (or firmware) and the operating system kernel (Linux >= 2.6.22) support probing for drives that use this feature. When enabled, the drive is powered-up in the standby mode to allow the controller to sequence the spin-up of devices, reducing the instantaneous current draw burden when many drives share a power supply. Primarily for use in large RAID setups. This feature is usually disabled and the drive is powered-up in the active mode (see -C above). Note that a drive may also allow enabling this feature by a jumper. Some SATA drives support the control of this feature by pin 11 of the SATA power connector. In these cases, this command may be unsupported or may have no effect.
This was exactly what was happening to me.
After spending quite some time on this, these two commands solved it:
# hdparm -s 0 /dev/sda
# hdparm -s 0 /dev/sdb
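If you want to check whether a drive currently has the feature switched on, the identification output from hdparm -I should list it. Below is a minimal sketch of a helper for that. The function name puis_state is made up for this example, and it assumes the usual hdparm -I layout, where the feature list contains a line reading "Power-Up In Standby feature set" that is prefixed with a * when the feature is enabled. If your hdparm version prints the list differently, the pattern needs adjusting.

```shell
# Hypothetical helper: reads `hdparm -I` output on stdin and prints
# "enabled", "disabled" or "unsupported" for power-up in standby.
# Assumes hdparm marks enabled features with a leading "*".
puis_state() {
    # Pick the first line naming the feature; empty if the drive lacks it.
    line=$(grep -i 'Power-Up In Standby feature set' | head -n 1 || true)

    if [ -z "$line" ]; then
        echo "unsupported"
    elif printf '%s\n' "$line" | grep -q '^[[:space:]]*\*'; then
        # A "*" in the "Enabled" column means the feature is turned on.
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

As root, something like hdparm -I /dev/sda | puis_state would then report the state of the first drive. After a successful hdparm -s 0, I would expect it to print "disabled".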