This document is written for administrators and others familiar with computing hardware platforms and storage concepts such as RAID. If you are already familiar with the general failure process, you can skip ahead to replacing a drive and repairing the pool.
Degrees of verbosity
When a drive fails or reports errors, SmartOS makes a large amount of logging data available, which lets us drill down to find the underlying cause of the failure. The following commands are listed in order of increasing verbosity:
zpool status
iostat -en
iostat -En
fmadm faulty
fmdump -et {n}days
fmdump -eVt {n}days
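To capture all of this output in one pass, the commands above can be wrapped in a small script. This is a minimal sketch, not a SmartOS tool: the DAYS and OUTDIR parameters, the run helper, and the output file names are all hypothetical. It assumes it is run with sufficient privileges (typically root in the global zone) so that fmadm and fmdump can read the fault management logs, and it keeps the {n}days form for fmdump -t used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SmartOS): run the diagnostic commands
# listed above, least to most verbose, saving each one's output to its own
# file for later review. DAYS and OUTDIR are illustrative parameters.
DAYS=${1:-7}
OUTDIR=${2:-/var/tmp/disk-diag.$$}
mkdir -p "$OUTDIR" || exit 1

run() {
    # $1 = output file name; remaining arguments = the command to run
    name=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$OUTDIR/$name.txt" 2>&1
    else
        echo "skipping $1: command not found" >&2
    fi
}

run zpool-status  zpool status
run iostat-en     iostat -en
run iostat-En     iostat -En
run fmadm-faulty  fmadm faulty
run fmdump-et     fmdump -et "${DAYS}days"
run fmdump-eVt    fmdump -eVt "${DAYS}days"

echo "diagnostics written to $OUTDIR"
```

Commands that are not installed are skipped with a message rather than aborting, so the script still collects whatever is available.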
The zpool status command will present us with a high level view of pool health.
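For a quick yes/no health check, zpool status -x reports only pools that have problems and prints "all pools are healthy" when there are none. The pools_healthy function below is a hypothetical helper for scripts or monitoring, sketched under the assumption that this exact message is what a healthy system prints.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status -x` output on stdin and succeeds
# only when it is exactly the "all pools are healthy" message.
pools_healthy() {
    [ "$(cat)" = "all pools are healthy" ]
}

# Usage on a SmartOS host:
#   zpool status -x | pools_healthy || echo "a pool needs attention"
```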
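iostat -en presents per-device error counts (soft, hard, transport, and total), identifying each device by its descriptive name, such as c0t1d0. iostat -En adds details for each device, such as vendor, product, revision, and serial number, along with a breakdown of the error types (media errors, device not ready, illegal requests, and so on).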
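To pick out only the affected devices from iostat -en, its output can be filtered with awk. This sketch assumes the illumos column layout (s/w h/w trn tot device: soft, hard, and transport error counts, their total, and the device name); the function name flag_erroring_devices is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: reads `iostat -en` output on stdin and prints each
# device whose total error count (4th column, "tot") is non-zero.
# Header lines are skipped because their first field is not a number.
flag_erroring_devices() {
    awk '$1 ~ /^[0-9]+$/ && $4 > 0 {
        printf "%s soft=%s hard=%s transport=%s total=%s\n", $5, $1, $2, $3, $4
    }'
}

# Usage on a SmartOS host:
#   iostat -en | flag_erroring_devices
```

A device with a climbing hard or transport count is a good candidate to examine further with iostat -En.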
fmadm faulty tells us more specifically which event led to the disk failure. (fmadm can also be used to clear transitory faults; that is outside the scope of this document, so refer to the fmadm man page for details.) fmdump is more specific still: with -e it presents the fault management error log, covering error events from the last {n} days. This information is often unnecessary for replacing a faulted disk, but if the problem is more complex than a single disk failure, it is extremely useful for isolating the root cause.