MetroCluster Single-Disk MPHA Faults

Multipath HA (MPHA) faults are often easy to attribute to a single SAS cable or SAS HBA fault, because an entire shelf or loop appears offline.

But what happens when one disk shows up as single-path while the disks surrounding it are multi-path?

Fault Summary

In this scenario, we have a fabric MetroCluster between two sites approximately 50 km apart. Each site has two Fibre Channel switches connecting the disks to the controllers (and to the DWDM inter-site links).

(Note: the outputs in the examples below are truncated with "...".)

FILER01:> storage show disk -p
...
switch11:33.43     B    switch12:33.43     A     2   11
switch11:33.44     B                             2   12
switch11:33.45     B    switch12:33.45     A     2   13
...

In this situation, disk switch11:33.44 only has a single path, whilst the disks before and after it have two paths active.
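
On a large system it is easy to miss one incomplete row by eye. The following is a minimal Python sketch for spotting such rows in saved `storage show disk -p` output. It assumes the column layout shown in the excerpt above (primary path, port, optional secondary path and port, shelf, bay); the function name `find_single_path_disks` is hypothetical and not part of any NetApp tooling.

```python
def find_single_path_disks(output: str) -> list[str]:
    """Return the primary path names of disks that list no secondary path.

    Assumes each disk row has either 6 whitespace-separated fields
    (primary, port, secondary, port, shelf, bay) or 4 fields when the
    secondary path is missing (primary, port, shelf, bay).
    """
    single_path = []
    for line in output.splitlines():
        fields = line.split()
        # Ignore blank lines, "..." truncation markers and anything
        # that does not have the expected number of columns.
        if len(fields) not in (4, 6):
            continue
        # A disk row starts with a "device:port.id" style name and ends
        # with numeric shelf and bay values; this skips header rows.
        if ":" not in fields[0]:
            continue
        if not (fields[-1].isdigit() and fields[-2].isdigit()):
            continue
        if len(fields) == 4:
            single_path.append(fields[0])
    return single_path


sample = """\
...
switch11:33.43     B    switch12:33.43     A     2   11
switch11:33.44     B                             2   12
switch11:33.45     B    switch12:33.45     A     2   13
...
"""
print(find_single_path_disks(sample))
```

Run against the excerpt above, the sketch reports only `switch11:33.44`. Other Data ONTAP versions may format this output differently, so the column counts should be checked against real output before relying on it.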

A quick check on the Fibre Channel switches shows that both WWPNs for the disk are logged in.

FILER01:> storage show disk switch12:33.44
...
WWN: 2:000:b55253:7aeb05
...

switch11:> portshow 33
...
portWwn of device(s) connected:
22:00:b4:55:53:7a:eb:05
(all other disks on this loop will also show here)
...
switch12:> portshow 33
...
portWwn of device(s) connected:
21:00:b4:55:53:7a:eb:05
(all other disks on this loop will also show here)
...

As a side note, if the Fibre Channel zoning is configured correctly using port-based zoning (instead of WWPN-based zoning), you will not need to reconfigure your FC switches when replacing disks or adding shelves to existing loops. The above simply confirms that the disk has successfully logged into the FC fabric on both paths.
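
The login check above can also be scripted against saved `portshow` output from each switch. This is a rough Python sketch, not a definitive tool: it assumes WWPNs appear as eight colon-separated hex bytes, and that the two ports of a disk share the same trailing six bytes, as they do in the excerpt above. The function names are hypothetical.

```python
import re

# Matches a WWPN written as eight colon-separated hex bytes.
WWPN_RE = re.compile(r"\b(?:[0-9a-f]{2}:){7}[0-9a-f]{2}\b", re.IGNORECASE)


def connected_wwpns(portshow_output: str) -> set[str]:
    """Collect every WWPN-looking string in the output, lower-cased."""
    return {m.group(0).lower() for m in WWPN_RE.finditer(portshow_output)}


def logged_in_on_both(suffix: str, output_a: str, output_b: str) -> bool:
    """True if each switch output lists a WWPN ending with `suffix`.

    `suffix` is the trailing part shared by both ports of the disk,
    for example the last six bytes (an assumption based on the excerpt).
    """
    suffix = suffix.lower()
    return all(
        any(wwpn.endswith(suffix) for wwpn in connected_wwpns(out))
        for out in (output_a, output_b)
    )


switch11_out = "portWwn of device(s) connected:\n22:00:b4:55:53:7a:eb:05\n"
switch12_out = "portWwn of device(s) connected:\n21:00:b4:55:53:7a:eb:05\n"
print(logged_in_on_both("b4:55:53:7a:eb:05", switch11_out, switch12_out))
```

A `True` result only says the disk has logged into both fabrics; as this fault shows, the controller can still report a single path.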

Resolution

It took a while for NetApp to come back with a solution, as this did not seem to be a standard fault scenario that they come across regularly. It turned out that replacing the affected disk drives resolved the fault. Simple :D