Monday, August 29, 2011

How to recover a failed MPIO paths from an IBM VIO server on an AIX LPAR

If you have set up disks from 2 VIO servers using MPIO to an AIX LPAR, then you need to make some changes to your hdisks.

You must make sure the hcheck_interval and hcheck_mode are set correctly:

Example for default hdisk0 settings:

# lsattr -El hdisk0
PCM             PCM/friend/vscsi                 Path Control Module        False
algorithm       fail_over                        Algorithm                  True
hcheck_cmd      test_unit_rdy                    Health Check Command       True
hcheck_interval 0                                Health Check Interval      True
hcheck_mode     nonactive                        Health Check Mode          True
max_transfer    0x40000                          Maximum TRANSFER Size      True
pvid            00cd1e7cb226343b0000000000000000 Physical volume identifier False
queue_depth     3                                Queue DEPTH                True
reserve_policy  no_reserve                       Reserve Policy             True

IBM recommends a value of 60 for check_interval and hcheck_mode should be set to "nonactive".

To change these values (if necessary):

# chdev -l hdisk0 -a hcheck_interval=60 -P

# chdev -l hdisk0 -a hcheck_mode=nonactive -P

Now, you would need to reboot for automatic path recovery to take effect.

If you did not set the check_interval and hcheck_mode as described above or did not reboot, then after a failed path, you would see the following even after the path is back online:
# lspath
Enabled hdisk0 vscsi0
Failed  hdisk0 vscsi1

To fix this, you would need to execute the following commands:

# chpath -l hdisk0 -p vscsi1 -s disable

# chpath -l hdisk0 -p vscsi1 -s enable

Now, check the status again:
# lspath
Enabled hdisk0 vscsi0
Enabled hdisk0 vscsi1

chpath -l hdisk0 -p vscsi1 -s disable; chpath -l hdisk0 -p vscsi1 -s enable