Today I spent a couple of hours trying to figure out why my third host wouldn’t connect to the Cisco SFS 7000D 20Gb InfiniBand switch. vCenter was reporting the connection as down:
I am using HP ConnectX cards (Part Number: 448397-B21) in all my host machines. My other two hosts connected without any issues, so the problem seemed to be specific to this one card. Possibly a firmware issue? To check the firmware versions, I needed to install the Mellanox Firmware Tools (MFT) on the hosts, found here.
Installing MFT on the ESXi hosts and checking firmware versions:
- Uploaded both VIB files to ESXI01 and ESXI03 with WinSCP and put them in the /tmp directory
- Installed both VIBs by running the following (a reboot is required):
esxcli software vib install -v /tmp/mft-184.108.40.206-10EM-5220.127.116.111820.x86_64.vib
esxcli software vib install -v /tmp/net-mst-18.104.22.168-1OEM.522.214.171.1241820.x86_64.vib
- Checked the firmware version of the HP ConnectX card in ESXI03:
cd /opt/mellanox/bin
./mst start
./mlxfwmanager --query
- Compared to ESXI01’s HP ConnectX card:
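The comparison step can also be done mechanically once you have the two version strings. Here is a minimal sketch of a POSIX shell helper that compares dotted firmware versions field by field. The name `version_lt` is a hypothetical helper, and `2.5.000` is only a placeholder value; read the real versions from the `./mlxfwmanager --query` output on each host.

```shell
#!/bin/sh
# Sketch: compare two dotted firmware versions numerically, field by field.
# version_lt is a hypothetical helper name, not part of MFT.

# version_lt A B: succeeds (returns 0) when version A is lower than version B.
version_lt() {
    a="$1"; b="$2"
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading field of each version; a missing field counts as 0.
        fa="${a%%.*}"; fb="${b%%.*}"
        [ -n "$fa" ] || fa=0
        [ -n "$fb" ] || fb=0
        if [ "$fa" -lt "$fb" ]; then return 0; fi
        if [ "$fa" -gt "$fb" ]; then return 1; fi
        # Drop the field just compared and move on to the next one.
        case "$a" in *.*) a="${a#*.}" ;; *) a="" ;; esac
        case "$b" in *.*) b="${b#*.}" ;; *) b="" ;; esac
    done
    return 1   # the versions are equal
}

# Placeholder values: substitute what mlxfwmanager reports on each host.
esxi03_fw="2.5.000"
esxi01_fw="2.6.000"
if version_lt "$esxi03_fw" "$esxi01_fw"; then
    echo "ESXI03 firmware is older than ESXI01"
fi
```

Fields are compared as integers, so `2.10.0` correctly ranks above `2.9.0`, which a plain string comparison would get wrong.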
Updating Firmware Version with MFT:
- cd to the following directory before running the commands below:
- Then needed to find out the PSID by running:
Downloaded the HP OEM firmware 2.6 from here.
- Copied firmware to /tmp/fw-25408-2_6_000-448397-B21.bin on ESXI03
- Before applying the firmware, I needed the PCI device name (/dev/mt25418_pci_cr0). Running ./mlxfwmanager --query provided this last bit of information.
- Apply the firmware: (./flint -d [device name] -i [firmware location] burn)
./flint -d /dev/mt25418_pci_cr0 -i /tmp/fw-25408-2_6_000-448397-B21.bin burn
- Reboot and VOILA!!! Confirmed the firmware was applied by running ./mlxfwmanager --query again, then jumped into vCenter to see the link reported as 20000 Full.
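Since burning firmware is not something you want to get wrong, here is a small sketch of how the flint step could be wrapped with a basic sanity check. The `burn_fw` function and the `DRY_RUN` variable are my own hypothetical additions, not part of MFT; only the `./flint -d ... -i ... burn` command itself comes from the steps above. By default the wrapper only prints the command it would run.

```shell
#!/bin/sh
# Sketch: guard the flint burn step with a simple check on the image file.
# burn_fw and DRY_RUN are hypothetical names introduced for illustration.

burn_fw() {
    dev="$1"   # PCI device name, e.g. /dev/mt25418_pci_cr0
    img="$2"   # firmware image, e.g. /tmp/fw-25408-2_6_000-448397-B21.bin

    if [ -z "$dev" ] || [ -z "$img" ]; then
        echo "usage: burn_fw <device> <image>" >&2
        return 2
    fi
    # Refuse to continue if the image file is missing or empty.
    if [ ! -s "$img" ]; then
        echo "firmware image not found or empty: $img" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default behaviour: only print the command that would be run.
        echo "./flint -d $dev -i $img burn"
        return 0
    fi
    # Real run: assumes the current directory contains the flint binary.
    ./flint -d "$dev" -i "$img" burn
}
```

Calling `burn_fw /dev/mt25418_pci_cr0 /tmp/fw-25408-2_6_000-448397-B21.bin` prints the flint command for review; setting `DRY_RUN=0` in front of the call would actually run it.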
Hope this helps other InfiniBand users!