I finally had time to attempt configuring vSAN in my lab. I want to compare it to HP StoreVirtual. I know it's like comparing apples to oranges, as they operate completely differently:
- VMware’s vSAN – Operates at the kernel level in each host
- HP’s StoreVirtual – Requires VM appliances in each host
Performance-wise, vSAN should have the upper hand since it runs directly in the kernel, but I still want to compare the two and see the IOPS difference between running in the kernel and running in an appliance. We use HP’s StoreVirtual (all-flash, 10TB) for our VDI environment and haven’t had any issues with it.
Before jumping in, I checked the vSAN HCL to ensure the cards I planned to purchase were supported. Click here for a detailed breakdown of my homelab equipment.
I went with three LSI 9211-8i HBA controllers. After receiving the cards, I flashed them with the latest IT firmware/BIOS (P20). Each host has two Sandisk Ultra II 240GB SSDs. With all the required components in place, let's begin configuring vSAN:
One additional requirement for vSAN to work is having at least one SSD and one HDD in each host. Since all of my disks are SSDs, one in each host needs to be tagged as an HDD by running the following commands:
- Identify SSD to tag by running the following command:
esxcli storage nmp device list | grep "Local ATA Disk"
- Next we apply a rule to disable SSD detection for the device (no feedback = success):
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d naa.5001b44c8a824825 -o disable_ssd
- Then we reclaim the device in order for rule to take effect (reboot required):
esxcli storage core claiming reclaim -d naa.5001b44c8a824825
- Confirm SSD has been tagged as a HDD:
esxcli storage core device list -d naa.5001b44c8a824825 | grep SSD
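As a rough sketch, the tagging steps above could be wrapped in a small shell function so they can be repeated on each host. The `run` helper and the `DRY_RUN` switch are my own hypothetical additions, not part of esxcli. With `DRY_RUN` left at its default of 1, the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: print (or run) the commands that tag a local SSD as an HDD.
# DRY_RUN is a hypothetical switch for this example; 1 means "print only".
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: tag_ssd_as_hdd <naa.device-id>
tag_ssd_as_hdd() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: tag_ssd_as_hdd <naa.device-id>" >&2
        return 1
    fi
    # Add the claim rule that hides the SSD flag for this device.
    run esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d "$dev" -o disable_ssd
    # Reclaim the device so the new rule is applied.
    run esxcli storage core claiming reclaim -d "$dev"
    # Show the device details; look for the "Is SSD" line in the output.
    run esxcli storage core device list -d "$dev"
}
```

Calling `tag_ssd_as_hdd naa.5001b44c8a824825` prints the three commands for review. Setting `DRY_RUN=0` on the ESXi host before calling it would run them for real.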
- Enable Virtual SAN traffic on vmkernel interface:
- Setup vSwitch or vDS. I chose to go with vSwitch and attached the HP ConnectX card to it:
- Select Cluster (Homelab) -> Manage tab -> Settings -> Virtual SAN – General -> Edit… -> Turn ON Virtual SAN -> Automatic or Manual -> OK:
- I chose Manual to show the issue I am experiencing; normally I would go with Automatic.
Select Disk Management -> Select Host (ESXI01) -> Click on -> Select SSD -> Select HDD -> OK – (Repeat for all remaining hosts)
- Verified Disk Group Status and Resources:
Unhealthy Disk Groups in vSAN
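The network and disk group state can also be inspected from the ESXi shell, which helps when the Web Client only reports "unhealthy". The sketch below assumes the vSAN vmkernel interface is `vmk1` (a guess, adjust to your setup) and uses the `esxcli vsan network ipv4` form found in vSphere 5.5/6.0. The `run` helper and `DRY_RUN` switch are hypothetical and default to printing the commands only.

```shell
#!/bin/sh
# Sketch: CLI counterparts of the vSAN network and disk checks.
# DRY_RUN is a hypothetical switch for this example; 1 means "print only".
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Enable Virtual SAN traffic on a vmkernel interface (vmk1 is an assumption).
enable_vsan_traffic() {
    vmk="${1:-vmk1}"
    run esxcli vsan network ipv4 add -i "$vmk"
}

# Show which interfaces carry vSAN traffic, the cluster membership,
# and the disks each host has claimed for vSAN.
show_vsan_state() {
    run esxcli vsan network list
    run esxcli vsan cluster get
    run esxcli vsan storage list
}
```

Running `show_vsan_state` with `DRY_RUN=0` on each host gives a per-host view of the same disks shown under Disk Management.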
I plan to continue troubleshooting this week by performing the following:
- Disable onboard AHCI controller and test
- Remove HP ConnectX cards (show up as Storage adapters & Network adapters) reconfigure vSAN network and test
- Roll back firmware to v19 and test
- Upgrade to vSphere 6 and test
DOH!! Just noticed the Sandisk Ultra II is not on the HCL supported list. That's a bummer, but I would think it should still work. Going to attempt the above and see if things work out.
After upgrading my homelab from 5.5 to 6 and tagging one SSD in each host as capacityFlash, the unhealthy SSD disk issue is resolved. During the upgrade I also cleaned up and removed the old VIBs.
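For reference, a minimal sketch of the capacityFlash tagging from the ESXi shell, using the `esxcli vsan storage tag add` command available in vSphere 6.0. The device IDs passed in are placeholders for your own disks, and the `run` helper and `DRY_RUN` switch are hypothetical additions that default to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch: tag one or more SSDs as vSAN capacity-tier flash (vSphere 6.0).
# DRY_RUN is a hypothetical switch for this example; 1 means "print only".
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: tag_capacity_flash <naa.device-id> [<naa.device-id> ...]
tag_capacity_flash() {
    if [ "$#" -eq 0 ]; then
        echo "usage: tag_capacity_flash <naa.device-id>..." >&2
        return 1
    fi
    for dev in "$@"; do
        # Mark the device as capacity flash so vSAN treats it as capacity tier.
        run esxcli vsan storage tag add -d "$dev" -t capacityFlash
    done
}
```

For example, `tag_capacity_flash naa.5001b44c8a824825` prints the single command for that disk. With `DRY_RUN=0` it would be executed on the host.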
Removed all Mellanox drivers before the upgrade:
esxcli software vib remove -n net-ib-cm -n net-ib-core -n net-ib-ipoib -n net-ib-mad -n net-ib-sa -n net-ib-umad -n net-mlx4-en -n net-mlx4-core -n net-mlx4-ib -n scsi-ib-srp
After upgrading to vSphere 6, I removed the nmlx4 drivers:
esxcli software vib remove -n nmlx4-en -n nmlx4-core -n nmlx4-rdma
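Since these remove commands get long, here is a small sketch of a helper that builds the `esxcli software vib remove` line from a list of VIB names. The function name `vib_remove_cmd` is hypothetical. It only prints the command so it can be reviewed before being pasted into the ESXi shell.

```shell
#!/bin/sh
# Sketch: build an "esxcli software vib remove" command from VIB names.
# The command is printed, not executed.

# Usage: vib_remove_cmd <vib-name> [<vib-name> ...]
vib_remove_cmd() {
    if [ "$#" -eq 0 ]; then
        echo "usage: vib_remove_cmd <vib-name>..." >&2
        return 1
    fi
    cmd="esxcli software vib remove"
    for name in "$@"; do
        # Each VIB to remove gets its own -n flag.
        cmd="$cmd -n $name"
    done
    echo "$cmd"
}
```

`vib_remove_cmd nmlx4-en nmlx4-core nmlx4-rdma` prints the same command used above. Running `esxcli software vib list` first is a reasonable way to confirm which Mellanox VIBs are actually installed.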
Then reinstalled the Mellanox 18.104.22.168
vSAN is working!