Hello, Claus here again. By now, you have probably seen some of my blogs and demos on Storage Spaces Direct performance. One of Storage Spaces Direct's advantages is its support for RDMA networking, which lowers latency and reduces CPU consumption. I often get the question "Is RDMA required for Storage Spaces Direct?" The answer is: no. We support plain old Ethernet as long as it's 10GbE or better. But let's look a bit deeper.
Recently we did a performance investigation on new hardware, comparing it with an in-market offering (more about that in another post). We ran the tests with RDMA enabled and RDMA disabled (Ethernet mode), which provided the data for this post. For this investigation, we used DISKSPD with the following configuration:
- DISKSPD version 2.0.17
- 4K IO
- 70:30 read/write mix
- 10 threads, each at queue depth 4 (40 outstanding I/Os total)
- A 10GiB file per thread (“a modest VHDX”) for a total of 100GiB
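For reference, a DISKSPD invocation along these lines would produce that I/O pattern. This is just a sketch, not the exact command line we ran: the file paths, test duration, and warmup time below are placeholders.

```powershell
# Sketch only - paths, warmup (-W) and duration (-d) are illustrative, not the values from our runs.
# One thread per 10GiB target file (10 files => 10 threads), queue depth 4 per thread (-o4),
# 4K random I/O (-b4K -r) at a 70:30 read/write mix (-w30), caching disabled (-Sh), latency capture (-L).
$targets = 1..10 | ForEach-Object { "C:\ClusterStorage\Volume01\diskspd-$_.dat" }
.\diskspd.exe -b4K -r -w30 -t1 -o4 -c10G -Sh -L -W60 -d300 $targets
```

The -L switch is what produces DISKSPD's per-percentile latency breakdown, which is where 90th percentile numbers like the ones further down come from.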
We used the following hardware configuration:
- 4 node cluster
- Intel S2600WT Platform
- 2x Intel Xeon E5-2699 v4 CPUs (22 cores / 44 threads, 2.2GHz)
- 128GiB DDR4 DRAM
- 4x Intel P3700 NVMe per node
- Mellanox ConnectX-3 Pro 40GbE, dual-port connected, RoCE v2
- C-states disabled, OS power plan set to High Performance, BIOS performance profile, Turbo Boost and Hyper-Threading enabled
We used the following software configuration:
- Windows Server 2016 with the January roll-up package
- No cache drive configuration
- 3-copy mirror volume
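The RDMA and plain Ethernet runs differ only in whether RDMA is enabled on the storage NICs. Here is a minimal sketch of how to verify and toggle that with the in-box cmdlets; the adapter names are placeholders, and this is not necessarily the exact procedure we followed.

```powershell
# Check which adapters currently have RDMA enabled, and whether SMB sees them as RDMA capable.
Get-NetAdapterRdma
Get-SmbClientNetworkInterface

# Ethernet-only (TCP/IP) run: disable RDMA on the storage NICs (adapter names are placeholders).
Disable-NetAdapterRdma -Name "SLOT 2 Port 1", "SLOT 2 Port 2"

# RDMA run: re-enable it.
Enable-NetAdapterRdma -Name "SLOT 2 Port 1", "SLOT 2 Port 2"
```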
We are by no means driving this system hard. That is deliberate: we want to show the delta between RDMA and non-RDMA under a reasonable workload, not at the edge of what the system can do.
| Metric | RDMA | TCP/IP | RDMA advantage |
|---|---|---|---|
| IOPS | 185,500 | 145,500 | 40,000 additional IOPS with the same workload |
| IOPS per % kernel CPU | 16,300 | 12,800 | 3,500 additional IOPS per percent of CPU consumed |
| 90th percentile write latency | 250µs | 390µs | 140µs (~36%) |
| 90th percentile read latency | 260µs | 360µs | 100µs (~28%) |
I think there are two key takeaways from this data:
- Use RDMA if you want the absolute best performance. In this test, RDMA delivered roughly 28% more IOPS (185,500 vs. 145,500), realized through the lower I/O latency that RDMA provides. RDMA was also about 27% more CPU efficient (16,300 vs. 12,800 IOPS per percent of kernel CPU), leaving more CPU available to run VMs. (See the sketch after this list for a quick way to confirm RDMA is actually in use.)
- TCP/IP is no slouch and is absolutely a viable deployment option. While not quite as fast or as efficient as RDMA, TCP/IP delivers solid performance and is well suited for organizations that don't have the networking expertise RDMA requires.
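If you do go the RDMA route, it is worth confirming that SMB Direct is actually carrying the storage traffic before you compare numbers; a connection that has silently fallen back will quietly give you the TCP/IP results above. A quick sanity check, again just a sketch:

```powershell
# SMB multichannel connections report whether each connection is RDMA capable;
# on the storage network these should show RDMA capable = True, otherwise the
# traffic is falling back to plain TCP/IP.
Get-SmbMultichannelConnection

# Server-side view of the interfaces and their RDMA capability.
Get-SmbServerNetworkInterface
```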
Let me know what you think.
Until next time
Claus