
Storage Spaces Direct – Under the hood with the Software Storage Bus


Note: this post originally appeared on https://aka.ms/clausjor by Claus Joergensen.

Hello, Claus here again, this time at 30,000 feet on a plane back to Denmark for my dad’s 80th birthday. I think it is time we explore some of the inner workings of Storage Spaces Direct (S2D) – it is much more exciting than any movie in the entertainment system. We are going to look at the Software Storage Bus, which is the central nervous system of Storage Spaces Direct. If you don’t already know what Storage Spaces Direct is, please see my blog post introducing Storage Spaces Direct.

Software Storage Bus introduction

The Software Storage Bus (SSB) is a virtual storage bus spanning all the servers that make up the cluster. SSB essentially makes it possible for each server to see all disks across all servers in the cluster, providing full mesh connectivity. SSB consists of two components on each server in the cluster: ClusPort and ClusBlft. ClusPort implements a virtual HBA that allows the node to connect to disk devices in all the other servers in the cluster. ClusBlft implements virtualization of the disk devices and enclosures in each server for ClusPort in other servers to connect to.

Figure 1: Windows Server storage stack with the Software Storage Bus in green.
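
To make the full-mesh picture concrete, here is a minimal sketch in Python of how every node ends up with a view of every disk in the cluster. The Server and Disk types are purely illustrative and do not correspond to the actual ClusPort/ClusBlft implementation:

# Illustrative model only: a toy view of full-mesh disk visibility.
from dataclasses import dataclass, field

@dataclass
class Disk:
    device_id: str
    owner: str                      # the server that physically hosts the disk

@dataclass
class Server:
    name: str
    local_disks: list = field(default_factory=list)

def visible_disks(server, cluster):
    """A server sees its own disks plus, through the virtual HBA, every disk
    exposed by every other server in the cluster (full mesh)."""
    remote = [d for s in cluster if s.name != server.name for d in s.local_disks]
    return server.local_disks + remote

cluster = [
    Server("Node1", [Disk("SSD-A", "Node1"), Disk("HDD-A", "Node1")]),
    Server("Node2", [Disk("SSD-B", "Node2"), Disk("HDD-B", "Node2")]),
    Server("Node3", [Disk("SSD-C", "Node3"), Disk("HDD-C", "Node3")]),
]

for node in cluster:
    print(node.name, "sees", [d.device_id for d in visible_disks(node, cluster)])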

SMB as transport

SSB uses SMB3 and SMB Direct as the transport for communication between the servers in the cluster. SSB uses a separate named instance of SMB in each server, which separates it from other consumers of SMB, such as CSVFS, to provide additional resiliency. Using SMB3 enables SSB to take advantage of the innovation we have done in SMB3, including SMB Multichannel and SMB Direct. SMB Multichannel can aggregate bandwidth across multiple network interfaces for higher throughput and provide resiliency to a failed network interface (for more information about SMB Multichannel go here). SMB Direct enables the use of RDMA-enabled network adapters, including iWARP and RoCE, which can dramatically lower the CPU overhead of doing IO over the network and reduce the latency to disk devices (for more information about SMB Direct go here). I did a demo at the Microsoft Ignite conference back in May showing the IOPS difference in a system with and without RDMA enabled (the demo is towards the end of the presentation).
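
As a rough intuition for what multichannel buys you, here is a toy model in Python (not how SMB Multichannel is actually implemented) of spreading IO across every healthy network interface and continuing on the survivors when one fails:

# Toy model of the multichannel idea: spread requests over healthy links,
# skip links that have failed. Not SMB code; purely illustrative.
from itertools import cycle

def distribute(requests, channels, failed=frozenset()):
    """Spread IO requests across all healthy channels, round-robin."""
    healthy = [c for c in channels if c not in failed]
    if not healthy:
        raise RuntimeError("no healthy network channels left")
    assignment = {c: [] for c in healthy}
    for request, channel in zip(requests, cycle(healthy)):
        assignment[channel].append(request)
    return assignment

requests = [f"io-{i}" for i in range(8)]
print(distribute(requests, ["nic1", "nic2"]))                   # load spread over both interfaces
print(distribute(requests, ["nic1", "nic2"], failed={"nic2"}))  # traffic continues on the surviving NIC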

Software Storage Bus Bandwidth Management

SSB also implements a fair access algorithm that ensures fair device access from any server, protecting against one server starving out the others. It also implements an IO prioritization algorithm that prioritizes application IO, which is usually IO from virtual machines, over system IO, which is usually rebalance or repair operations, while still ensuring that rebalance and repair operations can make forward progress. Finally, it implements an algorithm that de-randomizes IO going to rotational disk devices to drive a more sequential IO pattern on these devices, even though the IO coming from the application (virtual machines) is a random IO pattern.
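
The exact algorithms are not public, but the flavor is easy to sketch. The Python below is a hypothetical scheduler, not the actual SSB code: it prefers application IO while guaranteeing system IO a minimum share of dispatches, and it sorts a batch of IO bound for a rotational disk by offset so the resulting pattern is more sequential:

# Hypothetical illustration only; not the actual SSB algorithms.
from collections import deque

class ToyScheduler:
    """Application IO is preferred, but system IO is guaranteed at least one
    dispatch out of every `ratio`, so repair and rebalance make forward progress."""

    def __init__(self, ratio=4):
        self.app = deque()
        self.system = deque()
        self.ratio = ratio
        self.dispatches_since_system = 0

    def submit(self, io, is_system=False):
        (self.system if is_system else self.app).append(io)

    def next_io(self):
        serve_system = bool(self.system) and (
            not self.app or self.dispatches_since_system >= self.ratio - 1)
        if serve_system:
            self.dispatches_since_system = 0
            return self.system.popleft()
        self.dispatches_since_system += 1
        return self.app.popleft() if self.app else None

def derandomize(ios):
    """Sort a batch of IOs bound for a rotational disk by offset, so the drive
    sees a mostly sequential pattern instead of random seeks."""
    return sorted(ios, key=lambda io: io["offset"])

sched = ToyScheduler(ratio=4)
for i in range(8):
    sched.submit(f"vm-io-{i}")
for i in range(2):
    sched.submit(f"repair-io-{i}", is_system=True)
print([sched.next_io() for _ in range(10)])   # repair IO interleaved among the VM IO

print(derandomize([{"offset": o} for o in (900, 40, 512, 64, 700)]))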

Software Storage Bus Cache

Finally, SSB implements a caching mechanism, which we call Storage Bus Cache (SBC). SBC is scoped to each server (a per-node cache) and is agnostic to the storage pools and virtual disks defined in the system. SBC is resilient to failures because it sits underneath the virtual disk, which provides resiliency by writing data copies to different nodes. When S2D is enabled in a cluster, SBC identifies which devices to use as caching devices and which devices are capacity devices. Caching devices will, as the name suggests, cache data for the capacity devices, essentially creating hybrid disks. Once it has been determined whether a device is a caching device or a capacity device, the capacity devices are bound to a caching device in a round-robin manner, as shown in the diagram below. Rebinding will occur if there is a topology change, such as a caching device failing.

Figure 2: Storage Bus Cache in a hybrid storage configuration with SATA SSD and SATA HDD
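
The round-robin binding itself is easy to picture. The following Python is a hypothetical sketch of the idea, not the actual SBC binding code: each capacity device is assigned to the next caching device in turn, and a caching device failure simply triggers a rebind across the survivors:

def bind_round_robin(capacity_devices, caching_devices):
    """Bind each capacity device to a caching device in round-robin order.
    Illustrative only; not the actual SBC binding logic."""
    if not caching_devices:
        raise ValueError("no caching devices available")
    return {cap: caching_devices[i % len(caching_devices)]
            for i, cap in enumerate(capacity_devices)}

hdds = [f"HDD-{n}" for n in range(1, 9)]   # 8 capacity devices
ssds = [f"SSD-{n}" for n in range(1, 5)]   # 4 caching devices

print(bind_round_robin(hdds, ssds))        # HDD-1 -> SSD-1, HDD-2 -> SSD-2, ..., HDD-5 -> SSD-1, ...

# Topology change: SSD-2 fails, so its capacity devices are rebound across the survivors.
print(bind_round_robin(hdds, [s for s in ssds if s != "SSD-2"]))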

The behavior of the caching devices is determined by the actual disk configuration of the system and is outlined in the table below:

Storage configuration                     Caching devices   Caching behavior
SATA SSD + SATA HDD                       SATA SSD          Read and write cache
NVMe SSD + SATA HDD                       NVMe SSD          Read and write cache
NVMe SSD + SATA SSD                       NVMe SSD          Write-only cache
Single tier (all NVMe or all SATA SSD)    None              SBC disabled

In systems with rotational capacity devices (HDD), SBC will act as both a read and write cache. This is because there is a seek penalty on rotational disk devices. In systems with all flash devices (NVMe SSD + SATA SSD), SBC will only act as a write cache. Because the NVMe devices will absorb most of the writes in the system, it is possible to use mixed-use or even read-intensive SATA SSD devices, which can lower the overall cost of flash in the system. In systems with only a single tier of devices, such as an all NVMe system or all SATA SSD system, SBC will need to be disabled. For more details on how to configure SBC, please see the Storage Spaces Direct experience guide here.
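
One hypothetical way to express that decision logic, based only on the rules described above (not on how SBC actually classifies devices):

def cache_mode(device_types):
    """Return the caching behavior for a given mix of physical device types.
    Hypothetical sketch of the rules described in this post."""
    types = set(device_types)
    if len(types) == 1:
        return "SBC disabled (single tier of devices)"
    if "HDD" in types:
        return "read and write cache (hides the seek penalty of rotational media)"
    return "write-only cache (reads are already fast on the capacity SSDs)"

print(cache_mode(["NVMe", "HDD"]))   # hybrid with rotational capacity: read and write cache
print(cache_mode(["SSD", "HDD"]))    # SATA SSD in front of HDD: read and write cache
print(cache_mode(["NVMe", "SSD"]))   # all flash: write-only cache
print(cache_mode(["NVMe"]))          # single tier: SBC disabled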

SBC creates a special partition on each caching device that, by default, consumes all available capacity except 32GB. The 32GB is used for storage pool and virtual disk metadata. SBC uses memory for runtime data structures: about 10GB of memory per TB of caching devices in the node. For instance, a system with 4x 800GB caching devices requires about 32GB of memory to manage the cache, in addition to what is needed for the base operating system and any hosted hyper-converged virtual machines.
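
The memory sizing is straightforward arithmetic. Here is a quick sketch of that rule of thumb, with the 10GB-per-TB figure as the only input and the 4x 800GB example as a check:

def sbc_cache_memory_gb(num_caching_devices, device_size_gb, gb_per_tb=10):
    """Rule of thumb from this post: roughly 10GB of RAM per TB of caching devices in a node."""
    total_cache_tb = num_caching_devices * device_size_gb / 1000
    return total_cache_tb * gb_per_tb

print(sbc_cache_memory_gb(4, 800))   # 4 x 800GB = 3.2TB of cache -> ~32GB of RAM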

I hope you enjoyed reading this as much as I enjoyed writing it. I still have a couple of hours left on my flight, maybe I should try and catch some sleep. Until next time.

