I’m Tsan Zheng, a Senior Test Lead on the DFS team. If you’ve used DFSR (DFS Replication), you’re probably aware that the largest amount of data that we had tested replication with until recently was 10 TB. A few years ago, that was a lot of data, but now, not so much.
In this post, I’m going to talk about how we verified preparing 100 TB of data for replication in 3 days. With Windows Server 2012 R2, we introduced the ability to export a clone of the DFSR database, which dramatically reduces the time needed to get preseeded data ready for replication. It now takes roughly 3 days to get 100 TB of data ready for replication with Windows Server 2012 R2. On Windows Server 2012, we think this would have taken more than 300 days, based on our testing of 100 GB of data, which took 8 hours to prep on Windows Server 2012 (we decided not to wait around for 300 days). Below, we’ll show you how we tested the replication of 100 TB of data on Windows Server 2012 R2.
First of all, let’s look at what 100 TB of data could mean: it could be around 340,000 8-megapixel pictures (that’s 10 years of pictures if you take 100 pictures every day), or 3,400 Blu-ray-quality full-length movies, or billions of Office documents, or 5,000 decent-sized Exchange mailbox files, or 2,000 decent-sized virtual machine files. That’s a lot of data, even in 2013. If you’re using 2 TB hard drives, you need at least 120 of them just to set up two servers to handle this amount of data. We should clarify that the absolute performance of DFSR cloning depends largely on the number of files and directories in the dataset, not on the actual size of the files (when using validation level 0 or 1, which don’t verify full file hashes).
In designing the test, we needed not only to make sure we set things up correctly, but also to verify that replication happens as expected after the initial preparation of the dataset - you don’t want data corruption when replication is being set up! Preparing the data for replication also has to be fast if we’re going to prep 100 TB of data in a reasonable amount of time.
Now let’s look at our test setup. As mentioned earlier, you need some storage. We deployed two virtual machines, each with 8 GB of RAM and data volumes backed by a Storage Spaces simple space (in a production environment you’d probably want to use a mirror space for resiliency). The data volumes were served by a single-node Scale-Out File Server, which provided continuous availability. The Hyper-V host (Fujitsu PRIMERGY CX250, 2.5 GHz, 6 cores, 128 GB RAM) and the file server (HP Mach1 server, Xeon 2.27 GHz, 8 cores, 24 GB RAM) were connected over a dual 10 GbE network to ensure near-local I/O performance. The file server used 120 drives (2 TB each) in two Raid Inc JBODs.
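If you want to stand up a similar data volume yourself, a simple space can be carved out of a JBOD with a few Storage Spaces cmdlets. This is only a minimal sketch, not our exact deployment script: the pool name, virtual disk name, and drive letter are hypothetical, and in production you’d likely pick Mirror rather than Simple resiliency.

```powershell
# Minimal sketch: create a simple (non-resilient) space from poolable JBOD disks.
# Names (DfsrPool, DfsrData, drive letter F) are hypothetical examples.

# Gather every physical disk that can be pooled
$disks = Get-PhysicalDisk -CanPool $true

# Create a storage pool on the first available storage subsystem
New-StoragePool -FriendlyName "DfsrPool" `
    -StorageSubSystemFriendlyName (Get-StorageSubSystem)[0].FriendlyName `
    -PhysicalDisks $disks

# Carve a simple space out of the pool (use Mirror for resiliency in production)
New-VirtualDisk -StoragePoolFriendlyName "DfsrPool" -FriendlyName "DfsrData" `
    -ResiliencySettingName Simple -UseMaximumSize

# Bring the disk online, partition it, and format it as a data volume
Get-VirtualDisk -FriendlyName "DfsrData" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -DriveLetter F -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "DfsrData" -Confirm:$false
```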
To get several performance data points from a DFSR perspective (DFSR uses one database per volume), we used the following volume sizes, totaling 100 TB on each member. We used a synthetic file generator to create ~92 TB of unique data; the remaining 8 TB was human-generated data harvested from internal file sets. It’s difficult to come by that much real data...not counting VHDX files and peeking into personal archives, of course! We used the robocopy command recommended for DFSR cloning to preseed the second member (a sketch of that command follows the table below).
| Volume | Size | Number of files | Number of folders | Number of Replicated Folders |
| --- | --- | --- | --- | --- |
| F | 64 TB | 68,296,288 | 2,686,455 | 1 |
| G | 18 TB | 21,467,280 | 70,400 | 18 |
| H | 10 TB | 14,510,974 | 39,122 | 10 |
| I | 7 TB | 1,141,246 | 31,134 | 7 |
| J | 1 TB | 1,877,651 | 7,448 | 1 |
| TOTAL | 100 TB | 107,293,439 | 2,834,559 | 37 |
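For reference, here’s a minimal sketch of the kind of robocopy pass used for preseeding one replicated folder. The source, destination, and log paths are hypothetical; you should use the exact command that the DFSR cloning guidance produces for your environment, run once per replicated folder.

```powershell
# Hedged sketch of a robocopy preseeding pass (paths and server names are hypothetical).
# /E       copy subdirectories, including empty ones
# /B       copy in backup mode so locked files and ACL-restricted files come across
# /COPYALL copy data, attributes, timestamps, security, owner, and auditing info
# /MT:64   use 64 copy threads
# /XD DfsrPrivate  skip the DFSR private folder
# /R:6 /W:5        retry transient failures a few times before giving up
robocopy.exe "F:\RF01" "\\DFSR-02\F$\RF01" /E /B /COPYALL /R:6 /W:5 /MT:64 `
    /XD DfsrPrivate /TEE /LOG:"C:\Temp\preseed-RF01.log"
```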
In a nutshell, the following diagram shows the test topology we used.
Now that the storage and file sets are ready, let’s look at the verification we did during the Export -> Pre-seed -> Import sequence:
- No errors in the DFSR event log (checked in Event Viewer).
- No skipped or invalid records in the DFSR debug log (checked by searching for “[ERROR]”).
- Replication works correctly after cloning, verified by probing each replicated folder with canary files to check convergence (see the convergence sketch after this list).
- No mismatched records after cloning, verified by checking the DFSR debug log and the DFSR event log.
- Time taken for cloning, measured with the Windows PowerShell cmdlet Measure-Command (see the timing sketch after this list):
- Measure-Command { Export-DfsrClone…}
- Measure-Command { Import-DfsrClone…}
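The canary probe is simple in principle: drop a small file into each replicated folder on the upstream member and wait for it to show up downstream. Here’s a minimal sketch of that kind of check; the paths, server name, and timeout are hypothetical placeholders, not our actual test harness.

```powershell
# Hedged sketch of a canary-file convergence probe (paths and names are hypothetical).
$canary     = "canary-{0}.txt" -f (Get-Date -Format "yyyyMMdd-HHmmss")
$upstream   = "F:\RF01"               # replicated folder on the source member
$downstream = "\\DFSR-02\F$\RF01"     # same replicated folder on the second member

# Drop the canary on the upstream member
Set-Content -Path (Join-Path $upstream $canary) -Value "convergence probe"

# Poll the downstream member until the file replicates or we give up
$deadline = (Get-Date).AddMinutes(30)
while (-not (Test-Path (Join-Path $downstream $canary))) {
    if ((Get-Date) -gt $deadline) { throw "Canary did not converge within 30 minutes." }
    Start-Sleep -Seconds 30
}
Write-Host "Canary replicated: $canary"
```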
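And here’s a minimal sketch of how Measure-Command wraps the cloning cmdlets to produce the timings in the results table below. The volume letter and clone staging path are hypothetical, we only show -Validation on the export, and you should adjust everything for your own volumes.

```powershell
# Hedged sketch: time the DFSR database export and import for one volume.
# The volume letter (F:) and staging path are hypothetical placeholders.
# The validation level chosen for the clone affects both export and import times.

# On the upstream member: export a clone of the DFSR database for volume F:
$exportTime = Measure-Command {
    Export-DfsrClone -Volume "F:" -Path "F:\DfsrClone" -Validation None
}
"Export took {0:N0} minutes" -f $exportTime.TotalMinutes

# After preseeding the data and copying the clone files to the second member:
$importTime = Measure-Command {
    Import-DfsrClone -Volume "F:" -Path "F:\DfsrClone"
}
"Import took {0:N0} minutes" -f $importTime.TotalMinutes
```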
The following table and graphs summarize the results that one of our testers, Jialin Le, collected on a build that was very close to the RTM build of Windows Server 2012 R2. Given the nature of DFSR clone validation levels, we don’t recommend validation level 2, which computes full file hashes and is far too time consuming for a dataset this large!
Note that performance at validation levels 0 and 1 depends largely on the number of files and directories rather than on absolute file size. That’s why the 64 TB volume takes proportionally more time to export than the 18 TB volume: the former has proportionally more folders.
| Validation Level | Volume Size | Time to Export (minutes) | Time to Import (minutes) |
| --- | --- | --- | --- |
| 0 – None | 64 TB | 394 | 2129 |
| | 18 TB | 111 | 1229 |
| | 10 TB | 73 | 368 |
| | 7 TB | 70 | 253 |
| | 1 TB | 11 | 17 |
| | Sum (100 TB) | 659 (0.4 days) | 3996 (2.8 days) |
| 1 – Basic | 64 TB | 1043 | 2701 |
| | 18 TB | 211 | 1840 |
| | 10 TB | 168 | 577 |
| | 7 TB | 203 | 442 |
| | 1 TB | 17 | 37 |
| | Sum (100 TB) | 1642 (1.1 days) | 5597 (3.8 days) |
From the chart above, you can see that getting DFSR ready to replicate a large dataset (totaling 100 TB) has become much more practical!
I hope you have enjoyed learning more about how we test DFSR features here at Microsoft.
- Tsan Zheng