
Data Deduplication in Windows Server Technical Preview 3


With the release of Windows Server Technical Preview 3, it’s time to provide updates on the feature content for Data Deduplication. Rather than only providing the delta from Technical Preview 2 to 3 and making you go back and forth between a couple of blog posts, I’m including the full article in this post and adding the changes for TP3.

For those familiar with the Data Deduplication in Windows Server Technical Preview 2 article, you can jump down to the new entry for “Dedup Improvement #4: Support for the Nano Server”, since that is primarily what is new in this release.

Everything else still applies, so if you have been experimenting with Data Deduplication in the Technical Preview releases, please continue. If you have been waiting to jump in, Technical Preview 3 is a great release to get started with. And of course send email to dedupfeedback@microsoft.com and let us know how your evaluation goes and any questions you may have.

What’s New in Data Deduplication?

If I had to pick two words to sum up the major changes for Data Deduplication coming in the next version of Windows Server, they would be “scale” and “performance”. In this posting, I’ll explain what these changes are and provide some recommendations of what to evaluate in Windows Server Technical Preview 3.

In Windows Server 2016, we are making major investments to enable Data Deduplication (or “dedup” for short) to more effectively scale to handle larger amounts of data. For example, customers have been telling us that they are using dedup for such scenarios as backing up all the tenant VMs for hosting businesses, using from hundreds of terabytes to petabytes of data. For these cases, they want to use larger volumes and files while still getting the great space savings results they are currently getting from Windows Server.

Dedup Improvement #1: Use the volume size you need, up to 64TB

Dedup in Windows Server 2012 R2 optimizes data using a single-threaded job and I/O queue for each volume. It works great, but you do have to be careful not to make the volumes so big that the dedup processing can’t keep up with the rate of data changes, or “churn”. In a previous blog posting (Sizing Volumes for Data Deduplication in Windows Server), we explained in detail how to determine the right volume size for your workload, and we have typically recommended keeping volume sizes under 10TB.

That all changes in Windows Server 2016 with a full redesign of dedup optimization processing. We now run multiple threads in parallel using multiple I/O queues on a single volume, resulting in performance that was previously only possible by dividing your data into multiple, smaller volumes.

The result is that our volume guidance changes to a very simple statement: Use the volume size you need, up to 64TB.

Dedup Improvement #2: File sizes up to 1TB are good for dedup

While the current version of Windows Server supports file sizes up to 1TB, files “approaching” this size are noted as “not good candidates” for dedup. The reasons have to do with how the current algorithms scale; for example, operations like scanning for and inserting changes can slow down as the total data set grows. This has all been redesigned for Windows Server 2016 with new stream map structures and improved partial file optimization, with the result that you can go ahead and dedup files up to 1TB without worrying about them not being good candidates. Incidentally, these changes also improve overall optimization performance, adding to the “performance” part of the story for Windows Server 2016.

Dedup Improvement #3: Virtualized backup is a new usage type

We announced support for the use of dedup with virtualized backup applications using Windows Server 2012 R2 at TechEd last November, and there has been a lot of customer interest in this scenario since then. We also published a TechNet article with the DPM Team (see Deduplicating DPM Storage) with a reference configuration that lists the specific dedup configuration settings to make the scenario optimal.

With a new release we can do more interesting things to simplify these kinds of deployments, and in Windows Server 2016 we have combined all of these dedup configuration settings into a new usage type called, as you might expect, “Backup”. This both simplifies deployment and helps “future proof” your configuration, since any future setting changes can be applied automatically through this usage type.
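As a quick sketch of what this looks like (E: is just an illustrative drive letter, not from the reference configuration), enabling the new usage type is a single command, and you can inspect the resulting dedup settings on the volume afterwards:

# Enable dedup with the new Backup usage type (E: is an example volume)
Enable-DedupVolume -Volume E: -UsageType Backup

# Inspect the dedup settings now in effect on the volume
Get-DedupVolume -Volume E: | Format-List *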

Dedup Improvement #4 (new for TP3): Nano Server support

Nano Server is a new installation option in Windows Server Technical Preview: a headless, deeply refactored and reduced Windows Server environment optimized for cloud deployments. Data Deduplication has been tuned and validated to operate in the Nano Server environment and is fully supported there.
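Because Nano Server is headless, you manage dedup on it remotely, for example over PowerShell remoting. A minimal sketch (the computer name “nano01” and volume E: are placeholders for this example):

# Enable and check dedup on a Nano Server remotely via PowerShell remoting
# ("nano01" and E: are placeholder names used only for illustration)
Invoke-Command -ComputerName nano01 -ScriptBlock {
    Enable-DedupVolume -Volume E:
    Get-DedupStatus -Volume E:
}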

Note that deduplication support in Nano Server is in “preview” status and currently has the following restrictions:

  • Support has only been validated in non-clustered configurations

  • Deduplication job cancellation must be done manually (using the Stop-DedupJob PowerShell cmdlet; see the example below)
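For example, to find and cancel a running dedup job manually (a sketch; E: is a placeholder volume):

# List dedup jobs, then cancel any running on the given volume
Get-DedupJob
Stop-DedupJob -Volume E: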

Suggestions for What to Check Out in Windows Server TP3

What should you try out in Windows Server TP3? Of course, we encourage you to evaluate the new version of dedup overall on your own workloads and datasets. This applies to any deployment you may be using or are interested in evaluating for dedup, including volumes for general file shares or for supporting a VDI deployment, as described in our previous blog article on Large Scale VDI Deployment.

Also, if you are evaluating Nano Server for TP3, it would be great for you to try out dedup in your environment.

But specifically for the new features, here are a couple of areas we think it would be great for you to try.

Volume Sizes

Try larger volume sizes, up to 64TB. This is especially interesting if you have wanted to use larger volumes in the past but were limited by the requirements for smaller volume sizes to keep up with optimization processing.

The guidance for this evaluation is to follow only the first section of our previous blog article Sizing Volumes for Data Deduplication in Windows Server, “Checking Your Current Configuration”, which describes how to verify that dedup optimization is completing successfully on your volume. Use the volume size that works best for your overall storage configuration and verify that dedup is scaling as expected.
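One quick way to spot-check this (a sketch; E: is a placeholder volume) is to look at the volume’s dedup status and recent jobs, paying attention to the saved space, the in-policy versus optimized file counts, and the last optimization time and result:

# Review dedup savings and whether optimization is keeping up (E: is an example volume)
Get-DedupStatus -Volume E: | Format-List *

# Review recent or in-progress dedup jobs on the volume
Get-DedupJob -Volume E: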

Virtualized Backup

In the TechNet article I mentioned above, Deduplicating DPM Storage, there are two changes you can make to the configuration guidance.

Change #1: Use the new “Backup” usage type to configure dedup

In the section “Plan and set up deduplicated volumes” and in the following section “Plan and set up the Windows File Server cluster”, replace all the dedup configuration commands with the single command to set the new “Backup” usage type.

Specifically, replace all these commands in the article:

# For each volume

Enable-DedupVolume -Volume <volume> -UsageType HyperV

Set-DedupVolume -Volume <volume> -MinimumFileAgeDays 0 -OptimizePartialFiles:$false

 

# For each cluster node

Set-ItemProperty -Path HKLM:\Cluster\Dedup -Name DeepGCInterval -Value 0xFFFFFFFF

Set-ItemProperty -Path HKLM:\Cluster\Dedup -Name HashIndexFullKeyReservationPercent -Value 70

Set-ItemProperty -Path HKLM:\Cluster\Dedup -Name EnablePriorityOptimization -Value 1

…with this one new command:

# For each volume

Enable-DedupVolume -Volume <volume> -UsageType Backup
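
To confirm the new usage type takes effect on your evaluation volume, you can optionally start an optimization job right away and watch its progress (a sketch; the volume placeholder matches the article’s notation):

# Optionally kick off an optimization job immediately and check its progress
Start-DedupJob -Volume <volume> -Type Optimization
Get-DedupJob -Volume <volume>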

Change #2: Use the volume size you need for the DPM backup data

In the article section “Plan and set up deduplicated volumes”, a volume size of 7.2TB is specified for the volumes containing the deduplicated VHDX files containing the DPM backup data. For evaluating Windows Server TP3, the guidance is to use the volume size you need, up to 64TB. Note that you still need to follow the other configuration guidance, e.g., for configuring Storage Spaces and NTFS, but go ahead and use larger volumes as needed.
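As a reminder of that other guidance, a larger backup volume is still formatted the same way; the sketch below uses illustrative values (the drive letter and allocation unit size are placeholders), so follow the Storage Spaces and NTFS settings in the Deduplicating DPM Storage article for your actual configuration:

# Example only: format a larger deduplicated backup volume with NTFS
# (drive letter and allocation unit size are illustrative; follow the
# DPM article's Storage Spaces and NTFS guidance for real values)
Format-Volume -DriveLetter E -FileSystem NTFS -AllocationUnitSize 64KB -UseLargeFRS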

Conclusion

We think that these improvements to Data Deduplication coming in Windows Server 2016 and available for you to try out in Windows Server Technical Preview 3 will give you great results as you scale up your data sizes and deploy dedup with virtualized backup solutions.

And we would love to hear your feedback and results. Please send email to dedupfeedback@microsoft.com and let us know how your evaluation goes and, of course, any questions you may have.

Thanks!

