Hi folks, Ned here again. We created DFSR Cloning in Windows Server 2012 R2 to make initial synchronization faster. Today I talk about file attributes, how they sneak inefficiency into your cloning process, and what you can do about them.
Let’s get to it.
Attributes, DFSR, and Cloning
Attributes are simply metadata on files and folders that describe special states, like hidden or read-only. They often don’t change the file in a meaningful way. DFSR handles attribute changes in a few different ways.
If you change these attributes, DFSR does replicate the attributes to the other server:
- FILE_ATTRIBUTE_HIDDEN
- FILE_ATTRIBUTE_READONLY
- FILE_ATTRIBUTE_SYSTEM
- FILE_ATTRIBUTE_NOT_CONTENT_INDEXED
- FILE_ATTRIBUTE_OFFLINE
- FILE_ATTRIBUTE_REPARSE_POINT (note: it’s not that simple – see this treatise on what actually does and doesn’t work. Files with the IO_REPARSE_TAG_DEDUP, IO_REPARSE_TAG_SIS, or IO_REPARSE_TAG_HSM reparse tags are replicated as normal files)
- FILE_ATTRIBUTE_SPARSE_FILE
- FILE_ATTRIBUTE_DIRECTORY
- FILE_ATTRIBUTE_COMPRESSED
If you change these attributes, DFSR does not trigger replication of the attributes to the other server (but if the file is altered in another way that does trigger replication, these attributes come along for the ride):
- FILE_ATTRIBUTE_ARCHIVE
- FILE_ATTRIBUTE_NORMAL
Finally, there is the FILE_ATTRIBUTE_TEMPORARY attribute (good call, Dragos!), which makes DFSR ignore the file.
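If you want to reason about a specific attribute change, the lists above boil down to a bitmask test. Here’s a minimal Python sketch, assuming the standard Win32 FILE_ATTRIBUTE_* values. The classify_attribute_change function and its return labels are hypothetical illustrations, not DFSR code, and the sketch ignores the reparse point caveats noted above.

```python
# Standard Win32 FILE_ATTRIBUTE_* bit values.
FILE_ATTRIBUTE_READONLY = 0x1
FILE_ATTRIBUTE_HIDDEN = 0x2
FILE_ATTRIBUTE_SYSTEM = 0x4
FILE_ATTRIBUTE_DIRECTORY = 0x10
FILE_ATTRIBUTE_ARCHIVE = 0x20
FILE_ATTRIBUTE_NORMAL = 0x80
FILE_ATTRIBUTE_TEMPORARY = 0x100
FILE_ATTRIBUTE_SPARSE_FILE = 0x200
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
FILE_ATTRIBUTE_COMPRESSED = 0x800
FILE_ATTRIBUTE_OFFLINE = 0x1000
FILE_ATTRIBUTE_NOT_CONTENT_INDEXED = 0x2000

# Bits whose change DFSR replicates (the first list above).
REPLICATED_BITS = (
    FILE_ATTRIBUTE_HIDDEN | FILE_ATTRIBUTE_READONLY | FILE_ATTRIBUTE_SYSTEM
    | FILE_ATTRIBUTE_NOT_CONTENT_INDEXED | FILE_ATTRIBUTE_OFFLINE
    | FILE_ATTRIBUTE_REPARSE_POINT | FILE_ATTRIBUTE_SPARSE_FILE
    | FILE_ATTRIBUTE_DIRECTORY | FILE_ATTRIBUTE_COMPRESSED
)
# Bits whose change alone does not trigger replication (the second list above).
NON_TRIGGER_BITS = FILE_ATTRIBUTE_ARCHIVE | FILE_ATTRIBUTE_NORMAL


def classify_attribute_change(old: int, new: int) -> str:
    """Hypothetical helper: how DFSR treats a change from `old` to `new` attributes."""
    if new & FILE_ATTRIBUTE_TEMPORARY:
        return "ignored"        # DFSR skips temporary files altogether
    changed = old ^ new         # XOR keeps only the bits that flipped
    if changed & REPLICATED_BITS:
        return "replicates"
    if changed & ~NON_TRIGGER_BITS:
        return "not modeled"    # e.g. TEMPORARY cleared; outside the lists above
    if changed:
        return "no trigger"     # only Archive/Normal flipped; rides along later
    return "unchanged"


print(classify_attribute_change(0x20, 0x22))  # archive -> archive + hidden: replicates
print(classify_attribute_change(0x20, 0x80))  # archive -> normal: no trigger
```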
This is normal, day-to-day replication: Bobby Joe in Accounting sets the quarterly report to hidden and read-only, and DFSR updates all the other replicated copies of that file with those attributes. DFSR Cloning makes this more interesting. During initial sync, the cloning process skips the usual exchange of file information between servers by simply giving the new server a copy of the old server’s database. This means that when you run Import-DfsrClone, we need to compare the local copy of the preseeded data against whatever is in the imported database. If the file dates, file sizes, or file ACLs differ, we know that someone messed with the file between the export and the import, at least on this destination server.
However, we also check the attributes: if they differ between the preseeded files and the database records, we consider the file mismatched. Any mismatched files replicate to the destination using the normal initial sync mechanism after cloning completes. Unlike an ordinary attribute change, this is a full file replication, not just metadata.
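Conceptually, that import-time check compares four properties per file. Here’s a hypothetical Python sketch of the comparison, not the actual DFSR implementation: the FileRecord type and the function names are made up for illustration, and the field set is just the four checks described above (ACL hash, last write time, size, and attributes).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical per-file record: the four properties the import compares."""
    acl_hash: str
    last_write_time: str
    file_size: int
    attributes: int


def mismatched_fields(local: FileRecord, clone: FileRecord) -> list:
    """Names of fields that differ between the preseeded file and the cloned DB record."""
    return [f.name for f in fields(FileRecord)
            if getattr(local, f.name) != getattr(clone, f.name)]


def needs_full_replication(local: FileRecord, clone: FileRecord) -> bool:
    # Any difference, attributes included, marks the file as mismatched; it then
    # goes through normal initial sync (a full file copy) after cloning completes.
    return bool(mismatched_fields(local, clone))


# Values taken from the debug log excerpt later in this post.
acl = "3D1F6474-928B8530-1E0A6559-5F02A2C7"
local = FileRecord(acl, "20150605 16:52:48.545", 402432, 128)
clone = FileRecord(acl, "20150605 16:52:48.545", 402432, 32)
print(mismatched_fields(local, clone))  # ['attributes']
```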
In other words:
1. If someone changes attributes on the destination copy of the files after you cloned the database, those files replicate again, even if only the Archive or Normal bit changed.
2. If someone changes attributes on the source copy of the files, and those attributes are in the list above that triggers replication, those files queue into the source server’s backlog and replicate once you finish cloning. That is, they replicate a little later and as metadata only, but you still pay a price.
#2 is out of your control and, frankly, who cares? You created a replication topology; you expect it to replicate. #1 is avoidable. What could be changing these files? Here are some possible culprits:
- Archive bit – Usually disappearing, leaving the file with the Normal bit. The Archive bit is an idiotic legacy that supposedly tells you a file has not been backed up. Windows Server stopped using it many years ago. If this attribute is changing, you are likely running third-party backup software on the destination server. Update your software or yell at, er, kindly ask your vendor why they are still using this junk bit instead of the correct USN journal methodology.
- Hidden, read-only, compressed, and/or system bit – This is all you, buddy! Some application, some script, some automation, or (hopefully not) some individual user is changing things on the destination before cloning completes. Get out ProcMon; I have no way to tell you who, when, or how.
- Reparse point and sparse file bit – This is likely Windows Server Deduplication. Not that dedup is at fault; you or your colleagues did the dirty deed. When you run a dedup optimization job to dehydrate files, they have to be marked for the chunk store. That mark means setting the sparse file attribute and a reparse point, in this case with the IO_REPARSE_TAG_DEDUP tag. When you decided to turn on dedup and alter all the files in the middle of your cloning operations, you altered their attributes, probably on a lot of files, too; dedup is good at its job. This is one of those “Doctor, it hurts when I do this” scenarios. Don’t do that.
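Those rules of thumb translate into a simple check on which bits differ between the database record and the preseeded file. A rough Python sketch, assuming the standard Win32 attribute values; likely_culprit and its labels are hypothetical, and the check order (dedup first, then app/script/user, then backup software) is an arbitrary assumption for files where several bits changed at once.

```python
# Standard Win32 FILE_ATTRIBUTE_* bit values used by the heuristics below.
FILE_ATTRIBUTE_READONLY = 0x1
FILE_ATTRIBUTE_HIDDEN = 0x2
FILE_ATTRIBUTE_SYSTEM = 0x4
FILE_ATTRIBUTE_ARCHIVE = 0x20
FILE_ATTRIBUTE_NORMAL = 0x80
FILE_ATTRIBUTE_SPARSE_FILE = 0x200
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
FILE_ATTRIBUTE_COMPRESSED = 0x800

DEDUP_BITS = FILE_ATTRIBUTE_SPARSE_FILE | FILE_ATTRIBUTE_REPARSE_POINT
USER_BITS = (FILE_ATTRIBUTE_READONLY | FILE_ATTRIBUTE_HIDDEN
             | FILE_ATTRIBUTE_SYSTEM | FILE_ATTRIBUTE_COMPRESSED)
BACKUP_BITS = FILE_ATTRIBUTE_ARCHIVE | FILE_ATTRIBUTE_NORMAL


def likely_culprit(clone_attrs: int, local_attrs: int) -> str:
    """Hypothetical rule of thumb mirroring the culprit list above; not a diagnosis."""
    changed = clone_attrs ^ local_attrs  # bits that differ between DB and disk
    if not changed:
        return "none"
    if changed & DEDUP_BITS:
        return "dedup"             # sparse + reparse point: an optimization job ran
    if changed & USER_BITS:
        return "app/script/user"   # time for ProcMon
    if changed & BACKUP_BITS:
        return "backup software"   # something is clearing the Archive bit
    return "other"


print(likely_culprit(32, 128))  # backup software
print(likely_culprit(32, 34))   # app/script/user
```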
Detecting It
That’s all fine, but how can you tell if you are getting attribute-based cloning inefficiency? After all, when you look at your event logs after cloning, you see only a count of mismatches. Those could be anything.
To the debug logs!
First, look for lines containing “[WARN] DBClone::IDTableImportUpdate Mismatch record was found”. For instance, here is one such entry:
20150616 15:00:01.037 2980 DBCL 4054 [WARN] DBClone::IDTableImportUpdate Mismatch record was found. Local ACL hash:3D1F6474-928B8530-1E0A6559-5F02A2C7 LastWriteTime:20150605 16:52:48.545 FileSizeLow:402432 FileSizeHigh:0 Attributes:128 Clone ACL hash:3D1F6474-928B8530-1E0A6559-5F02A2C7 LastWriteTime:20150605 16:52:48.545 FileSizeLow:402432 FileSizeHigh:0 Attributes:32
Look closely. The ACL hashes match, the write times match, and the file sizes match. But the attributes?
Aha! To the Internet!
128 is FILE_ATTRIBUTE_NORMAL and 32 is FILE_ATTRIBUTE_ARCHIVE, so someone removed the archive bit. OK, that’s not too bad. How about a record where the attribute values are 34 and 32?
To the calculator! We know that 32 is a file with the archive bit. 34 - 32 = 2, and a value of 2 is FILE_ATTRIBUTE_HIDDEN. Starting to make sense?
I wonder why it’s hidden.
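The calculator math works because attributes are bit flags. Here’s a small Python sketch for decoding them, assuming the standard Win32 values (decode and diff are hypothetical helper names). It uses XOR rather than subtraction: 34 - 32 happens to equal 2 because 34 already contains the 32 bit, but XOR yields the changed bits whichever direction the change went.

```python
# Standard Win32 FILE_ATTRIBUTE_* values, keyed by bit.
FLAG_NAMES = {
    0x1: "FILE_ATTRIBUTE_READONLY",
    0x2: "FILE_ATTRIBUTE_HIDDEN",
    0x4: "FILE_ATTRIBUTE_SYSTEM",
    0x10: "FILE_ATTRIBUTE_DIRECTORY",
    0x20: "FILE_ATTRIBUTE_ARCHIVE",
    0x80: "FILE_ATTRIBUTE_NORMAL",
    0x100: "FILE_ATTRIBUTE_TEMPORARY",
    0x200: "FILE_ATTRIBUTE_SPARSE_FILE",
    0x400: "FILE_ATTRIBUTE_REPARSE_POINT",
    0x800: "FILE_ATTRIBUTE_COMPRESSED",
    0x1000: "FILE_ATTRIBUTE_OFFLINE",
    0x2000: "FILE_ATTRIBUTE_NOT_CONTENT_INDEXED",
}


def decode(value: int) -> list:
    """Names of every flag set in `value`, lowest bit first."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if value & bit]


def diff(a: int, b: int) -> list:
    """Names of the flags that differ between two attribute values."""
    return decode(a ^ b)


print(decode(34))    # ['FILE_ATTRIBUTE_HIDDEN', 'FILE_ATTRIBUTE_ARCHIVE']
print(decode(128))   # ['FILE_ATTRIBUTE_NORMAL']
print(diff(34, 32))  # ['FILE_ATTRIBUTE_HIDDEN']
```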
OK, one more:
Got the hang of it? Good. What a strange set of file names…
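Once you have more than a handful of these, reading them by eye gets old. Here’s a minimal Python sketch that pulls the Local and Clone fields out of one mismatch line and lists which ones differ. The regular expression is an assumption based only on the single excerpt above; your debug logs may space or wrap fields differently, so adjust it before trusting the results. parse_mismatch is a hypothetical name.

```python
import re
from typing import Optional

# Assumed layout, per the excerpt above: "<Local|Clone> ACL hash:<hex-...>
# LastWriteTime:<date time> FileSizeLow:<n> FileSizeHigh:<n> Attributes:<n>",
# with or without whitespace between fields.
RECORD_RE = re.compile(
    r"(?P<side>Local|Clone) ACL hash:(?P<acl>[0-9A-F-]+)\s*"
    r"LastWriteTime:(?P<lwt>\d{8} \d{2}:\d{2}:\d{2}\.\d{3})\s*"
    r"FileSizeLow:(?P<low>\d+)\s*FileSizeHigh:(?P<high>\d+)\s*"
    r"Attributes:(?P<attrs>\d+)"
)


def parse_mismatch(line: str) -> Optional[dict]:
    """Hypothetical helper: split a clone mismatch warning into Local/Clone records."""
    if "Mismatch record was found" not in line:
        return None
    records = {}
    for m in RECORD_RE.finditer(line):
        records[m["side"]] = {
            "acl_hash": m["acl"],
            "last_write_time": m["lwt"],
            # Recombine the two 32-bit halves into one 64-bit size.
            "size": (int(m["high"]) << 32) | int(m["low"]),
            "attributes": int(m["attrs"]),
        }
    if set(records) != {"Local", "Clone"}:
        return None  # layout didn't match our assumption; don't guess
    local, clone = records["Local"], records["Clone"]
    return {
        "local": local,
        "clone": clone,
        "differs": [key for key in local if local[key] != clone[key]],
    }


line = ("20150616 15:00:01.037 2980 DBCL 4054 [WARN] DBClone::IDTableImportUpdate "
        "Mismatch record was found. Local ACL hash:3D1F6474-928B8530-1E0A6559-5F02A2C7 "
        "LastWriteTime:20150605 16:52:48.545 FileSizeLow:402432 FileSizeHigh:0 "
        "Attributes:128 Clone ACL hash:3D1F6474-928B8530-1E0A6559-5F02A2C7 "
        "LastWriteTime:20150605 16:52:48.545 FileSizeLow:402432 FileSizeHigh:0 Attributes:32")
print(parse_mismatch(line)["differs"])  # ['attributes']
```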
To the wrap-up! At Microsoft these are often called “learnings”, which makes me want to hurtle across the conference table and shake the person until they admit that learnings isn’t a $%#^#^%& word.
Ahem.
The Lesson
Don’t monkey with file attributes on the destination server. For that matter, don’t change any files in any fashion on the destination while cloning; it adds inefficiency. Letting users party on the downstream file server isn’t a good idea either: you are about to enable replication, and DFSR will reconcile the differences. If users on the destination alter files first, their changes will go buh-bye.
What part didn’t you understand?
Not to mention the confusion if you start seeding data onto your destination while users are accessing it. “Hey Martha, what’s with this empty share? Oh, now it has some files. And some more. And some more. Ok, I’m going to lunch.”
You don’t have to do anything if you run into attribute changes causing less efficient replication; DFSR will fix everything up by performing initial sync on just those files. Seeing a couple of mismatches is no reason to get in a twist. If a sizable number of files mismatch, however, you need to evaluate what’s going on and decide whether fixing the issue and re-importing will save you time in the end.
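To make that call, it helps to know not just how many files mismatched but why. Here’s a hypothetical Python sketch that tallies mismatch lines by the attribute bits that differ (the XOR of the two Attributes values). It assumes each mismatch record sits on one line with exactly two “Attributes:” fields, Local first and Clone second, as in the excerpt earlier; tally_mismatches is a made-up name, and the file path and encoding in the usage comment are placeholders.

```python
import re
from collections import Counter
from typing import Iterable

MISMATCH_MARKER = "DBClone::IDTableImportUpdate Mismatch record was found"
ATTRIBUTES_RE = re.compile(r"Attributes:(\d+)")


def tally_mismatches(lines: Iterable[str]) -> Counter:
    """Hypothetical helper: count clone mismatches by which attribute bits differ.

    Assumes each mismatch is one line carrying exactly two Attributes: values.
    Key 0 means the attributes matched, so the ACL, write time, or size differed.
    """
    counts = Counter()
    for line in lines:
        if MISMATCH_MARKER not in line:
            continue
        values = ATTRIBUTES_RE.findall(line)
        if len(values) != 2:
            continue  # unexpected layout; skip rather than guess
        local, clone = (int(v) for v in values)
        counts[local ^ clone] += 1
    return counts


# Usage sketch (path and encoding are placeholders/assumptions):
# with open("path/to/dfsr.log", encoding="utf-8", errors="replace") as log:
#     for bits, count in tally_mismatches(log).most_common():
#         print(f"0x{bits:X}: {count} file(s)")
```

In that output, a big 0xA0 bucket (Archive plus Normal) suggests backup software clearing the Archive bit, while 0x600 (Sparse File plus Reparse Point) suggests dedup got there first.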
I want to thank Jeroen de Bonte, Dutch Microsoft Support Engineer extraordinaire, for working with us on these behaviors and for pointing out that we had no documentation on them. Good man.
Until next time,
- Ned “Destitute and in Disrepute” Pyle