Hi folks, Ned Pyle here again. Back at AskDS, I used to write frequently about DFSR behavior and troubleshooting. As DFS Replication matured and its documentation grew, those articles dwindled. Recently though, one of the DFSR developers and I managed to find something undocumented:
A DFSR server upgrade where, despite perfect preseeding, files were conflicting during initial sync.
Sound interesting? Love DFSR debug logs? Have insomnia? Read on!
Background
It began with a customer who was replacing their existing Windows Server 2008 R2 servers with Windows Server 2012. They needed the new data deduplication functionality to save disk space: these servers replicated files written in batches by an application, and because the files would never shrink or be deleted, future disk space was at a premium.
The customer was following the DFSR replacement steps documented in this article. To their surprise, they found that after they reinstalled the operating system (i.e. Part 5, “reinstall or upgrade”), the new servers were writing DFSR file conflict event 4412 for many of the files during initial sync.
Event ID: 4412
Task Category: None
Level: Information
Keywords: Classic
User: N/A
Computer: srv2.contoso.com
Description:
The DFS Replication service detected that a file was changed on multiple servers. A conflict resolution algorithm was used to determine the winning file. The losing file was moved to the Conflict and Deleted folder.
Additional Information:
Original File Path: E:\rf1\1B\2B\0D\somefile.ned
New Name in Conflict Folder: somefile-{59F6007D-4D62-4ACF-9C42-3E293F94E74E}-v6391976
Replicated Folder Root: E:\rf1
File ID: {59F6007D-4D62-4ACF-9C42-3E293F94E74E}-v6391976
Replicated Folder Name: RF1
Replicated Folder ID: CE7DFF07-29C9-4FD6-BE33-91985C524AC5
Replication Group Name: RG1
Replication Group ID: E5643B3A-5E2D-440D-8C18-348E7FC9E08E
Member ID: EF793A1F-FFCB-459E-9A97-9AA5F265B8FC
Partner Member ID: 578628CB-11B6-4CC1-932A-788B37CFF026
This was theoretically impossible, because their special application:
- Only wrote to a single server, not all replication nodes
- Never modified or overwrote existing files
Since this was a new OS and the new dedup feature was in the mix, the initial concern was that scheduled dehydrations were somehow altering files that DFSR had not yet finished examining for initial replication. Perhaps the files appeared different between servers, and DFSR was deciding to force existing files to lose conflicts. Even more interestingly though, when we examined the files using DFSRDIAG FILEHASH, the file hashes were identical:
- File Path: E:\rf1\1B\2B\0D\somefile.ned
- Windows Server 2008 R2 file hash: 6691A27E-030CEFC2-5234258D-3D812539
- Windows Server 2012 file hash: 6691A27E-030CEFC2-5234258D-3D812539
- After dedup optimization file hash: 6691A27E-030CEFC2-5234258D-3D812539
- After the conflict file hash: 6691A27E-030CEFC2-5234258D-3D812539
The only difference, as we would expect, was the file attribute set by the dedup reparse points, and we knew that Windows Server 2012 DFSR fully supports dedup and does not consider such files to differ. The local conflicts were, in effect, cosmetic. They were pointless and slowed initial sync slightly, but at least no data was being lost.
So why on Earth were we seeing this behavior?
Digging Deeper
We enabled DFSR debug logging’s most verbose mode and the customer performed a server replacement – we then waited to see our first conflict. What follows is a (greatly modified for readability) log analysis:
The sample downloaded file: somefile.ned
DFSR is replicating in a file with the exact same name and path as an existing file on the downstream DFSR server:
20130115 19:17:07.342 5796 MEET 1332 Meet::Install Retries:0 updateName:somefile.ned uid:{30CCEB24-696C-4315-A8E0-8C70EE025A44}-v8099934 gvsn:{30CCEB24-696C-4315-A8E0-8C70EE025A44}-v8099934 connId:{752068BE-5AA9-4CD0-9EA4-C7220BDE47F4} csName:Rf1 updateType:remote
DFSR decides to download it using RDC cross-file similarity:
20130115 19:17:08.405 5796 RDCX 757 Rdc::SeedFile::Initialize RDC signatureLevels:1, uid:{30CCEB24-696C-4315-A8E0-8C70EE025A44}-v8099934 gvsn:{30CCEB24-696C-4315-A8E0-8C70EE025A44}-v8099934 fileName:somefile.ned fileSize(approx):557056 csId:{9CC90AD2-A99E-4084-8D32-16B1242BF45E} enableSim=1
It found similar files because the previous similarity info from the old Windows Server 2008 R2 replication still existed on the volume and DFSR was reusing it (more on this later):
20130115 19:17:08.498 5796 RDCX 1308 Rdc::SeedFile::UseSimilar similarrelated (SimMatches=8) uid:{30CCEB24-696C-4315-A8E0-8C70EE025A44}-v8099934 gvsn:{30CCEB24-696C-4315-A8E0-8C70EE025A44}-v8099934 fileName:somefile.ned csId:{9CC90AD2-A99E-4084-8D32-16B1242BF45E} (related: uid:{5D37EFB0-1472-4AA2-B697-1942BB7DE29C}-v9524579 gvsn:{5D37EFB0-1472-4AA2-B697-1942BB7DE29C}-v9524579 fileName:somefile.ned csId:{9CC90AD2-A99E-4084-8D32-16B1242BF45E})
DFSR decides that it’s going to use the file and checks to see if it is already staged (it’s not):
20130115 19:17:08.545 5796 STAG 4222 Staging::GetStageReaderOrWriter
+ fid 0x1000000800CFC
+ usn 0x27d2613f0
+ uidVisible 0
..
..
+ gvsn {5D37EFB0-1472-4AA2-B697-1942BB7DE29C}-v9524579
+ uid {5D37EFB0-1472-4AA2-B697-1942BB7DE29C}-v9524579
+ parent {5D37EFB0-1472-4AA2-B697-1942BB7DE29C}-v9520714
..
+ hash 00000000-00000000-00000000-00000000
+ similarity 00000000-00000000-00000000-00000000
+ name somefile.ned
+ Failed to get stage reader as the file is not staged
DFSR then stages the file and updates the hash and similarity information:
20130115 19:17:08.592 5796 CSMG 3585 ContentSetManager::UpdateHash LDB Updating ID Record:
+ fid 0x1000000800CFC
+ usn 0x27d2613f0
+ uidVisible 1
+ filtered 0
..
+ gvsn {5D37EFB0-1472-4AA2-B697-1942BB7DE29C}-v9524579
+ uid {5D37EFB0-1472-4AA2-B697-1942BB7DE29C}-v9524579
+ parent {5D37EFB0-1472-4AA2-B697-1942BB7DE29C}-v9520714
..
+ hash 1CC352AE-916F21F8-1F4E69E4-51A835CA
+ similarity 06032621-083C3D3A-212D182C-0C0A233C
+ name somefile.ned
By doing this, DFSR also sets uidVisible, which indicates that the file can replicate out (i.e. it is visible to other replicas). This makes sense because the file is in the similarity table, so it must have been staged at some point in the past in order to replicate out.
Now comes the turn to replicate in the “new” file that we are interested in, which is the same file with the same name, but of course a different UID (since when a server performs initial sync, it creates local UIDs for all the existing files). Its ID record has the uidVisible set to 1 and that leads to UidInheritEnabled returning FALSE:
20130115 19:17:16.748 5796 MEET 3369 Meet::UidInheritEnabled UidInheritEnabled:0 updateName:somefile.ned uid:{30CCEB24-696C-4315-A8E0-8C70EE025A44}-v8099940 gvsn:{30CCEB24-696C-4315-A8E0-8C70EE025A44}-v8099942 connId:{752068BE-5AA9-4CD0-9EA4-C7220BDE47F4} csName:Rf1
This means that we can’t inherit the UID - and therefore cannot simply update the database and move on - because the file has “been replicated out” from DFSR’s perspective and must therefore be a unique file. It really hasn’t; DFSR just assumes so, because how else would the similarity table already know about it? When DFSR goes through the download process, it finds that we have the same file under two different UIDs, and that the local copy is already UID visible:
20130115 19:17:16.748 5796 MEET 6330 Meet::LocalDominates update:
+ present 1
..
+ gvsn {30CCEB24-696C-4315-A8E0-8C70EE025A44}-v8099942
+ uid {30CCEB24-696C-4315-A8E0-8C70EE025A44}-v8099940
+ parent {65BDCD7F-9F8A-4FFD-B9C0-744D0405AFE5}-v7450758
..
+ hash 1CC352AE-916F21F8-1F4E69E4-51A835CA
+ similarity 06032621-083C3D3A-212D182C-0C0A233C
+ name somefile.ned
+ related.record:
+ fid 0x1000000800CFC
+ usn 0x27d2613f0
+ uidVisible 1
+ filtered 0
..
+ gvsn {5D37EFB0-1472-4AA2-B697-1942BB7DE29C}-v9524579
+ uid {5D37EFB0-1472-4AA2-B697-1942BB7DE29C}-v9524579
+ parent {65BDCD7F-9F8A-4FFD-B9C0-744D0405AFE5}-v7450758
..
+ csId {9CC90AD2-A99E-4084-8D32-16B1242BF45E}
+ hash 1CC352AE-916F21F8-1F4E69E4-51A835CA
+ similarity 06032621-083C3D3A-212D182C-0C0A233C
+ name somefile.ned
Because of the different UIDs and the fact that the local one has UID visible already, DFSR generates the conflict:
20130115 19:17:16.748 5796 MEET 2989 Meet::InstallRename Moving out name conflicting file updateName:somefile.ned uid:{30CCEB24-696C-4315-A8E0-8C70EE025A44}-v8099940 gvsn:{30CCEB24-696C-4315-A8E0-8C70EE025A44}-v8099942 connId:{752068BE-5AA9-4CD0-9EA4-C7220BDE47F4} csName:Rf1
But since the files are truly the same, the conflict doesn’t really matter. DFSR is just making a pointless conflict that writes an event, but which an end-user would never worry about because nothing is different in the winning file.
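To make the chain of decisions in the logs easier to follow, here is a toy model in Python. It is only a sketch of the behavior described above, not DFSR's actual code: the IdRecord class, the function names, and the outcome strings are all hypothetical, and the rule that inheritance also needs matching hashes is my simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class IdRecord:
    """Hypothetical, heavily simplified stand-in for a DFSR ID record."""
    name: str
    uid: str
    file_hash: str
    uid_visible: bool = False


def uid_inherit_enabled(local: IdRecord) -> bool:
    # Per the log analysis: once the local record is marked uidVisible,
    # DFSR treats it as already replicated out and refuses UID inheritance.
    return not local.uid_visible


def resolve_same_name(update: IdRecord, local: IdRecord) -> str:
    """Model the outcome for an incoming update whose name and path match an
    existing local file that has a different UID (the preseeded case)."""
    # Assumption: inheritance is only sensible when the content is identical.
    if update.file_hash == local.file_hash and uid_inherit_enabled(local):
        return "inherit_uid"      # just update the database and move on
    return "name_conflict"        # local file moves to ConflictAndDeleted


SAME_HASH = "1CC352AE-916F21F8-1F4E69E4-51A835CA"
incoming = IdRecord("somefile.ned",
                    "{30CCEB24-696C-4315-A8E0-8C70EE025A44}-v8099940", SAME_HASH)

# Clean preseeded server: the local record has never been staged.
clean_local = IdRecord("somefile.ned",
                       "{5D37EFB0-1472-4AA2-B697-1942BB7DE29C}-v9524579", SAME_HASH)

# This customer's server: the stale similarity table caused the local file
# to be staged as an RDC seed, which flipped uidVisible to 1.
stale_local = IdRecord("somefile.ned",
                       "{5D37EFB0-1472-4AA2-B697-1942BB7DE29C}-v9524579", SAME_HASH,
                       uid_visible=True)

print(resolve_same_name(incoming, clean_local))
print(resolve_same_name(incoming, stale_local))
```

In this model the two local records differ only in uid_visible, which is the point: the content never changed, only DFSR's belief about whether the file had already replicated out.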
Why did we already have similarity?
This boils down to a by-design DFSR behavior: if it finds any old similarity files, it uses them. Those special sparse files live under the <volume>\system volume information\dfsr folder and are called:
- SimilarityTable_1
- SimilarityTable_2
- FileIDTable_1
- FileIDTable_2
The FileIDTable files act in conjunction with the SimilarityTable files, and contain the file info that matches the similarity table’s signature data; that way, cross-file RDC can traverse the similarity table for matching signatures and then look up the matching file ID records.
This customer was doing the right thing and following our steps to remove the previous data, just as the blog posts state. However, since these were hidden files and the root DFSR folder was not deleted, they were skipped, leaving the old similarity table behind. Just a simple oversight (I have since reviewed the DFSR hardware migration article and downloads to make sure this is 100% clear in the steps).
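If you want a quick sanity check before reinstalling, something like the following Python sketch can report whether any of the four files named above are still present. The function name and the recursive search are my own choices, not part of any DFSR tooling, and on a real server you would need to run it with enough rights to read the System Volume Information folder.

```python
from pathlib import Path

# File names taken from the list above; matched case-insensitively.
LEFTOVER_NAMES = (
    "SimilarityTable_1",
    "SimilarityTable_2",
    "FileIDTable_1",
    "FileIDTable_2",
)


def find_leftover_tables(dfsr_dir):
    """Return a sorted list of paths under dfsr_dir whose file name matches
    one of the old similarity or file ID table names.

    dfsr_dir is whatever folder you want to inspect, for example the
    <volume>\\system volume information\\dfsr folder on the server.
    """
    root = Path(dfsr_dir)
    if not root.is_dir():
        return []
    wanted = {name.lower() for name in LEFTOVER_NAMES}
    # Search recursively in case the tables sit in a subfolder.
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.name.lower() in wanted
    )


if __name__ == "__main__":
    import sys
    for leftover in find_leftover_tables(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(leftover)
```

An empty result means none of the four names were found under the folder you pointed it at; it says nothing about other DFSR configuration files.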
The Sum Up
As with many issues in complex distributed computing systems like DFSR, the law of unintended consequences rules. When Windows Server 2003 R2 DFSR was first designed more than ten years ago, no one was thinking hard about DFSR preseeding or upgrading, of course.
Always make sure that you thoroughly delete previous DFSR configuration files when following the DFSR hardware and OS replacement steps, and everything will be swell.
Until next time,
- Ned Pyle