Question

In the upcoming Java 7, there is a new API to check whether two file objects refer to the same file.

Is there a similar API in the .NET Framework?

I've searched MSDN but found nothing that helps.

I want to keep it simple, but I don't want to compare by filename, which causes problems with hard/symbolic links and different path styles (e.g. \\?\C:\ vs. C:\).

All I want to do is prevent duplicate files from being dragged and dropped into my linked list.

Solution

As far as I can see (1) (2) (3) (4), JDK 7 does it by calling GetFileInformationByHandle on the files and comparing dwVolumeSerialNumber, nFileIndexHigh and nFileIndexLow.

Per MSDN:

You can compare the VolumeSerialNumber and FileIndex members returned in the BY_HANDLE_FILE_INFORMATION structure to determine if two paths map to the same target; for example, you can compare two file paths and determine if they map to the same directory.

I do not think this function is wrapped by .NET, so you will have to use P/Invoke.

It might or might not work for network files. According to MSDN:

Depending on the underlying network components of the operating system and the type of server connected to, the GetFileInformationByHandle function may fail, return partial information, or full information for the given file.

A quick test shows that it works as expected (same values) with a symbolic link on a Linux system connected using SMB/Samba, but that it cannot detect that a file is the same when accessed using different shares that point to the same file (FileIndex is the same, but VolumeSerialNumber differs).

OTHER TIPS

Edit: Note that @Rasmus Faber mentions the GetFileInformationByHandle function in the Win32 API, which does what you want. Check and upvote his answer for more information.


I think you need an OS function to give you the information you want, otherwise it's going to have some false negatives whatever you do.

For instance, do these refer to the same file?

  • \\server\share\path\filename.txt
  • \\server\d$\temp\path\filename.txt

I would examine how critical it is for you to not have duplicate files in your list, and then just do some best effort.

Having said that, there is a method in the Path class that can do some of the work: Path.GetFullPath. It will at least expand the path to long names, according to the existing structure. Afterwards you just compare the strings. It won't be foolproof though, and it won't handle the two paths in my example above.
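
A minimal sketch of that best-effort approach, assuming a case-insensitive file system (the usual situation on Windows). The names PathComparer and LooksLikeSamePath are made up for illustration; only Path.GetFullPath comes from the framework. It resolves relative segments such as "..", but it cannot see through links, junctions, or different shares.

```csharp
using System;
using System.IO;

public static class PathComparer
{
    // Best-effort check: normalizes both paths, then compares the strings.
    // Assumption: the file system is case-insensitive.
    // This cannot detect hard links, symbolic links, junctions, or two
    // different shares that expose the same file.
    public static bool LooksLikeSamePath(string path1, string path2)
    {
        string full1 = Path.GetFullPath(path1);
        string full2 = Path.GetFullPath(path2);
        return string.Equals(full1, full2, StringComparison.OrdinalIgnoreCase);
    }
}
```

A false result here only means the strings differ after normalization, so treat it as "probably different" rather than a definite answer.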

Answer: There is no foolproof way to compare two string-based paths to determine whether they point to the same file.

The main reason is that seemingly unrelated paths can point to the exact same file due to file system redirections (junctions, symbolic links, etc.). For example

"d:\temp\foo.txt" "c:\othertemp\foo.txt"

These paths can potentially point to the same file. This case clearly eliminates any string comparison function as a basis for determining if two paths point to the same file.

The next level is comparing the OS file information. Open the file for both paths and compare the handle information. On Windows this can be done with GetFileInformationByHandle. Lucian Wischik wrote an excellent post on this subject here.

There is still a problem with this approach though. It only works if the user account performing the check is able to open both files for reading. Numerous things can prevent a user from opening one or both files, including but not limited to:

  • Lack of sufficient permissions to file
  • Lack of sufficient permissions to a directory in the path of the file
  • File system change which occurs between the opening of the first file and the second such as a network disconnection.

When you start looking at all of these problems you begin to understand why Windows does not provide a method to determine if two paths are the same. It's just not an easy/possible question to answer.
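
One way to live with those failure modes is to let the check return "unknown" instead of a forced yes or no. The sketch below is an illustration only: SafeCompare and TryIsSameFile are hypothetical names, and the comparison itself is passed in as a delegate so the sketch does not assume any particular implementation. Which exceptions a real comparison throws depends on how it opens the files; IOException and UnauthorizedAccessException are assumed here.

```csharp
using System;
using System.IO;

public static class SafeCompare
{
    // Returns true/false when the comparison ran, or null when the files
    // could not be examined (missing file, no permission, lost connection).
    public static bool? TryIsSameFile(string path1, string path2,
        Func<string, string, bool> isSameFile)
    {
        try
        {
            return isSameFile(path1, path2);
        }
        catch (IOException)
        {
            return null; // e.g. file not found, sharing violation, network failure
        }
        catch (UnauthorizedAccessException)
        {
            return null; // e.g. access denied on the file or a parent directory
        }
    }
}
```

The caller then decides what "unknown" means for its own list, for example treating it as "not a duplicate".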

Here is a C# implementation of IsSameFile using GetFileInformationByHandle:

NativeMethods.cs

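using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;
// FILETIME is taken from the ComTypes namespace.
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

public static class NativeMethods
{
  // Mirrors the Win32 BY_HANDLE_FILE_INFORMATION structure (52 bytes).
  [StructLayout(LayoutKind.Explicit)]
  public struct BY_HANDLE_FILE_INFORMATION
  {
    [FieldOffset(0)]
    public uint FileAttributes;

    [FieldOffset(4)]
    public FILETIME CreationTime;

    [FieldOffset(12)]
    public FILETIME LastAccessTime;

    [FieldOffset(20)]
    public FILETIME LastWriteTime;

    // Serial number of the volume that contains the file.
    [FieldOffset(28)]
    public uint VolumeSerialNumber;

    [FieldOffset(32)]
    public uint FileSizeHigh;

    [FieldOffset(36)]
    public uint FileSizeLow;

    [FieldOffset(40)]
    public uint NumberOfLinks;

    // High and low halves of the file's identifier on its volume.
    [FieldOffset(44)]
    public uint FileIndexHigh;

    [FieldOffset(48)]
    public uint FileIndexLow;
  }

  [DllImport("kernel32.dll", SetLastError = true)]
  [return: MarshalAs(UnmanagedType.Bool)]
  public static extern bool GetFileInformationByHandle(SafeFileHandle hFile, out BY_HANDLE_FILE_INFORMATION lpFileInformation);

  // The System.IO enums are passed straight through as DWORDs. For the values
  // used by IsSameFile they happen to match Win32: FileAccess.Read equals
  // FILE_READ_DATA (1), FileShare.ReadWrite equals FILE_SHARE_READ |
  // FILE_SHARE_WRITE (3), and FileMode.Open equals OPEN_EXISTING (3).
  // Other enum values are not guaranteed to line up.
  [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
  public static extern SafeFileHandle CreateFile([MarshalAs(UnmanagedType.LPTStr)] string filename,
    [MarshalAs(UnmanagedType.U4)] FileAccess access,
    [MarshalAs(UnmanagedType.U4)] FileShare share,
    IntPtr securityAttributes,
    [MarshalAs(UnmanagedType.U4)] FileMode creationDisposition,
    [MarshalAs(UnmanagedType.U4)] FileAttributes flagsAndAttributes,
    IntPtr templateFile);
}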

PathUtility.cs

using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

public static class PathUtility
{
  // Returns true if both paths resolve to the same file on the same volume.
  // Throws if either file cannot be opened or queried.
  public static bool IsSameFile(string path1, string path2)
  {
    using (SafeFileHandle sfh1 = NativeMethods.CreateFile(path1, FileAccess.Read, FileShare.ReadWrite,
        IntPtr.Zero, FileMode.Open, 0, IntPtr.Zero))
    {
      // Turn the Win32 error from CreateFile into a managed exception.
      if (sfh1.IsInvalid)
        Marshal.ThrowExceptionForHR(Marshal.GetHRForLastWin32Error());

      using (SafeFileHandle sfh2 = NativeMethods.CreateFile(path2, FileAccess.Read, FileShare.ReadWrite,
          IntPtr.Zero, FileMode.Open, 0, IntPtr.Zero))
      {
        if (sfh2.IsInvalid)
          Marshal.ThrowExceptionForHR(Marshal.GetHRForLastWin32Error());

        NativeMethods.BY_HANDLE_FILE_INFORMATION fileInfo1;
        bool result1 = NativeMethods.GetFileInformationByHandle(sfh1, out fileInfo1);
        if (!result1)
          throw new IOException(string.Format("GetFileInformationByHandle has failed on {0}", path1));

        NativeMethods.BY_HANDLE_FILE_INFORMATION fileInfo2;
        bool result2 = NativeMethods.GetFileInformationByHandle(sfh2, out fileInfo2);
        if (!result2)
          throw new IOException(string.Format("GetFileInformationByHandle has failed on {0}", path2));

        // Same volume and same file index means the same underlying file.
        return fileInfo1.VolumeSerialNumber == fileInfo2.VolumeSerialNumber
          && fileInfo1.FileIndexHigh == fileInfo2.FileIndexHigh
          && fileInfo1.FileIndexLow == fileInfo2.FileIndexLow;
      }
    }
  }
}
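
For the drag-and-drop case in the question, a comparison like this could be used to filter the list. The following is only a sketch: DropListHelper and AddIfNew are hypothetical names, and the comparison is passed in as a delegate (for example the IsSameFile method above) so the helper itself stays independent of P/Invoke.

```csharp
using System;
using System.Collections.Generic;

public static class DropListHelper
{
    // Adds 'candidate' to 'files' unless an existing entry refers to the
    // same file according to 'isSameFile'. Returns true if it was added.
    public static bool AddIfNew(LinkedList<string> files, string candidate,
        Func<string, string, bool> isSameFile)
    {
        foreach (string existing in files)
        {
            if (isSameFile(existing, candidate))
                return false; // already in the list under some path
        }
        files.AddLast(candidate);
        return true;
    }
}
```

Each drop costs one comparison per existing entry, which is fine for short lists. Since IsSameFile throws when a file cannot be opened, the caller may want to catch that and decide whether to add the entry anyway.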

At first I thought this would be really easy, but it doesn't work, because FullName keeps the casing of the path you passed in:

  string fileName1 = @"c:\vobp.log";
  string fileName2 = @"c:\vobp.log".ToUpper();
  FileInfo fileInfo1 = new FileInfo(fileName1);
  FileInfo fileInfo2 = new FileInfo(fileName2);

  if (!fileInfo1.Exists || !fileInfo2.Exists)
  {
    throw new Exception("one of the files does not exist");
  }

  if (fileInfo1.FullName == fileInfo2.FullName)
  {
    MessageBox.Show("equal"); 
  }

Maybe this library helps http://www.codeplex.com/FileDirectoryPath. I haven't used it myself.

edit: See this example on that site:

  //
  // Path comparison
  //
  filePathAbsolute1 = new FilePathAbsolute(@"C:/Dir1\\File.txt");
  filePathAbsolute2 = new FilePathAbsolute(@"C:\DIR1\FILE.TXT");
  Debug.Assert(filePathAbsolute1.Equals(filePathAbsolute2));
  Debug.Assert(filePathAbsolute1 == filePathAbsolute2);

If you need to compare the same filenames over and over again, I would suggest you look into canonicalizing those names.

Under a Unix system, there is the realpath() function, which canonicalizes your path. I think that's generally the best bet if you have a complex path. However, it is likely to fail on volumes mounted via network connections.

However, based on the realpath() approach, if you want to support multiple volumes including network volumes, you could write your own function that checks each directory name in a path and, if it references a volume, determines whether the volume referenced in both paths is the same. That said, the mount point may be different (i.e. the path on the destination volume may not be the root of that volume), so it is not that easy to solve all the problems along the way, but it is definitely possible (otherwise how would it work in the first place?!)

Once the filenames are properly canonicalized, a simple string comparison gives you the correct answer.
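
As a rough illustration of why canonicalizing pays off for repeated comparisons: each path is canonicalized once and stored, and every later check is a plain string lookup. CanonicalPathSet is a hypothetical name, and the canonicalization function is supplied by the caller, since how well it resolves links and mount points is exactly the hard part described above.

```csharp
using System;
using System.Collections.Generic;

public sealed class CanonicalPathSet
{
    private readonly HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);
    private readonly Func<string, string> canonicalize;

    // 'canonicalize' maps a path to its canonical form (caller-supplied).
    public CanonicalPathSet(Func<string, string> canonicalize)
    {
        this.canonicalize = canonicalize;
    }

    // Canonicalizes once, then relies on plain string equality.
    // Returns false if an equivalent path was already added.
    public bool Add(string path)
    {
        return seen.Add(canonicalize(path));
    }
}
```

The set is only as accurate as the function passed in; a canonicalizer that ignores links will still treat two links to one file as different entries.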

Rasmus's answer is probably the fastest way if you don't need to compare the same filenames over and over again.

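You could always compute an MD5 hash of both files and compare the results. Not exactly efficient, but easier than manually comparing the files yourself. Note that matching hashes only show that the contents are identical, not that both paths refer to the same file.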

Here is a post on how to MD5 a string in C#.
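
A minimal sketch of the hashing idea applied to file contents rather than strings. ContentHash and its method names are made up for illustration; MD5.Create and ComputeHash are the standard framework calls.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class ContentHash
{
    // Returns the MD5 hash of the file's bytes.
    public static byte[] Md5OfFile(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return md5.ComputeHash(stream);
        }
    }

    // True when both files hash to the same value, which means their
    // contents match. It does NOT prove they are the same file on disk:
    // two separate copies of a file will also compare equal.
    public static bool HaveSameContent(string path1, string path2)
    {
        return Md5OfFile(path1).SequenceEqual(Md5OfFile(path2));
    }
}
```

Both files are read in full on every call, so this gets slow for large files.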

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow