Question

I have a directory with several thousand text files that I need to process. Some of these files are identical while others are identical except the timestamp varies by a few seconds / milliseconds. I need some way to automate the deletion of identical files and only keep one copy.

I'm thinking of something like:

while there are files in the directory still
{
    get file                    // e.g., file0001

    while (file == file + 1)    // e.g., file0001 == file0002 using 'fc' command
    {
        delete file + 1
    }

    move file to another directory
}

Is something like this even possible in Microsoft Windows Server 2003's DOS?

Solution

Of course it is. Everything is possible in batch. :D

This batch file doesn't actually delete anything. It just echoes the result of each comparison. When it reports two files as identical, you can delete either one of them.

Save the code as CleanDuplicates.bat and run it with CleanDuplicates {Folder}

Provided AS IS, without any guarantees! I don't want you knocking on my door because your entire server is messed up. ;-)

The script calls itself recursively. This could probably be done differently, but hey, it works. It also restarts itself in a new cmd shell, because that makes cleaning up easier. I tested it on Windows Vista Business, but it should work on Server 2003 as well. Hey, it even has a help function. ;-) It contains two loops that each enumerate every file, so every pair is compared twice (A against B, then B against A). Once you add the actual deletion, it may report that some files don't exist, because they were already deleted in an earlier iteration.

@echo off
rem Check input. //, /// and //// are special parameters. No parameter -> help.
if %1check==//check goto innerloop
if %1check==///check goto compare
if %1check==////check goto shell
if %1check==/hcheck goto help
if %1check==check goto help

rem Start ourselves within a new cmd shell. This will automatically return to
rem the original directory, and clear our helper environment vars.
cmd /c %0 //// %1
echo Exiting
goto end

:shell
rem Save the current folder, jump to target folder and set some helper vars.
rem /d also switches drives when the target folder is on a different one.
set FCOrgFolder=%CD%
cd /d %2
set FCStartPath=%0
if not exist %FCStartPath% set FCStartPath=%FCOrgFolder%\%0

rem Outer loop. Get each file and call ourselves with the first special parameter.
rem %~2 drops any quotes around the folder, so it ends up quoted exactly once.
for %%a in (*.*) do call %FCStartPath% // "%~2" "%%a"

goto end

:innerloop
rem Get each file again and call ourselves again with the second special parameter.
for %%b in (*.*) do call %FCStartPath% /// %2 %3 "%%b"
goto end

:compare
rem Actual compare plus some verbose output. %3 and %4 are the two quoted file names.
if %3==%4 goto end
echo Comparing
echo * %3
echo * %4

fc %3 %4 >nul

rem Get results
if errorlevel 2 goto notexists
if errorlevel 1 goto different

echo Files are identical
goto end

:different
echo Files differ
goto end

:notexists
echo File does not exist
goto end

:help

echo Compares files within a directory.
echo Usage: %0 {directory}
goto end

:end
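
Since the script above only reports results, here is a minimal sketch of how the deletion step could look. It is a separate, simplified take and not a drop-in change to the script: the file name DeleteDuplicates.bat and the labels :scan and :check are made up for illustration, and it uses fc /b (byte-for-byte comparison) instead of plain fc, which is my own choice. It keeps the first copy it meets and deletes later identical ones; the "if exist" checks skip files that an earlier pass already removed. It does nothing about files that differ only in a timestamp. Try it on a copy of the folder first.

```batch
@echo off
rem Sketch only: DeleteDuplicates.bat is a hypothetical name.
rem Usage: DeleteDuplicates {directory}
setlocal
if "%~1"=="" goto :eof
pushd "%~1" || goto :eof

rem Outer loop: every file that still exists is a candidate to keep.
for %%a in (*.*) do if exist "%%a" call :scan "%%a"

popd
endlocal
goto :eof

:scan
rem Inner loop: compare the kept file (%1) with every other file still present.
for %%b in (*.*) do if exist "%%b" if /i not "%%b"=="%~1" call :check "%~1" "%%b"
goto :eof

:check
rem fc returns errorlevel 0 when the files match, 1 when they differ, 2 on error.
fc /b "%~1" "%~2" >nul 2>&1
if errorlevel 1 goto :eof
echo Deleting "%~2", identical to "%~1"
del "%~2"
goto :eof
```

Because file names are passed through call, names containing characters such as % or ^ may get mangled, so treat this as a starting point rather than something hardened. To get closer to the workflow in the question, the kept file could be moved to another directory at the end of :scan.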
Licensed under: CC-BY-SA with attribution