Question

I have a long list of short strings and a long list of text files to search in (actually nested folders of files to search in). I need to know which of the test strings do NOT exist in any of the files.

There are many methods to find strings in files and report where they are (for example, FINDSTR), but I've yet to find a way to only list the strings that can't be found.


Solution 2

Thanks to Tripp Kinetics for providing the framework for this answer, but I wanted to be able to use built-in Windows commands, rather than install new software, since I will be distributing this to others on our team. With a little research, here's what I came up with:

SET SEARCH_COUNT=0
SET FOUND_COUNT=0
SET NOT_FOUND_COUNT=0

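REM "tokens=1" keeps only the first whitespace-delimited word of each line.
FOR /F "tokens=1" %%G IN (list_of_strings.txt) DO (
    REM Print the progress text without a trailing newline.
    ECHO | SET /P unusedVar=Looking for %%G... 

    REM /I ignore case, /S recurse into subfolders, /P skip files with
    REM non-printable characters, /L literal match. Quote the search string.
    FINDSTR /ISPL /C:"%%G" "folder_to_search\*.*" >nul 2>&1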

    IF ERRORLEVEL 1 (
        ECHO Not found
        SET /A NOT_FOUND_COUNT=NOT_FOUND_COUNT+1
        >>not_found.txt ECHO %%G
    ) ELSE (
        ECHO Found!
        SET /A FOUND_COUNT=FOUND_COUNT+1
        >>found.txt ECHO %%G
    )

    SET /A SEARCH_COUNT=SEARCH_COUNT+1
)

ECHO(
ECHO Search complete.
ECHO(
ECHO Looked for %SEARCH_COUNT% strings
ECHO %FOUND_COUNT% found
ECHO %NOT_FOUND_COUNT% not found

OTHER TIPS

Looks like you're in Windows. It's easy to do this in Unix, but that's not necessarily an impediment.

You need a Bourne-compatible shell (sh, ksh, bash, zsh, etc.), grep and test. You could either go hunting for native Windows versions of those tools, or install a bare-minimum Cygwin with those packages. I recommend the latter, since it's simpler to make the pieces work together.

Run this command in sh:

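# Read the list line by line so strings containing spaces stay intact.
while IFS= read -r each; do
    # -q: no output, -F: treat the string literally rather than as a regex.
    # The file list is word-split, so paths in it must not contain spaces.
    grep -q -F -- "$each" $(cat /another/path/to/list_of_files.txt)
    # grep exits with status 1 when no line matched in any file.
    if [ $? -eq 1 ]; then
        printf '%s\n' "$each"
    fi
done < /path/to/list_of_strings.txt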

If you don't feel comfortable having that Cygwin install around afterwards, you can always delete it.
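
Since the question mentions nested folders, here is a variant of the loop above that lets grep walk the folder tree itself instead of reading a prepared list of files. Treat it as a rough sketch, not a tested drop-in: the function name list_missing and its two arguments are made up for illustration, and -r is a GNU/BSD grep extension (Cygwin's grep has it) rather than POSIX. I added -i to mirror FINDSTR's /I, and the carriage-return stripping assumes the list may have been saved with Windows line endings.

```shell
#!/bin/sh
# list_missing LIST DIR  (hypothetical helper, not part of any tool)
# Prints every line of LIST that does not occur in any file under DIR.
list_missing() {
    list=$1
    dir=$2
    while IFS= read -r each || [ -n "$each" ]; do
        # Drop a trailing carriage return left by Windows line endings.
        each=$(printf '%s' "$each" | tr -d '\r')
        # Skip blank lines: an empty pattern would match every line.
        [ -n "$each" ] || continue
        # -r: recurse into subfolders, -q: no output,
        # -i: ignore case, -F: literal string instead of a regex.
        grep -r -q -i -F -- "$each" "$dir"
        # Status 1 means "no match anywhere"; status 2 is an error.
        if [ $? -eq 1 ]; then
            printf '%s\n' "$each"
        fi
    done < "$list"
}
```

You would call it as something like list_missing list_of_strings.txt folder_to_search > not_found.txt. Keep the list file and the output file outside the folder being searched, otherwise every string would be "found" in the list itself.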

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow