문제

I need to create a script to search through just below a million files of text, code, etc. to find matches and then output all hits on a particular string pattern to a CSV file.

So far I made this;

$location = 'C:\Work*'

$arr = "foo", "bar" #Where "foo" and "bar" are string patterns I want to search for (separately)

for($i=0;$i -lt $arr.length; $i++) {
Get-ChildItem $location -recurse | select-string -pattern $($arr[$i]) | select-object Path | Export-Csv "C:\Work\Results\$($arr[$i]).txt"
}

This returns to me a CSV file named "foo.txt" with a list of all files with the word "foo" in it, and a file named "bar.txt" with a list of all files containing the word "bar".

Is there any way anyone can think of to optimize this script to make it work faster? Or ideas on how to make an entirely different, but equivalent script that just works faster?

All input appreciated!

도움이 되었습니까?

해결책

If your files are not huge and can be read into memory then this version should work quite faster (and my quick and dirty local test seems to prove that):

$location = 'C:\ROM'
$arr = "Roman", "Kuzmin"

# remove output files
foreach($test in $arr) {
    Remove-Item ".\$test.txt" -ErrorAction 0 -Confirm
}

Get-ChildItem $location -Recurse | .{process{ if (!$_.PSIsContainer) {
    # read all text once
    $content = [System.IO.File]::ReadAllText($_.FullName)
    # test patterns and output paths once
    foreach($test in $arr) {
        if ($content -match $test) {
            $_.FullName >> ".\$test.txt"
        }
    }
}}}

Notes: 1) mind changed paths and patterns in the example; 2) output files are not CSV but plain text; there is not much reason in CSV if you are interested just in paths - plain text files one path per line will do.

다른 팁

Let's suppose that 1) the files are not too big and you can load it into memory, 2) you really just want the Path of the file, that matches (not the line etc.).

I tried to read the file only once and then iterate through the regexes. There is some gain (it's a faster then the original solution), but the final result will depend on other factors like file sizes, count of files etc.

Also removing 'ignorecase' makes it faster a little bit.

$res = @{}
$arr | % { $res[$_] = @() }

Get-ChildItem $location -recurse | 
  ? { !$_.PsIsContainer } |
  % { $file = $_
      $text = [Io.File]::ReadAllText($file.FullName)
      $arr | 
        % { $regex = $_
            if ([Regex]::IsMatch($text, $regex, 'ignorecase')) {
              $res[$regex] = $file.FullName
            }
        }
  }
$res.GetEnumerator() | % { 
  $_.Value | Export-Csv "d:\temp\so-res$($_.Key).txt"
}
라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top