Hashing multiple files

https://stackoverflow.com/questions/1841737

12-09-2019

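Question

Problem specification:

Given a directory, I want to iterate over that directory and its non-hidden subdirectories,
and add a Whirlpool hash to the names of the non-hidden files.
If the script is re-run, it should replace the old hash with the new one.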

<filename>.<extension> ==> <filename>.<a-whirlpool-hash>.<extension>

<filename>.<old-hash>.<extension> ==> <filename>.<new-hash>.<extension>
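
The renaming rule above can be sketched as a pure function on file names. This is only an illustration under stated assumptions and is not taken from any of the answers: the name `hashed_name` and the `hex_len` parameter are hypothetical, and a dot-separated component is treated as an old hash only when it is exactly `hex_len` hexadecimal digits (128 for Whirlpool, 32 for MD5).

```python
import re

def hashed_name(name, digest, hex_len=128):
    """Insert `digest` before the last extension, dropping any old hash.

    Hypothetical helper: a dot-separated component counts as an old hash
    only if it is exactly `hex_len` hexadecimal digits.
    """
    is_hash = re.compile(r"[0-9a-fA-F]{%d}" % hex_len).fullmatch
    head, *rest = name.split(".")          # head is never treated as a hash
    rest = [part for part in rest if not is_hash(part)]
    if rest:                               # keep the final extension last
        return ".".join([head, *rest[:-1], digest, rest[-1]])
    return f"{head}.{digest}"              # no extension: append the hash
```

One consequence of this design is that a genuine extension that happens to be `hex_len` hex digits long would be mistaken for an old hash; with 128 digits that is unlikely in practice.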

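Question:

a) How would you do this?

b) Out of all the methods available to you, what makes your method the most suitable?

Verdict:

Thanks, everyone. I have chosen SeigeX's answer for its speed and portability.
It is empirically faster than the other bash variants,
and it worked without alteration on my Mac OS X machine.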

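Solution

Updated to fix:
1. File names with '[' or ']' in their name (really, any character now; see the comments)
2. Handling of md5sum output when hashing a file with a backslash or newline in its name
3. Moved the hash-checking algorithm into a function for modularity
4. Refactored the hash-checking logic to remove double negatives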

#!/bin/bash
if (($# != 1)) || ! [[ -d "$1" ]]; then
    echo "Usage: $0 /path/to/directory"
    exit 1
fi

is_hash() {
 md5=${1##*.} # strip prefix
 [[ "$md5" == *[^[:xdigit:]]* || ${#md5} -lt 32 ]] && echo "$1" || echo "${1%.*}"
}

while IFS= read -r -d $'\0' file; do
    read hash junk < <(md5sum "$file")
    basename="${file##*/}"
    dirname="${file%/*}"
    pre_ext="${basename%.*}"
    ext="${basename:${#pre_ext}}"

    # File already hashed?
    pre_ext=$(is_hash "$pre_ext")
    ext=$(is_hash "$ext")

    mv "$file" "${dirname}/${pre_ext}.${hash}${ext}" 2> /dev/null

done < <(find "$1" -path "*/.*" -prune -o \( -type f -print0 \))

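This code has the following benefits over the other entries so far:

It is fully compliant with Bash versions 2.0.2 and later.
No superfluous calls to other binaries such as sed or grep; it uses built-in parameter expansion instead.
Uses process substitution for 'find' instead of a pipe, so no subshell is created for the loop.
Takes the directory to work on as an argument and does a sanity check on it.
Uses $() rather than `` notation for command substitution; the latter is deprecated.
Works with files with spaces in their names.
Works with files with newlines in their names.
Works with files with multiple extensions.
Works with files with no extension.
Does not traverse hidden directories.
Does NOT skip pre-hashed files; it recalculates the hash, as per the spec.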

Test tree

$ tree -a a
a
|-- .hidden_dir
|   `-- foo
|-- b
|   `-- c.d
|       |-- f
|       |-- g.5236b1ab46088005ed3554940390c8a7.ext
|       |-- h.d41d8cd98f00b204e9800998ecf8427e
|       |-- i.ext1.5236b1ab46088005ed3554940390c8a7.ext2
|       `-- j.ext1.ext2
|-- c.ext^Mnewline
|   |-- f
|   `-- g.with[or].ext
`-- f^Jnewline.ext

4 directories, 9 files

Results

$ tree -a a
a
|-- .hidden_dir
|   `-- foo
|-- b
|   `-- c.d
|       |-- f.d41d8cd98f00b204e9800998ecf8427e
|       |-- g.d41d8cd98f00b204e9800998ecf8427e.ext
|       |-- h.d41d8cd98f00b204e9800998ecf8427e
|       |-- i.ext1.d41d8cd98f00b204e9800998ecf8427e.ext2
|       `-- j.ext1.d41d8cd98f00b204e9800998ecf8427e.ext2
|-- c.ext^Mnewline
|   |-- f.d41d8cd98f00b204e9800998ecf8427e
|   `-- g.with[or].d41d8cd98f00b204e9800998ecf8427e.ext
`-- f^Jnewline.d3b07384d113edec49eaa6238ad5ff00.ext

4 directories, 9 files

Other tips

#!/bin/bash
find -type f -print0 | while read -d $'\0' file
do
    md5sum=`md5sum "${file}" | sed -r 's/ .*//'`
    filename=`echo "${file}" | sed -r 's/\.[^./]*$//'`
    extension="${file:${#filename}}"
    filename=`echo "${filename}" | sed -r 's/\.md5sum-[^.]+//'`
    if [[ "${file}" != "${filename}.md5sum-${md5sum}${extension}" ]]; then
        echo "Handling file: ${file}"
        mv "${file}" "${filename}.md5sum-${md5sum}${extension}"
    fi
done

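Tested on files containing spaces, like 'a b'.
Tested on files containing multiple extensions, like 'a.b.c'.
Tested with directories containing spaces and/or dots.
Tested on files with no extension inside directories containing dots, such as 'a.b/c'.
Updated: now updates the hash if the file changes.

Key points:

Uses print0 piped to while read -d $'\0' to correctly handle spaces in file names.
md5sum can be replaced with your favourite hash function. The sed removes the first space and everything after it from the output of md5sum.
The base file name is extracted using a regular expression that finds the last period that is not followed by a slash (so that periods in directory names are not counted as part of the extension).
The extension is found by taking a substring whose starting index is the length of the base file name.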
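
For comparison, the two string operations described in the key points can be sketched in Python. This is a rough equivalent rather than part of the answer; the function name `split_ext` is hypothetical, and the regular expression is the same one passed to sed in the script.

```python
import re

def split_ext(path):
    """Split `path` into (base, extension) the way the sed call does.

    The pattern matches the last '.' that is followed by neither another
    '.' nor a '/', so dots in directory names are left alone.
    """
    base = re.sub(r"\.[^./]*$", "", path)
    return base, path[len(base):]   # extension = everything after the base
```

For a path such as 'a.b/c' the pattern does not match at all, so the base is the whole path and the extension is empty.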

The logic of the requirements is complex enough to justify using Python instead of Bash. It should give a more readable, extensible, and maintainable solution.

#!/usr/bin/env python
import hashlib, os

def ishash(h, size):
    """Whether `h` looks like hash's hex digest."""
    if len(h) == size: 
        try:
            int(h, 16) # whether h is a hex number
            return True
        except ValueError:
            return False

for root, dirs, files in os.walk("."):
    dirs[:] = [d for d in dirs if not d.startswith(".")] # skip hidden dirs
    for path in (os.path.join(root, f) for f in files if not f.startswith(".")):
        suffix = hash_ = "." + hashlib.md5(open(path, "rb").read()).hexdigest()  # hash raw bytes
        hashsize = len(hash_) - 1
        # extract old hash from the name; add/replace the hash if needed
        barepath, ext = os.path.splitext(path) # ext may be empty
        if not ishash(ext[1:], hashsize):
            suffix += ext # add original extension
            barepath, oldhash = os.path.splitext(barepath) 
            if not ishash(oldhash[1:], hashsize):
               suffix = oldhash + suffix # preserve 2nd (not a hash) extension
        else: # ext looks like a hash
            oldhash = ext
        if hash_ != oldhash: # replace old hash by new one
           os.rename(path, barepath+suffix)

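The test directory tree contains:

files without an extension inside a directory with a dot in its name
a file name that already has a hash in it (a test of idempotency)
a file name with two extensions
newlines in names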

$ tree a
a
|-- b
|   `-- c.d
|       |-- f
|       |-- f.ext1.ext2
|       `-- g.d41d8cd98f00b204e9800998ecf8427e
|-- c.ext^Mnewline
|   `-- f
`-- f^Jnewline.ext1

7 directories, 5 files

Results

$ tree a
a
|-- b
|   `-- c.d
|       |-- f.0bee89b07a248e27c83fc3d5951213c1
|       |-- f.ext1.614dd0e977becb4c6f7fa99e64549b12.ext2
|       `-- g.d41d8cd98f00b204e9800998ecf8427e
|-- c.ext^Mnewline
|   `-- f.0bee89b07a248e27c83fc3d5951213c1
`-- f^Jnewline.b6fe8bb902ca1b80aaa632b776d77f83.ext1

7 directories, 5 files

The solution works correctly for all cases.

The Whirlpool hash is not in Python's stdlib, but there are both pure-Python and C extensions that support it, e.g. python-mhash.

To install it:

$ sudo apt-get install python-mhash

To use it:

import mhash

print mhash.MHASH(mhash.MHASH_WHIRLPOOL, "text to hash here").hexdigest()

Output: CBDCA4520CC5C131FC3A86109DDD23FEE2D7FF7BE5636D398180178378944A4F41480B938608AE98DA7ECCBF39A4C79B83A8590C4CB1BC6382C43O64444CB1BCO
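
As an alternative sketch that avoids a third-party module, the standard hashlib module can select an algorithm by name. Whether "whirlpool" is accepted depends on the OpenSSL build that Python is linked against, so treat that as an assumption and check `hashlib.algorithms_available` first. The helper name `file_hexdigest` and its parameters are hypothetical.

```python
import hashlib

def file_hexdigest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in binary mode, reading it in chunks.

    `algorithm` is any name accepted by hashlib.new(); "whirlpool" works
    only if the local OpenSSL build exposes it (an assumption to check
    against hashlib.algorithms_available).
    """
    h = hashlib.new(algorithm)          # raises ValueError if unsupported
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Reading in chunks keeps memory use flat for large files, unlike reading the whole file at once.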

Invoking `whirlpooldeep` from Python:

from subprocess import PIPE, STDOUT, Popen

def getoutput(cmd):
    return Popen(cmd, stdout=PIPE, stderr=STDOUT).communicate()[0]

hash_ = getoutput(["whirlpooldeep", "-q", path]).rstrip()

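git could provide leverage for problems that need to track a set of files based on their hashes.

I wasn't really happy with my first answer because, as I said there, this problem looks like it is best solved with Perl. You already said in one edit of your question that you have Perl on the OS X machine you want to run this on, so I took a shot at it.

It is hard to get everything right in Bash, i.e. avoiding quoting issues with weird file names and behaving nicely with corner-case file names.

So here it is in Perl, a complete solution to your problem. It runs over all the files/directories listed on its command line.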


#!/usr/bin/perl -w
# whirlpool-rename.pl
# 2009 Peter Cordes <peter@cordes.ca>.  Share and Enjoy!

use Fcntl;      # for O_BINARY
use File::Find;
use Digest::Whirlpool;

# find callback, called once per directory entry
# $_ is the base name of the file, and we are chdired to that directory.
sub whirlpool_rename {
    print "find: $_\n";
#    my @components = split /\.(?:[[:xdigit:]]{128})?/; # remove .hash while we're at it
    my @components = split /\.(?!\.|$)/, $_, -1; # -1 to not leave out trailing dots

    if (!$components[0] && $_ ne ".") { # hidden file/directory
        $File::Find::prune = 1;
        return;
    }

    # don't follow symlinks or process non-regular-files
    return if (-l $_ || ! -f _);

    my $digest;
    eval {
        sysopen(my $fh, $_, O_RDONLY | O_BINARY) or die "$!";
        $digest = Digest->new( 'Whirlpool' )->addfile($fh);
    };
    if ($@) {  # exception-catching structure from whirlpoolsum, distributed with Digest::Whirlpool.
        warn "whirlpool: couldn't hash $_: $!\n";
        return;
    }

    # strip old hashes from the name.  not done during split only in the interests of readability
    @components = grep { !/^[[:xdigit:]]{128}$/ }  @components;
    if ($#components == 0) {
        push @components, $digest->hexdigest;
    } else {
        my $ext = pop @components;
        push @components, $digest->hexdigest, $ext;
    }

    my $newname = join('.', @components);
    return if $_ eq $newname;
    print "rename  $_ ->  $newname\n";
    if (-e $newname) {
        warn "whirlpool: clobbering $newname\n";
        # maybe unlink $_ and return if $_ is older than $newname?
        # But you'd better check that $newname has the right contents then...
    }
    # This could be link instead of rename, but then you'd have to handle directories, and you can't make hardlinks across filesystems
    rename $_, $newname or warn "whirlpool: couldn't rename $_ -> $newname:  $!\n";
}


#main
$ARGV[0] = "." if !@ARGV;  # default to current directory
find({wanted => \&whirlpool_rename, no_chdir => 0}, @ARGV );

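Advantages:

Actually uses Whirlpool, so you can use this exact program directly (after installing libperl-digest-whirlpool). It is easy to change to any digest function you want, because instead of different programs with different output formats, you have the common Perl Digest interface.

Implements all the other requirements: it ignores hidden files (and files under hidden directories).
Able to handle any possible file name without error or security problems. (Several people got this right in their shell scripts.)
Follows best practice for traversing a directory tree by chdiring down into each directory (like my previous answer, with find -execdir). This avoids problems with PATH_MAX and with directories being renamed while the script is running.
Clever handling of file names that end with '.': foo..txt... -> foo..hash.txt...
Handles old file names that already contain hashes without renaming them and then renaming them back. (It strips any sequence of 128 hex digits that is surrounded by "." characters.) When everything is already correct, no disk writes happen, only reads of every file. Your current solution runs mv twice in the already-correctly-named case, causing directory metadata writes. It is also slower, because that is two processes that have to be execed.
Efficient. No programs are forked/execed, while most of the solutions that actually work ended up having to run sed on something per file. Digest::Whirlpool is implemented with a natively compiled shared library, so it is not slow pure Perl. This should be faster than running a program on every file, especially for small files.
Perl supports UTF-8 strings, so file names with non-ASCII characters should not be a problem. (I am not sure whether any multi-byte sequence in UTF-8 can include the byte that means ASCII '.' on its own. If that is possible, then you need UTF-8-aware string handling. sed doesn't know UTF-8. Bash's glob expressions may.)
Easily extensible. When you put this into a real program and want to handle more corner cases, you can do so quite easily, e.g. decide what to do when you want to rename a file but the hash-named file name already exists.
Good error reporting. Most shell scripts have this too, though, by passing along the errors from the programs they run.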
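
On the open question in the list about UTF-8: by design, every byte of a multi-byte UTF-8 sequence has its high bit set, so the ASCII '.' byte (0x2E) cannot occur inside one. The following brute-force check is only a sketch to confirm this; the function name `multibyte_utf8_contains` is hypothetical.

```python
def multibyte_utf8_contains(byte):
    """Return True if any multi-byte UTF-8 sequence contains `byte`."""
    for cp in range(0x80, 0x110000):    # every code point beyond ASCII
        if 0xD800 <= cp <= 0xDFFF:      # surrogates are not encodable
            continue
        if byte in chr(cp).encode("utf-8"):
            return True
    return False
```

Calling `multibyte_utf8_contains(ord("."))` returns False, so byte-oriented splitting on '.' does not cut through a multi-byte character.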

# $hashcommand must print only the hash; files without an extension are not handled
find . -type f -print | while IFS= read -r file
do
    hash=`$hashcommand "$file"`
    filename=${file%.*}
    extension=${file##*.}
    mv "$file" "$filename.$hash.$extension"
done

You might want to store the results in a single file, like this:

find . -type f -exec md5sum {} \; > MD5SUMS

or, if you want one file per hash:

find . -type f | while IFS= read -r f; do md5sum "$f" > "$f.md5"; done

or even

find . -type f | while IFS= read -r f; do g=`md5sum "$f" | awk '{print $1}'`; echo "$g $f" > "$f-$g.md5"; done

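Here is my take on it, in Bash. Features: skips non-regular files; correctly deals with files with weird characters (i.e. spaces) in their names; deals with extensionless file names; skips already-hashed files, so it can be run repeatedly (although if files are modified between runs, it adds a new hash rather than replacing the old one). I wrote it using md5 -q as the hash function; you should be able to replace this with anything else, as long as it outputs only the hash and not something like filename => hash.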

find -x . -type f -print0 | while IFS="" read -r -d $'\000' file; do
    hash="$(md5 -q "$file")" # replace with your favorite hash function
    [[ "$file" == *."$hash" ]] && continue # skip files that already end in their hash
    dirname="$(dirname "$file")"
    basename="$(basename "$file")"
    base="${basename%.*}"
    [[ "$base" == *."$hash" ]] && continue # skip files that already end in hash + extension
    if [[ "$basename" == "$base" ]]; then
            extension=""
    else
            extension=".${basename##*.}"
    fi
    mv "$file" "$dirname/$base.$hash$extension"
done

In sh or bash, two versions. One limits itself to files with extensions...

hash () {
  #openssl md5 t.sh | sed -e 's/.* //'
  whirlpool "$f"
}

find . -type f -a -name '*.*' | while read f; do
  # remove the echo to run this for real
  echo mv "$f" "${f%.*}.whirlpool-`hash "$f"`.${f##*.}"
done

Testing...

...
mv ./bash-4.0/signames.h ./bash-4.0/signames.whirlpool-d71b117a822394a5b273ea6c0e3f4dc045b1098326d39864564f1046ab7bd9296d5533894626288265a1f70638ee3ecce1f6a22739b389ff7cb1fa48c76fa166.h
...

And this more complex version processes all plain files, with or without extensions, with or without spaces and odd characters, and so on:

hash () {
  #openssl md5 t.sh | sed -e 's/.* //'
  whirlpool "$f"
}

find . -type f | while read f; do
  name=${f##*/}
  case "$name" in
    *.*) extension=".${name##*.}" ;;
    *)   extension=   ;;
  esac
  # remove the echo to run this for real
  echo mv "$f" "${f%/*}/${name%.*}.whirlpool-`hash "$f"`$extension"
done

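Whirlpool is not a very common hash. You will probably have to install a program to compute it; for example, Debian/Ubuntu include a "whirlpool" package. That program prints the hash of one file by itself. apt-cache search whirlpool shows that some other packages support it, including the interesting md5deep.

Some of the earlier answers will fail on file names with spaces in them. If that is the case, but none of your file names contain newlines, then you can safely use \n as the delimiter.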


oldifs="$IFS"
IFS="
"
for i in $(find -type f); do echo "$i";done
#output
# ./base
# ./base2
# ./normal.ext
# ./trick.e "xt
# ./foo bar.dir ext/trick' (name "- }$foo.ext{}.ext2
IFS="$oldifs"

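Try it without setting IFS to see why it matters.

I was going to try something with IFS="."; find -print0 | while read -a array, to split on "." characters, but I do not normally use array variables. I see no easy way in the man page to insert the hash as the second-to-last array index and push down the last element (the file extension, if there is one). Any time Bash array variables start to look interesting, I know it is time to do what I am doing in Perl instead! See the gotchas for using read: http://tldp.org/ldp/abs/html/gotchas.html#badread0

I decided to use another technique I like: find -exec sh -c. It is the safest, since you are not parsing file names.

This should do the trick: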


find -regextype posix-extended -type f -not -regex '.*\.[a-fA-F0-9]{128}.*'  \
-execdir bash -c 'for i in "${@#./}";do 
 hash=$(whirlpool "$i");
 ext=".${i##*.}"; base="${i%.*}";
 [ "$base" = "$i" ] && ext="";
 newname="$base.$hash$ext";
 echo "ext:$ext  $i -> $newname";
 false mv --no-clobber "$i" "$newname";done' \
dummy {} +
# take out the "false" before the mv, and optionally take out the echo.
# false ignores its arguments, so it's there so you can
# run this to see what will happen without actually renaming your files.

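-execdir bash -c 'cmd' dummy {} + has the dummy argument there because the first argument after the command string becomes $0 in the shell, not part of the "$@" that the for loop iterates over. I use -execdir instead of -exec so I do not have to deal with directory names (or the possibility of exceeding PATH_MAX for nested directories with long names, when the actual file names are all short enough).

-not -regex prevents this from being applied twice to the same file. Whirlpool is an extremely long hash, and mv says "File name too long" if I run it twice without that check (on an XFS filesystem).

Files with no extension get basename.hash. I had to check specially to avoid appending a trailing '.', or getting the base name as the extension. ${@#./} strips the leading ./ that find puts in front of every file name, so there is no "." anywhere in the string for files with no extension.

mv --no-clobber may be a GNU extension. If you do not have GNU mv, do something else if you want to avoid deleting existing files (e.g. you run this once, some of the same files are added to the directory again under their old names, and you run it again). OTOH, if you want that behaviour, just take the option out.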
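
Where GNU mv is not available, the no-clobber behaviour can be approximated with an explicit existence check. The sketch below shows the idea in Python; `rename_no_clobber` is a hypothetical helper, and because the check and the rename are separate steps it is not safe against another process creating the target in between.

```python
import os

def rename_no_clobber(src, dst):
    """Rename src to dst unless dst already exists; return True on success.

    Hypothetical helper. The existence check and the rename are two
    separate steps, so this is not safe against concurrent writers.
    """
    if os.path.lexists(dst):    # also catches dangling symlinks
        return False
    os.rename(src, dst)
    return True
```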

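My solution should work even when file names contain a newline (they can, you know!) or any other possible character. It would be faster and easier in Perl, but you asked for shell.

wallenborn's solution for making one file with all the checksums (instead of renaming the originals) is pretty good, but inefficient. Do not run md5sum once per file; run it on as many files at once as will fit on its command line:

find dir -type f -print0 | xargs -0 md5sum > dir.md5

or, with GNU find, where the batching is built in (note the + instead of ';'):

find dir -type f -exec md5sum {} + > dir.md5

If you just use find -print | xargs -d '\n', you will be tripped up by file names containing newlines, and plain xargs without -d also chokes on quote marks in names, so be careful. If you do not know what files you might someday run this script on, always try to use -print0 or -exec. This is especially true if the file names are supplied by untrusted users (i.e. they could be an attack vector on your server).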
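
The single-checksum-file approach can also be done in one process without passing file names through a shell at all. The following is a minimal sketch, not a drop-in replacement for md5sum: `write_manifest` is a hypothetical name, each file is read whole for brevity, and names containing newlines or backslashes are written unescaped (GNU md5sum would escape them).

```python
import hashlib
import os

def write_manifest(root, manifest_path):
    """Write one 'digest  path' line per regular file under `root`.

    Sketch only: paths containing newlines are written unescaped, which
    GNU md5sum would escape, so such a manifest may not verify with -c.
    """
    manifest_abs = os.path.abspath(manifest_path)
    with open(manifest_path, "w", encoding="utf-8",
              errors="surrogateescape") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()                      # deterministic order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if os.path.abspath(path) == manifest_abs:
                    continue                     # skip the manifest itself
                if not os.path.isfile(path):
                    continue                     # skip sockets, FIFOs, etc.
                with open(path, "rb") as f:
                    digest = hashlib.md5(f.read()).hexdigest()
                out.write(f"{digest}  {path}\n")
```

Because the paths never pass through word splitting, spaces and quote marks in names need no special handling here.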

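In response to the updated question:

"If anyone can comment on how I can avoid looking in hidden directories with my Bash script, it would be much appreciated."

You can avoid hidden directories with find by using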

find -name '.?*' -prune -o \( -type f -print0 \)

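-name '.*' -prune would prune ".", and stop without doing anything. :/

I would still recommend my Perl version, though. I have updated it... You may still need to install Digest::Whirlpool from CPAN.

Hm, interesting problem.

Try the following (the mktest function is just for testing: TDD for Bash! :)

Edit:

Added support for Whirlpool hashes.
Code cleanup.
Better quoting of file names.
Changed the array syntax in the test part; it should now work with most Korn-like shells. Note that pdksh does not support :-based parameter expansion (or rather, it means something else there).

Note also that in md5 mode it fails for file names with Whirlpool-like hashes, and possibly vice versa.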

#!/usr/bin/env bash

#Tested with:
# GNU bash, version 4.0.28(1)-release (x86_64-pc-linux-gnu)
# ksh (AT&T Research) 93s+ 2008-01-31
# mksh @(#)MIRBSD KSH R39 2009/08/01 Debian 39.1-4
# Does not work with pdksh, dash

DEFAULT_SUM="md5"

#Takes a parameter, as root path
# as well as an optional parameter, the hash function to use (md5 or wp for whirlpool).
main()
{
  case $2 in
    "wp")
      export SUM="wp"
      ;;
    "md5")
      export SUM="md5"
      ;;
    *)
      export SUM=$DEFAULT_SUM
      ;;
  esac

  # For all visible files in all visible subfolders, move the file
  # to a name including the correct hash:
  find $1 -type f -not -regex '.*/\..*' -exec $0 hashmove '{}' \;
}

# Given a file named in $1 with full path, calculate it's hash.
# Output the filname, with the hash inserted before the extention
# (if any) -- or:  replace an existing hash with the new one,
# if a hash already exist.
hashname_md5()
{
  pathname="$1"
  full_hash=`md5sum "$pathname"`
  hash=${full_hash:0:32}
  filename=`basename "$pathname"`
  prefix=${filename%%.*}
  suffix=${filename#$prefix}

  #If the suffix starts with something that looks like an md5sum,
  #remove it:
  suffix=`echo $suffix|sed -r 's/\.[a-z0-9]{32}//'`

  echo "$prefix.$hash$suffix"
}

# Same as hashname_md5 -- but uses whirlpool hash.
hashname_wp()
{
  pathname="$1"
  hash=`whirlpool "$pathname"`
  filename=`basename "$pathname"`
  prefix=${filename%%.*}
  suffix=${filename#$prefix}

  #If the suffix starts with something that looks like an md5sum,
  #remove it:
  suffix=`echo $suffix|sed -r 's/\.[a-z0-9]{128}//'`

  echo "$prefix.$hash$suffix"
}


#Given a filepath $1, move/rename it to a name including the filehash.
# Try to replace an existing hash, an not move a file if no update is
# needed.
hashmove()
{
  pathname="$1"
  filename=`basename "$pathname"`
  path="${pathname%%/$filename}"

  case $SUM in
    "wp")
      hashname=`hashname_wp "$pathname"`
      ;;
    "md5")
      hashname=`hashname_md5 "$pathname"`
      ;;
    *)
      echo "Unknown hash requested"
      exit 1
      ;;
  esac

  if [[ "$filename" != "$hashname" ]]
  then
      echo "renaming: $pathname => $path/$hashname"
      mv "$pathname" "$path/$hashname"
  else
    echo "$pathname up to date"
  fi
}

# Create som testdata under /tmp
mktest()
{
  root_dir=$(tempfile)
  rm "$root_dir"
  mkdir "$root_dir"
  i=0
  test_files[$((i++))]='test'
  test_files[$((i++))]='testfile, no extention or spaces'

  test_files[$((i++))]='.hidden'
  test_files[$((i++))]='a hidden file'

  test_files[$((i++))]='test space'
  test_files[$((i++))]='testfile, no extention, spaces in name'

  test_files[$((i++))]='test.txt'
  test_files[$((i++))]='testfile, extention, no spaces in name'

  test_files[$((i++))]='test.ab8e460eac3599549cfaa23a848635aa.txt'
  test_files[$((i++))]='testfile, With (wrong) md5sum, no spaces in name'

  test_files[$((i++))]='test spaced.ab8e460eac3599549cfaa23a848635aa.txt'
  test_files[$((i++))]='testfile, With (wrong) md5sum, spaces in name'

  test_files[$((i++))]='test.8072ec03e95a26bb07d6e163c93593283fee032db7265a29e2430004eefda22ce096be3fa189e8988c6ad77a3154af76f582d7e84e3f319b798d369352a63c3d.txt'
  test_files[$((i++))]='testfile, With (wrong) whirlpoolhash, no spaces in name'

  test_files[$((i++))]='test spaced.8072ec03e95a26bb07d6e163c93593283fee032db7265a29e2430004eefda22ce096be3fa189e8988c6ad77a3154af76f582d7e84e3f319b798d369352a63c3d.txt'
  test_files[$((i++))]='testfile, With (wrong) whirlpoolhash, spaces in name'

  test_files[$((i++))]='test space.txt'
  test_files[$((i++))]='testfile, extention, spaces in name'

  test_files[$((i++))]='test   multi-space  .txt'
  test_files[$((i++))]='testfile, extention, multiple consequtive spaces in name'

  test_files[$((i++))]='test space.h'
  test_files[$((i++))]='testfile, short extention, spaces in name'

  test_files[$((i++))]='test space.reallylong'
  test_files[$((i++))]='testfile, long extention, spaces in name'

  test_files[$((i++))]='test space.reallyreallyreallylong.tst'
  test_files[$((i++))]='testfile, long extention, double extention,
                        might look like hash, spaces in name'

  test_files[$((i++))]='utf8test1 - æeiaæå.txt'
  test_files[$((i++))]='testfile, extention, utf8 characters, spaces in name'

  test_files[$((i++))]='utf8test1 - 漢字.txt'
  test_files[$((i++))]='testfile, extention, Japanese utf8 characters, spaces in name'

  for s in . sub1 sub2 sub1/sub3 .hidden_dir
  do

     #note -p not needed as we create dirs top-down
     #fails for "." -- but the hack allows us to use a single loop
     #for creating testdata in all dirs
     mkdir $root_dir/$s
     dir=$root_dir/$s

     i=0
     while [[ $i -lt ${#test_files[*]} ]]
     do
       filename=${test_files[$((i++))]}
       echo ${test_files[$((i++))]} > "$dir/$filename"
     done
   done

   echo "$root_dir"
}

# Run test, given a hash-type as first argument
runtest()
{
  sum=$1

  root_dir=$(mktest)

  echo "created dir: $root_dir"
  echo "Running first test with hashtype $sum:"
  echo
  main $root_dir $sum
  echo
  echo "Running second test:"
  echo
  main $root_dir $sum
  echo "Updating all files:"

  find $root_dir -type f | while read f
  do
    echo "more content" >> "$f"
  done

  echo
  echo "Running final test:"
  echo
  main $root_dir $sum
  #cleanup:
  rm -r $root_dir
}

# Test md5 and whirlpool hashes on generated data.
runtests()
{
  runtest md5
  runtest wp
}

#For in order to be able to call the script recursively, without splitting off
# functions to separate files:
case "$1" in
  'test')
    runtests
  ;;
  'hashname')
    hashname "$2"
  ;;
  'hashmove')
    hashmove "$2"
  ;;
  'run')
    main "$2" "$3"
  ;;
  *)
    echo "Use with: $0 test - or if you just want to try it on a folder:"
    echo "  $0 run path (implies md5)"
    echo "  $0 run path md5"
    echo "  $0 run path wp"
  ;;
esac

Using zsh:

$ ls
a.txt
b.txt
c.txt

The magic:

$ FILES=**/*(.) 
$ # */ stupid syntax coloring thinks this is a comment
$ for f in $FILES; do hash=`md5sum $f | cut -f1 -d" "`; mv $f "$f:r.$hash.$f:e"; done
$ ls
a.60b725f10c9c85c70d97880dfe8191b3.txt
b.3b5d5c3712955042212316173ccf37be.txt
c.2cd6ee2c70b0bde53fbe6cac3c8b8bb1.txt

Happy deconstructing!

Edit: added files in subdirectories and quotes around the mv argument

Ruby:

#!/usr/bin/env ruby
require 'digest/md5'

Dir.glob('**/*') do |f|
  next unless File.file? f
  next if /\.md5sum-[0-9a-f]{32}/ =~ f
  md5sum = Digest::MD5.file f
  newname = "%s/%s.md5sum-%s%s" %
    [File.dirname(f), File.basename(f,'.*'), md5sum, File.extname(f)]
  File.rename f, newname
end

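Handles file names that have spaces, that have no extension, and that have already been hashed.

Ignores hidden files and directories; add File::FNM_DOTMATCH as the second argument of glob if you want them included.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow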
