Hashing multiple files

https://stackoverflow.com/questions/1841737

12-09-2019

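Question

Problem specification:

Given a directory, I want to iterate over that directory and its non-hidden subdirectories,
and add a whirlpool hash to the names of the non-hidden files.
If the script is re-run, it should replace an old hash with the new one.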

<filename>.<extension> ==> <filename>.<a-whirlpool-hash>.<extension>

<filename>.<old-hash>.<extension> ==> <filename>.<new-hash>.<extension>

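Question:

a) How would you do this?

b) Of all the methods available to you, what makes your method the most suitable?

Verdict:

Thanks, everyone. I have chosen SeigeX's answer for its speed and portability.
It is empirically faster than the other bash variants,
and it worked without alteration on my Mac OS X machine.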

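Solution

Updated to fix:
1. File names with '[' or ']' in their name (really, any character now; see the comments)
2. Handling of md5sum when hashing a file with a backslash or newline in its name
3. Functionized the hash-checking algorithm for modularity
4. Refactored the hash-checking logic to remove double negatives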

#!/bin/bash
if (($# != 1)) || ! [[ -d "$1" ]]; then
    echo "Usage: $0 /path/to/directory"
    exit 1
fi

is_hash() {
 md5=${1##*.} # strip prefix
 [[ "$md5" == *[^[:xdigit:]]* || ${#md5} -lt 32 ]] && echo "$1" || echo "${1%.*}"
}

while IFS= read -r -d $'\0' file; do
    read hash junk < <(md5sum "$file")
    basename="${file##*/}"
    dirname="${file%/*}"
    pre_ext="${basename%.*}"
    ext="${basename:${#pre_ext}}"

    # File already hashed?
    pre_ext=$(is_hash "$pre_ext")
    ext=$(is_hash "$ext")

    mv "$file" "${dirname}/${pre_ext}.${hash}${ext}" 2> /dev/null

done < <(find "$1" -path "*/.*" -prune -o \( -type f -print0 \))

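This code has the following advantages over the other entries so far:

Is fully compliant with Bash versions 2.0.2 and beyond
No superfluous calls to other binaries such as sed or grep; it uses builtin parameter expansion instead
Uses process substitution for 'find' instead of a pipe, so no subshell is created for the loop
Takes the directory to work on as an argument and does a sanity check on it
Uses $() rather than `` notation for command substitution; the latter is deprecated
Works with files with spaces
Works with files with newlines
Works with files with multiple extensions
Works with files with no extension
Does not traverse hidden directories
Does NOT skip pre-hashed files; it recalculates the hash, as the spec requires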
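
For readers less used to parameter expansion, the name-splitting logic of the script above (pre_ext, ext and is_hash) can be restated in Python. This is only a sketch: the function names strip_hash and new_name are made up for illustration, and, like the script, it treats any trailing component of 32 or more hex digits as an old hash.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def strip_hash(part, min_len=32):
    """Drop a trailing '.<hex digest>' component, like is_hash in the script."""
    tail = part.rsplit(".", 1)[-1]                      # ${1##*.}
    if len(tail) < min_len or not all(c in HEX_DIGITS for c in tail):
        return part                                     # not a hash: keep as is
    # ${1%.*}: cut the last '.component'; unchanged if there is no dot
    return part.rsplit(".", 1)[0] if "." in part else part


def new_name(basename, digest):
    """Insert digest before the last extension, replacing any old hash."""
    dot = basename.rfind(".")
    pre_ext = basename if dot == -1 else basename[:dot]  # ${basename%.*}
    ext = basename[len(pre_ext):]                        # ${basename:${#pre_ext}}
    return strip_hash(pre_ext) + "." + digest + strip_hash(ext)
```

Called with the names from the test tree below, new_name("j.ext1.ext2", h) gives "j.ext1.<h>.ext2" and new_name("h.<old>", h) gives "h.<h>", matching the renames shown in the result tree.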

Test tree

$ tree -a a
a
|-- .hidden_dir
|   `-- foo
|-- b
|   `-- c.d
|       |-- f
|       |-- g.5236b1ab46088005ed3554940390c8a7.ext
|       |-- h.d41d8cd98f00b204e9800998ecf8427e
|       |-- i.ext1.5236b1ab46088005ed3554940390c8a7.ext2
|       `-- j.ext1.ext2
|-- c.ext^Mnewline
|   |-- f
|   `-- g.with[or].ext
`-- f^Jnewline.ext

4 directories, 9 files

Result

$ tree -a a
a
|-- .hidden_dir
|   `-- foo
|-- b
|   `-- c.d
|       |-- f.d41d8cd98f00b204e9800998ecf8427e
|       |-- g.d41d8cd98f00b204e9800998ecf8427e.ext
|       |-- h.d41d8cd98f00b204e9800998ecf8427e
|       |-- i.ext1.d41d8cd98f00b204e9800998ecf8427e.ext2
|       `-- j.ext1.d41d8cd98f00b204e9800998ecf8427e.ext2
|-- c.ext^Mnewline
|   |-- f.d41d8cd98f00b204e9800998ecf8427e
|   `-- g.with[or].d41d8cd98f00b204e9800998ecf8427e.ext
`-- f^Jnewline.d3b07384d113edec49eaa6238ad5ff00.ext

4 directories, 9 files

Other tips

#!/bin/bash
find . -type f -print0 | while IFS= read -r -d $'\0' file  # IFS= and -r keep leading/trailing blanks and backslashes intact
do
    md5sum=`md5sum "${file}" | sed -r 's/ .*//'`
    filename=`echo "${file}" | sed -r 's/\.[^./]*$//'`
    extension="${file:${#filename}}"
    filename=`echo "${filename}" | sed -r 's/\.md5sum-[^.]+//'`
    if [[ "${file}" != "${filename}.md5sum-${md5sum}${extension}" ]]; then
        echo "Handling file: ${file}"
        mv "${file}" "${filename}.md5sum-${md5sum}${extension}"
    fi
done

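Tested on files containing spaces, such as 'a b'
Tested on files containing multiple extensions, such as 'a.b.c'
Tested with directories containing spaces and/or dots.
Tested on files without an extension inside directories containing dots, such as 'a.b/c'
Updated: now updates the hash if the file changes.

Key points:

Use of print0 piped to while read -d $'\0', to correctly handle spaces in file names.
md5sum can be replaced with your favourite hash function. The sed removes the first space and everything after it from the output of md5sum.
The base filename is extracted using a regular expression that finds the last period that is not followed by a slash (so that periods in directory names are not counted as part of the extension).
The extension is found by taking a substring whose starting index is the length of the base filename.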
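
The extraction steps described in the key points can be checked in isolation. The following Python sketch mirrors the three sed/substring steps of the script above; the function name split_name is an assumption made for this example, and Python's re module stands in for sed -r.

```python
import re


def split_name(path):
    """Return (base, extension) the way the sed/substring steps do."""
    # Strip the last '.something' that contains neither '.' nor '/',
    # so periods in directory names are never treated as extensions.
    base = re.sub(r"\.[^./]*$", "", path)
    # The extension is whatever follows the base in the original path.
    extension = path[len(base):]
    # Drop a previously inserted '.md5sum-<hash>' component (first match only,
    # like sed without the g flag).
    base = re.sub(r"\.md5sum-[^.]+", "", base, count=1)
    return base, extension
```

For example, split_name("./a.b/c") leaves the path untouched and returns an empty extension, while split_name("./a.b.c") splits off ".c".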

The logic of the requirements is complex enough to justify using Python instead of bash. Python should give a more readable, extensible, and maintainable solution.

#!/usr/bin/env python
import hashlib, os

def ishash(h, size):
    """Whether `h` looks like hash's hex digest."""
    if len(h) == size: 
        try:
            int(h, 16) # whether h is a hex number
            return True
        except ValueError:
            return False
    return False  # wrong length, so not a digest

for root, dirs, files in os.walk("."):
    dirs[:] = [d for d in dirs if not d.startswith(".")] # skip hidden dirs
    for path in (os.path.join(root, f) for f in files if not f.startswith(".")):
        with open(path, "rb") as fh:  # binary mode: hash the raw bytes (required on Python 3)
            suffix = hash_ = "." + hashlib.md5(fh.read()).hexdigest()
        hashsize = len(hash_) - 1
        # extract old hash from the name; add/replace the hash if needed
        barepath, ext = os.path.splitext(path) # ext may be empty
        if not ishash(ext[1:], hashsize):
            suffix += ext # add original extension
            barepath, oldhash = os.path.splitext(barepath) 
            if not ishash(oldhash[1:], hashsize):
               suffix = oldhash + suffix # preserve 2nd (not a hash) extension
        else: # ext looks like a hash
            oldhash = ext
        if hash_ != oldhash: # replace old hash by new one
           os.rename(path, barepath+suffix)

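Here is the test directory tree. It contains:

files without an extension inside directories with a dot in their name
a filename that already has a hash in it (a test of idempotency)
a filename with two extensions
newlines in names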

$ tree a
a
|-- b
|   `-- c.d
|       |-- f
|       |-- f.ext1.ext2
|       `-- g.d41d8cd98f00b204e9800998ecf8427e
|-- c.ext^Mnewline
|   `-- f
`-- f^Jnewline.ext1

7 directories, 5 files

Result

$ tree a
a
|-- b
|   `-- c.d
|       |-- f.0bee89b07a248e27c83fc3d5951213c1
|       |-- f.ext1.614dd0e977becb4c6f7fa99e64549b12.ext2
|       `-- g.d41d8cd98f00b204e9800998ecf8427e
|-- c.ext^Mnewline
|   `-- f.0bee89b07a248e27c83fc3d5951213c1
`-- f^Jnewline.b6fe8bb902ca1b80aaa632b776d77f83.ext1

7 directories, 5 files

The solution works correctly in all of these cases.
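
One caveat of the script above is that it reads each file fully into memory before hashing it. A minimal sketch of a chunked alternative, using only hashlib from the standard library, is shown here; the helper name hash_file and the 1 MiB chunk size are illustrative choices and not part of the original answer.

```python
import hashlib


def hash_file(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in binary chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Passing another algorithm name works only if the local hashlib/OpenSSL build provides it, so whether "whirlpool" is accepted depends on the installation.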

The Whirlpool hash is not in Python's stdlib, but there are both pure-Python and C extensions that support it, e.g. python-mhash.

To install it:

$ sudo apt-get install python-mhash

To use it:

import mhash

print mhash.MHASH(mhash.MHASH_WHIRLPOOL, "text to hash here").hexdigest()

Output: cbdca4520cc5c131fc3a86109dd23fee2d7ff7be56636d398180178378944a4f41480b938608ae98da7ecbf39a4c79b83a8590c4cb1bace5bc638fc92b3e653

Invoking `whirlpooldeep` in Python

from subprocess import PIPE, STDOUT, Popen

def getoutput(cmd):
    return Popen(cmd, stdout=PIPE, stderr=STDOUT).communicate()[0]

hash_ = getoutput(["whirlpooldeep", "-q", path]).rstrip()

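git can give you leverage on problems where you need to track sets of files based on their hashes.

I wasn't really happy with my first answer since, as I said there, this problem looks like it is best solved with Perl. You already said in one edit of your question that you have Perl on the OS X machine you want to run this on, so I took a shot at it.

It is hard to get everything right in bash, i.e. avoiding quoting issues with odd filenames and behaving well with corner-case filenames.

So here it is in Perl, a complete solution to your problem. It runs over all the files/directories listed on its command line.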


#!/usr/bin/perl -w
# whirlpool-rename.pl
# 2009 Peter Cordes <peter@cordes.ca>.  Share and Enjoy!

use Fcntl;      # for O_BINARY
use File::Find;
use Digest::Whirlpool;

# find callback, called once per directory entry
# $_ is the base name of the file, and we are chdired to that directory.
sub whirlpool_rename {
    print "find: $_\n";
#    my @components = split /\.(?:[[:xdigit:]]{128})?/; # remove .hash while we're at it
    my @components = split /\.(?!\.|$)/, $_, -1; # -1 to not leave out trailing dots

    if (!$components[0] && $_ ne ".") { # hidden file/directory
        $File::Find::prune = 1;
        return;
    }

    # don't follow symlinks or process non-regular-files
    return if (-l $_ || ! -f _);

    my $digest;
    eval {
        sysopen(my $fh, $_, O_RDONLY | O_BINARY) or die "$!";
        $digest = Digest->new( 'Whirlpool' )->addfile($fh);
    };
    if ($@) {  # exception-catching structure from whirlpoolsum, distributed with Digest::Whirlpool.
        warn "whirlpool: couldn't hash $_: $!\n";
        return;
    }

    # strip old hashes from the name.  not done during split only in the interests of readability
    @components = grep { !/^[[:xdigit:]]{128}$/ }  @components;
    if ($#components == 0) {
        push @components, $digest->hexdigest;
    } else {
        my $ext = pop @components;
        push @components, $digest->hexdigest, $ext;
    }

    my $newname = join('.', @components);
    return if $_ eq $newname;
    print "rename  $_ ->  $newname\n";
    if (-e $newname) {
        warn "whirlpool: clobbering $newname\n";
        # maybe unlink $_ and return if $_ is older than $newname?
        # But you'd better check that $newname has the right contents then...
    }
    # This could be link instead of rename, but then you'd have to handle directories, and you can't make hardlinks across filesystems
    rename $_, $newname or warn "whirlpool: couldn't rename $_ -> $newname:  $!\n";
}


#main
$ARGV[0] = "." if !@ARGV;  # default to current directory
find({wanted => \&whirlpool_rename, no_chdir => 0}, @ARGV );

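Advantages:

Actually uses whirlpool, so you can use this exact program directly (after installing libperl-digest-whirlpool). It is easy to switch to any digest function you want because, instead of different programs with different output formats, you have the common Perl Digest interface.
Implements all the other requirements: ignores hidden files (and files under hidden directories).
Able to handle any possible filename without errors or security problems. (Several people got this right in their shell scripts.)
Follows best practice for traversing a directory tree by chdir-ing down into each directory (like my previous answer, which uses find -execdir). This avoids problems with PATH_MAX and with directories being renamed while you are running.
Clever handling of filenames that end with '.': foo..txt... -> foo..hash.txt...
Handles old filenames that already contain hashes without renaming them and then renaming them back. (It strips any sequence of 128 hex digits that is surrounded by '.' characters.) When everything is already correct, no disk write activity happens, just a read of every file. Your current solution runs mv twice in the already-correctly-named case, causing directory metadata writes, and it is slower because two processes have to be executed.
Efficient. No programs are fork/exec-ed, whereas most of the solutions that actually work end up having to run something per file. Digest::Whirlpool is implemented with a natively compiled shared library, so it is not slow pure Perl. This should be faster than running a program on every file, especially for small files.
Perl supports UTF-8 strings, so filenames with non-ASCII characters should not be a problem. (I am not sure whether any multi-byte sequence in UTF-8 can include the byte that means ASCII '.' on its own. If that is possible, you need UTF-8-aware string handling. sed does not know UTF-8. Bash's glob expressions may.)
Easily extensible. If you were going to put this into a real program and wanted to handle more corner cases, you could do so quite easily, e.g. deciding what to do when you want to rename a file but the hash-named filename already exists.
Good error reporting. Most shell scripts have this too, though, by passing along the errors from the programs they run.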
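
On the UTF-8 question raised in the list above: UTF-8 is designed so that every byte of a multi-byte sequence has its high bit set, so the byte 0x2E can only ever be a real ASCII '.'. Byte-oriented tools that split on '.' are therefore safe on UTF-8 filenames. The brute-force check below is a small Python sketch of that property; the function name is made up for this example.

```python
def dot_byte_is_unambiguous():
    """True if byte 0x2E appears in no UTF-8 encoding of a non-ASCII character."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points cannot be encoded as UTF-8
        if 0x2E in chr(cp).encode("utf-8"):
            return False
    return True
```

The loop covers every non-ASCII code point, so it takes a moment to run, but it needs nothing beyond the standard library.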

find . -type f -print | while read file
do
    hash=`$hashcommand "$file"`
    filename=${file%.*}
    extension=${file##*.}
    mv "$file" "$filename.$hash.$extension"  # quote "$file" so names with spaces survive
done

You might want to store the results in a single file, as in

find . -type f -exec md5sum {} \; > MD5SUMS

If you really want one file per hash:

find . -type f | while IFS= read -r f; do md5sum "$f" > "$f.md5"; done

or even

find . -type f | while IFS= read -r f; do g=$(md5sum "$f" | awk '{print $1}'); echo "$g $f" > "$f-$g.md5"; done
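
The single-file variant (find ... -exec md5sum {} \; > MD5SUMS) can also be sketched in Python, which sidesteps quoting problems entirely. This is an illustration under assumptions: the function name write_md5sums is hypothetical, paths are written relative to the root, and md5sum's escaping of backslashes and newlines in names is not reproduced.

```python
import hashlib
import os


def write_md5sums(root, out_path):
    """Write '<md5>  <relative path>' lines for every regular file under root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                      # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue                     # like find -type f: regular files only
            if os.path.abspath(path) == os.path.abspath(out_path):
                continue                     # do not hash the output file itself
            with open(path, "rb") as fh:
                digest = hashlib.md5(fh.read()).hexdigest()
            lines.append("%s  %s\n" % (digest, os.path.relpath(path, root)))
    with open(out_path, "w") as out:
        out.writelines(lines)
    return lines
```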

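Here is my take on it, in bash. Features: it skips non-regular files; correctly handles files with odd characters (e.g. spaces) in their names; handles filenames without an extension; and skips files that are already hashed, so it can be run repeatedly (although if a file is modified between runs, it adds the new hash rather than replacing the old one). I wrote it using md5 -q as the hash function; you should be able to replace this with anything else, as long as it outputs only the hash and not something like filename => hash.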

find -x . -type f -print0 | while IFS="" read -r -d $'\000' file; do
    hash="$(md5 -q "$file")" # replace with your favorite hash function
    [[ "$file" == *."$hash" ]] && continue # skip files that already end in their hash
    dirname="$(dirname "$file")"
    basename="$(basename "$file")"
    base="${basename%.*}"
    [[ "$base" == *."$hash" ]] && continue # skip files that already end in hash + extension
    if [[ "$basename" == "$base" ]]; then
            extension=""
    else
            extension=".${basename##*.}"
    fi
    mv "$file" "$dirname/$base.$hash$extension"
done

In sh or bash, two versions. The first limits itself to files with extensions...

hash () {
  #openssl md5 t.sh | sed -e 's/.* //'
  whirlpool "$1"  # hash the file passed as argument rather than relying on the global $f
}

find . -type f -a -name '*.*' | while read f; do
  # remove the echo to run this for real
  echo mv "$f" "${f%.*}.whirlpool-`hash "$f"`.${f##*.}"
done

Testing...

...
mv ./bash-4.0/signames.h ./bash-4.0/signames.whirlpool-d71b117a822394a5b273ea6c0e3f4dc045b1098326d39864564f1046ab7bd9296d5533894626288265a1f70638ee3ecce1f6a22739b389ff7cb1fa48c76fa166.h
...

And this more complex version processes all plain files, with or without extensions, with or without spaces and odd characters, and so on...

hash () {
  #openssl md5 t.sh | sed -e 's/.* //'
  whirlpool "$1"  # hash the file passed as argument rather than relying on the global $f
}

find . -type f | while read f; do
  name=${f##*/}
  case "$name" in
    *.*) extension=".${name##*.}" ;;
    *)   extension=   ;;
  esac
  # remove the echo to run this for real
  echo mv "$f" "${f%/*}/${name%.*}.whirlpool-`hash "$f"`$extension"
done

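Whirlpool is not a very common hash. You will probably have to install a program to compute it; for example, Debian/Ubuntu include a "whirlpool" package. The program prints the hash of one file by itself. apt-cache search whirlpool shows that some other packages support it, including the interesting md5deep.

Some of the earlier answers fail on filenames with spaces in them. If that is your situation, but your files do not have any newlines in their names, then you can safely use \n as a delimiter.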


oldifs="$IFS"
IFS="
"
for i in $(find -type f); do echo "$i";done
#output
# ./base
# ./base2
# ./normal.ext
# ./trick.e "xt
# ./foo bar.dir ext/trick' (name "- }$foo.ext{}.ext2
IFS="$oldifs"

Try doing that without setting IFS to see why it matters.
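
The effect of the delimiter choice can be seen without touching the filesystem. The Python sketch below only models the word-splitting step (it ignores the pathname expansion that an unquoted $(...) also undergoes), and the three function names are invented for the example.

```python
def split_default(output):
    """Default IFS: split on any run of spaces, tabs or newlines."""
    return output.split()


def split_newline(output):
    """IFS set to newline only: one name per line."""
    return [line for line in output.split("\n") if line]


def split_nul(output):
    """find -print0 style: names are terminated by NUL bytes."""
    return [name for name in output.split("\0") if name]


names = ["./base", "./foo bar.dir ext/trick.ext"]
print(split_default("\n".join(names) + "\n"))   # the name with spaces is broken apart
print(split_newline("\n".join(names) + "\n"))   # names survive intact
```

Only the NUL-delimited form also survives names that contain a newline, which is why the other answers use -print0.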

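I was going to try something with IFS="."; find -print0 | while read -a array, to split on "." characters, but I normally never use array variables. I see no easy way in the man page to insert the hash as the second-to-last array element and push down the last element (the file extension, if there was one). Any time bash array variables start to look interesting, I know it is time to do what I am doing in Perl instead! See the gotchas for using read: http://tldp.org/LDP/abs/html/gotchas.html#BADREAD0

So I decided to use another technique that I like: find -exec sh -c. It is the safest, since you are not parsing filenames.

This should do the trick: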


find -regextype posix-extended -type f -not -regex '.*\.[a-fA-F0-9]{128}.*'  \
-execdir bash -c 'for i in "${@#./}";do 
 hash=$(whirlpool "$i");
 ext=".${i##*.}"; base="${i%.*}";
 [ "$base" = "$i" ] && ext="";
 newname="$base.$hash$ext";
 echo "ext:$ext  $i -> $newname";
 false mv --no-clobber "$i" "$newname";done' \
dummy {} +
# take out the "false" before the mv, and optionally take out the echo.
# false ignores its arguments, so it's there so you can
# run this to see what will happen without actually renaming your files.

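-execdir bash -c 'cmd' dummy {} + has the dummy argument there because the first argument after the command becomes $0 in the shell's positional parameters, not part of the "$@" that the loop iterates over. I use -execdir instead of -exec so that I do not have to deal with directory names (or with the possibility of exceeding PATH_MAX for nested directories with long names, when the actual filenames are all short enough).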
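
The role of the dummy argument is easy to observe directly. This Python sketch runs a tiny sh -c script through subprocess and reports how the arguments land in $0 and "$@"; it assumes a POSIX sh on PATH and Python 3.7 or later, and the helper name show_positional is hypothetical.

```python
import subprocess


def show_positional(*args):
    """Run sh -c with the given extra arguments and report $0 and "$@"."""
    script = 'printf "%s\\n" "0=$0"; for a in "$@"; do printf "%s\\n" "arg=$a"; done'
    result = subprocess.run(
        ["sh", "-c", script, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


print(show_positional("dummy", "./a", "./b c"))  # dummy takes the $0 slot
print(show_positional("./a", "./b c"))           # without it, ./a is swallowed as $0
```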

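-not -regex prevents this from being applied twice to the same file. Whirlpool is an extremely long hash, and mv says "File name too long" if I run it twice without that check (on an XFS filesystem).

Files with no extension get basename.hash. I had to check specially to avoid appending a trailing ".", or getting the basename as the extension. ${@#./} strips the leading ./ that find puts in front of every filename, so there is no "." anywhere in the string for files with no extension.

mv --no-clobber may be a GNU extension. If you do not have GNU mv, do something else if you want to avoid deleting existing files (e.g. you run this once, some of the same files are added to the directory under their old names, and you run it again). OTOH, if you want that behaviour, just take the option out.

My solution should work even if filenames contain a newline (they can, you know!) or any other possible character. It would be faster and easier in Perl, but you asked for shell.

wallenborn's solution for making one file with all the checksums (instead of renaming the originals) is pretty good, but inefficient. Don't run md5sum once per file; run it on as many files at a time as will fit on its command line:

find dir -type f -print0 | xargs -0 md5sum > dir.md5

or, with GNU find, xargs is built in (note the + instead of ';'):

find dir -type f -exec md5sum {} + > dir.md5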

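If you just use find -print | xargs, filenames with quote marks, spaces or backslashes in them will be mangled, and even with xargs -d '\n' a filename containing a newline is still split in the wrong place, so be careful. If you do not know what files you might someday run this script on, always use -print0 or -exec. This is especially true if the filenames are supplied by untrusted users (i.e. they could be an attack vector on your server).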

In response to your updated question:

If anyone can comment on how I can avoid looking in hidden directories with my BASH script,
it would be much appreciated.

You can avoid hidden directories with find by using

find -name '.?*' -prune -o \( -type f -print0 \)

-name '.*' -prune would prune ".", and stop without doing anything. :/
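
The difference between the two patterns can be checked with Python's fnmatch module, which implements the same shell-style wildcards for these simple cases. The sketch also shows the equivalent pruning with os.walk; the function name visible_files is an assumption for this example.

```python
import fnmatch
import os

# '.*' matches the starting directory "." itself, so find would prune everything.
print(fnmatch.fnmatchcase(".", ".*"))      # True
# '.?*' needs at least one character after the dot, so "." is not pruned.
print(fnmatch.fnmatchcase(".", ".?*"))     # False
print(fnmatch.fnmatchcase(".git", ".?*"))  # True


def visible_files(root):
    """Yield non-hidden files, without descending into hidden directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from entering those directories.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if not name.startswith("."):
                yield os.path.join(dirpath, name)
```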

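I would still recommend my Perl version, though. I have updated it... You may still need to install Digest::Whirlpool from CPAN, though.

Hm, interesting problem.

Try the following (the mktest function is just for testing -- TDD for bash! :)

Edit:

Added support for whirlpool hashes.
Code cleanup
Better quoting of filenames
Changed the array syntax in the test part; it should now work with most korn-like shells. Note that pdksh does not support :-based parameter expansion (or rather, it means something else there)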

Note also that in md5 mode it fails for filenames that contain whirlpool-like hashes, and possibly vice versa.
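
The reason for that limitation is visible in the pattern the script below uses to strip an old md5 hash, \.[a-z0-9]{32}: it is not anchored at the end of the component, so it also bites off the first 32 characters of a 128-character whirlpool hash. The Python sketch below reproduces this with the re module standing in for sed -r; MD5_STRICT is a hypothetical stricter variant (it uses a lookahead, which sed does not support) and is not part of the original script.

```python
import re

MD5_LIKE = re.compile(r"\.[a-z0-9]{32}")            # pattern used in hashname_md5
MD5_STRICT = re.compile(r"\.[0-9a-f]{32}(?=\.|$)")  # hypothetical: whole component only


def strip_first(pattern, suffix):
    """Remove the first match only, like sed 's/...//' without the g flag."""
    return pattern.sub("", suffix, count=1)


whirlpool_like = "ab" * 64                  # 128 hex characters
suffix = "." + whirlpool_like + ".txt"
print(len(strip_first(MD5_LIKE, suffix)))   # 96 hash characters plus ".txt" remain
print(strip_first(MD5_STRICT, suffix) == suffix)
```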

#!/usr/bin/env bash

#Tested with:
# GNU bash, version 4.0.28(1)-release (x86_64-pc-linux-gnu)
# ksh (AT&T Research) 93s+ 2008-01-31
# mksh @(#)MIRBSD KSH R39 2009/08/01 Debian 39.1-4
# Does not work with pdksh, dash

DEFAULT_SUM="md5"

#Takes a parameter, as root path
# as well as an optional parameter, the hash function to use (md5 or wp for whirlpool).
main()
{
  case $2 in
    "wp")
      export SUM="wp"
      ;;
    "md5")
      export SUM="md5"
      ;;
    *)
      export SUM=$DEFAULT_SUM
      ;;
  esac

  # For all visible files in all visible subfolders, move the file
  # to a name including the correct hash:
  find $1 -type f -not -regex '.*/\..*' -exec $0 hashmove '{}' \;
}

# Given a file named in $1 with full path, calculate it's hash.
# Output the filname, with the hash inserted before the extention
# (if any) -- or:  replace an existing hash with the new one,
# if a hash already exist.
hashname_md5()
{
  pathname="$1"
  full_hash=`md5sum "$pathname"`
  hash=${full_hash:0:32}
  filename=`basename "$pathname"`
  prefix=${filename%%.*}
  suffix=${filename#$prefix}

  #If the suffix starts with something that looks like an md5sum,
  #remove it:
  suffix=`echo $suffix|sed -r 's/\.[a-z0-9]{32}//'`

  echo "$prefix.$hash$suffix"
}

# Same as hashname_md5 -- but uses whirlpool hash.
hashname_wp()
{
  pathname="$1"
  hash=`whirlpool "$pathname"`
  filename=`basename "$pathname"`
  prefix=${filename%%.*}
  suffix=${filename#$prefix}

  #If the suffix starts with something that looks like an md5sum,
  #remove it:
  suffix=`echo $suffix|sed -r 's/\.[a-z0-9]{128}//'`

  echo "$prefix.$hash$suffix"
}


#Given a filepath $1, move/rename it to a name including the filehash.
# Try to replace an existing hash, an not move a file if no update is
# needed.
hashmove()
{
  pathname="$1"
  filename=`basename "$pathname"`
  path="${pathname%%/$filename}"

  case $SUM in
    "wp")
      hashname=`hashname_wp "$pathname"`
      ;;
    "md5")
      hashname=`hashname_md5 "$pathname"`
      ;;
    *)
      echo "Unknown hash requested"
      exit 1
      ;;
  esac

  if [[ "$filename" != "$hashname" ]]
  then
      echo "renaming: $pathname => $path/$hashname"
      mv "$pathname" "$path/$hashname"
  else
    echo "$pathname up to date"
  fi
}

# Create som testdata under /tmp
mktest()
{
  root_dir=$(tempfile)
  rm "$root_dir"
  mkdir "$root_dir"
  i=0
  test_files[$((i++))]='test'
  test_files[$((i++))]='testfile, no extention or spaces'

  test_files[$((i++))]='.hidden'
  test_files[$((i++))]='a hidden file'

  test_files[$((i++))]='test space'
  test_files[$((i++))]='testfile, no extention, spaces in name'

  test_files[$((i++))]='test.txt'
  test_files[$((i++))]='testfile, extention, no spaces in name'

  test_files[$((i++))]='test.ab8e460eac3599549cfaa23a848635aa.txt'
  test_files[$((i++))]='testfile, With (wrong) md5sum, no spaces in name'

  test_files[$((i++))]='test spaced.ab8e460eac3599549cfaa23a848635aa.txt'
  test_files[$((i++))]='testfile, With (wrong) md5sum, spaces in name'

  test_files[$((i++))]='test.8072ec03e95a26bb07d6e163c93593283fee032db7265a29e2430004eefda22ce096be3fa189e8988c6ad77a3154af76f582d7e84e3f319b798d369352a63c3d.txt'
  test_files[$((i++))]='testfile, With (wrong) whirlpoolhash, no spaces in name'

  test_files[$((i++))]='test spaced.8072ec03e95a26bb07d6e163c93593283fee032db7265a29e2430004eefda22ce096be3fa189e8988c6ad77a3154af76f582d7e84e3f319b798d369352a63c3d.txt'
  test_files[$((i++))]='testfile, With (wrong) whirlpoolhash, spaces in name'

  test_files[$((i++))]='test space.txt'
  test_files[$((i++))]='testfile, extention, spaces in name'

  test_files[$((i++))]='test   multi-space  .txt'
  test_files[$((i++))]='testfile, extention, multiple consequtive spaces in name'

  test_files[$((i++))]='test space.h'
  test_files[$((i++))]='testfile, short extention, spaces in name'

  test_files[$((i++))]='test space.reallylong'
  test_files[$((i++))]='testfile, long extention, spaces in name'

  test_files[$((i++))]='test space.reallyreallyreallylong.tst'
  test_files[$((i++))]='testfile, long extention, double extention,
                        might look like hash, spaces in name'

  test_files[$((i++))]='utf8test1 - æeiaæå.txt'
  test_files[$((i++))]='testfile, extention, utf8 characters, spaces in name'

  test_files[$((i++))]='utf8test1 - 漢字.txt'
  test_files[$((i++))]='testfile, extention, Japanese utf8 characters, spaces in name'

  for s in . sub1 sub2 sub1/sub3 .hidden_dir
  do

     #note -p not needed as we create dirs top-down
     #fails for "." -- but the hack allows us to use a single loop
     #for creating testdata in all dirs
     mkdir $root_dir/$s
     dir=$root_dir/$s

     i=0
     while [[ $i -lt ${#test_files[*]} ]]
     do
       filename=${test_files[$((i++))]}
       echo ${test_files[$((i++))]} > "$dir/$filename"
     done
   done

   echo "$root_dir"
}

# Run test, given a hash-type as first argument
runtest()
{
  sum=$1

  root_dir=$(mktest)

  echo "created dir: $root_dir"
  echo "Running first test with hashtype $sum:"
  echo
  main $root_dir $sum
  echo
  echo "Running second test:"
  echo
  main $root_dir $sum
  echo "Updating all files:"

  find $root_dir -type f | while read f
  do
    echo "more content" >> "$f"
  done

  echo
  echo "Running final test:"
  echo
  main $root_dir $sum
  #cleanup:
  rm -r $root_dir
}

# Test md5 and whirlpool hashes on generated data.
runtests()
{
  runtest md5
  runtest wp
}

#For in order to be able to call the script recursively, without splitting off
# functions to separate files:
case "$1" in
  'test')
    runtests
  ;;
  'hashname')
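    # no plain "hashname" function exists; dispatch to hashname_md5 or hashname_wp
    "hashname_${SUM:-$DEFAULT_SUM}" "$2"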
  ;;
  'hashmove')
    hashmove "$2"
  ;;
  'run')
    main "$2" "$3"
  ;;
  *)
    echo "Use with: $0 test - or if you just want to try it on a folder:"
    echo "  $0 run path (implies md5)"
    echo "  $0 run path md5"
    echo "  $0 run path wp"
  ;;
esac

Using zsh:

$ ls
a.txt
b.txt
c.txt

Magic:

$ FILES=**/*(.) 
$ # */ stupid syntax coloring thinks this is a comment
$ for f in $FILES; do hash=`md5sum $f | cut -f1 -d" "`; mv $f "$f:r.$hash.$f:e"; done
$ ls
a.60b725f10c9c85c70d97880dfe8191b3.txt
b.3b5d5c3712955042212316173ccf37be.txt
c.2cd6ee2c70b0bde53fbe6cac3c8b8bb1.txt

Happy deconflicting!

Edit: added files in subdirectories and quotes around the mv argument

Ruby:

#!/usr/bin/env ruby
require 'digest/md5'

Dir.glob('**/*') do |f|
  next unless File.file? f
  next if /\.md5sum-[0-9a-f]{32}/ =~ f
  md5sum = Digest::MD5.file f
  newname = "%s/%s.md5sum-%s%s" %
    [File.dirname(f), File.basename(f,'.*'), md5sum, File.extname(f)]
  File.rename f, newname
end

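Handles filenames that contain spaces, that have no extension, and that have already been hashed.

Ignores hidden files and directories; add File::FNM_DOTMATCH as the second argument to glob if you want them included.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow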
