Capturing output of find . -print0 into a bash array

https://stackoverflow.com/questions/1116992

12-09-2019
|

Question

Using find . -print0 seems to be the only safe way of obtaining a list of files in bash due to the possibility of filenames containing spaces, newlines, quotation marks etc.

However, I'm having a hard time actually making find's output useful within bash or with other command line utilities. The only way I have managed to make use of the output is by piping it to perl, and changing perl's IFS to null:

find . -print0 | perl -e '$/="\0"; @files=<>; print $#files;'

This example prints the number of files found, avoiding the danger of newlines in filenames corrupting the count, as would occur with:

find . | wc -l

As most command line programs do not support null-delimited input, I figure the best thing would be to capture the output of find . -print0 in a bash array, like I have done in the perl snippet above, and then continue with the task, whatever it may be.

How can I do this?

This doesn't work:

find . -print0 | ( IFS=$'\0' ; array=( $( cat ) ) ; echo ${#array[@]} )

A much more general question might be: How can I do useful things with lists of files in bash?

Solution

Shamelessly stolen from Greg's BashFAQ:

unset a i
while IFS= read -r -d $'\0' file; do
    a[i++]="$file"        # or however you want to process each file
done < <(find /tmp -type f -print0)

Note that the redirection construct used here (cmd1 < <(cmd2)) is similar to, but not quite the same as the more usual pipeline (cmd2 | cmd1) -- if the commands are shell builtins (e.g. while), the pipeline version executes them in subshells, and any variables they set (e.g. the array a) are lost when they exit. cmd1 < <(cmd2) only runs cmd2 in a subshell, so the array lives past its construction. Warning: this form of redirection is only available in bash, not even bash in sh-emulation mode; you must start your script with #!/bin/bash.

Also, because the file processing step (in this case, just a[i++]="$file", but you might want to do something fancier directly in the loop) has its input redirected, it cannot use any commands that might read from stdin. To avoid this limitation, I tend to use:

unset a i
while IFS= read -r -u3 -d $'\0' file; do
    a[i++]="$file"        # or however you want to process each file
done 3< <(find /tmp -type f -print0)

...which passes the file list via unit 3, rather than stdin.

OTHER TIPS

Maybe you are looking for xargs:

find . -print0 | xargs -r0 do_something_useful

The option -L 1 could be useful for you too, which makes xargs exec do_something_useful with only 1 file argument.

The main problem is, that the delimiter NUL (\0) is useless here, because it isn't possible to assign IFS a NUL-value. So as good programmers we take care, that the input for our program is something it is able to handle.

First we create a little program, which does this part for us:

#!/bin/bash
printf "%s" "$@" | base64

...and call it base64str (don't forget chmod +x)

Second we can now use a simple and straightforward for-loop:

for i in `find -type f -exec base64str '{}' \;`
do 
  file="`echo -n "$i" | base64 -d`"
  # do something with file
done

So the trick is, that a base64-string has no sign which causes trouble for bash - of course a xxd or something similar can also do the job.

Yet another way of counting files:

find /DIR -type f -print0 | tr -dc '\0' | wc -c

Since Bash 4.4, the builtin mapfile has the -d switch (to specify a delimiter, similar to the -d switch of the read statement), and the delimiter can be the null byte. Hence, a nice answer to the question in the title

Capturing output of find . -print0 into a bash array

is:

mapfile -d '' ary < <(find . -print0)

You can safely do the count with this:

find . -exec echo ';' | wc -l

(It prints a newline for every file/dir found, and then count the newlines printed out...)

I think more elegant solutions exists, but I'll toss this one in. This will also work for filenames with spaces and/or newlines:

i=0;
for f in *; do
  array[$i]="$f"
  ((i++))
done

You can then e.g. list the files one by one (in this case in reverse order):

for ((i = $i - 1; i >= 0; i--)); do
  ls -al "${array[$i]}"
done

This page gives a nice example, and for more see Chapter 26 in the Advanced Bash-Scripting Guide.

Avoid xargs if you can:

man ruby | less -p 777 
IFS=$'\777' 
#array=( $(find ~ -maxdepth 1 -type f -exec printf "%s\777" '{}' \; 2>/dev/null) ) 
array=( $(find ~ -maxdepth 1 -type f -exec printf "%s\777" '{}' + 2>/dev/null) ) 
echo ${#array[@]} 
printf "%s\n" "${array[@]}" | nl 
echo "${array[0]}" 
IFS=$' \t\n'

I am new but I believe that this an answer; hope it helps someone:

STYLE="$HOME/.fluxbox/styles/"

declare -a array1

LISTING=`find $HOME/.fluxbox/styles/ -print0 -maxdepth 1 -type f`


echo $LISTING
array1=( `echo $LISTING`)
TAR_SOURCE=`echo ${array1[@]}`

#tar czvf ~/FluxieStyles.tgz $TAR_SOURCE

This is similar to Stephan202's version, but the files (and directories) are put into an array all at once. The for loop here is just to "do useful things":

files=(*)                        # put files in current directory into an array
i=0
for file in "${files[@]}"
do
    echo "File ${i}: ${file}"    # do something useful 
    let i++
done

To get a count:

echo ${#files[@]}

Old question, but no-one suggested this simple method, so I thought I would. Granted if your filenames have an ETX, this doesn't solve your problem, but I suspect it serves for any real-world scenario. Trying to use null seems to run afoul of default IFS handling rules. Season to your tastes with find options and error handling.

savedFS="$IFS"
IFS=$'\x3'
filenames=(`find wherever -printf %p$'\x3'`)
IFS="$savedFS"

Gordon Davisson's answer is great for bash. However a useful shortcut exist for zsh users:

First, place you string in a variable:

A="$(find /tmp -type f -print0)"

Next, split this variable and store it in an array:

B=( ${(s/^@/)A} )

There is a trick: ^@ is the NUL character. To do it, you have to type Ctrl+V followed by Ctrl+@.

You can check each entry of $B contains right value:

for i in "$B[@]"; echo \"$i\"

Careful readers may notice that call to find command may be avoided in most cases using ** syntax. For example:

B=( /tmp/** )

Bash has never been good at handling filenames (or any text really) because it uses spaces as a list delimiter.

I'd recommend using python with the sh library instead.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow