Question

I'd like to find human-readable files on my Linux machine without restricting the search to particular file extensions. These should be files a person can read, such as text, configuration, HTML and source-code files. Is there a way to filter for and locate them?


Solution 2

find and file are your friends here:

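find /dir/to/search -type f -exec sh -c 'file -b "$1" | grep -q text' sh {} \; -print

This finds regular files (NOTE: it will not find symlinks, directories, sockets, etc.) under /dir/to/search and, for each one, runs sh -c 'file -b "$1" | grep -q text' sh {}, which asks file for the type of the file and looks for the word text in the description. If this returns true (i.e., text is in the line), find prints the filename. The filename is passed to sh as "$1" rather than pasted into the script, so names with spaces or quotes are handled, and grep -q is used instead of &>/dev/null because &> is a bash extension that a plain sh may not understand.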

NOTE: the -b flag tells file not to print the filename, so the name cannot interfere with the grep. E.g., without the -b flag the binary file gettext would erroneously be detected as a text file.
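The effect of -b can be checked with a small experiment. The sketch below is illustrative only: the temporary directory and the file contents are assumptions made up for the demo, not part of the original answer.

```shell
#!/bin/sh
# Illustrative demo: the directory and file contents are made up for this sketch.
demo_dir=$(mktemp -d)

# A file that holds non-text bytes but whose NAME contains "text".
printf '\000\001\002\003' > "$demo_dir/gettext"

# Without -b, file prints "path: description", so grep can match the name.
if file "$demo_dir/gettext" | grep -q text; then
    without_b="match"
else
    without_b="no match"
fi

# With -b, only the description is printed, so the name cannot match.
if file -b "$demo_dir/gettext" | grep -q text; then
    with_b="match"
else
    with_b="no match"
fi

echo "without -b: $without_b"
echo "with -b:    $with_b"

rm -rf "$demo_dir"
```

The exact description file gives for the binary bytes may differ between versions, which is why the sketch only checks whether the word text appears.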

For example,

root@osdevel-pete# find /bin -exec sh -c 'file -b {} |  grep text &>/dev/null' \; -print
/bin/gunzip
/bin/svnshell.sh
/bin/unicode_stop
/bin/unicode_start
/bin/zcat
/bin/redhat_lsb_init
root@osdevel-pete# find /bin -type f -name '*text*'
/bin/gettext

If you want to look inside compressed files, use the --uncompress flag to file. For more information and other flags to file, see man file.
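As a rough illustration of the difference, the sketch below compares the description of a gzip-compressed text file with and without -z (the short form of --uncompress). The file names and contents are assumptions for the demo, and the exact wording of the output depends on the installed version of file.

```shell
#!/bin/sh
# Illustrative demo: file names and contents are made up for this sketch.
demo_dir=$(mktemp -d)
printf 'hello world\n' > "$demo_dir/notes"
gzip -c < "$demo_dir/notes" > "$demo_dir/notes.gz"

# Plain file describes the compressed container itself.
plain_desc=$(file -b "$demo_dir/notes.gz")

# -z (--uncompress) asks file to look inside the compressed data.
inner_desc=$(file -b -z "$demo_dir/notes.gz")

echo "plain: $plain_desc"
echo "inner: $inner_desc"

rm -rf "$demo_dir"
```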

OTHER TIPS

Use:

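find /dir/to/search -type f -print0 | xargs -0 file | grep text

find will give you a list of files, separated by NUL characters (-print0) so that names containing spaces or newlines survive the pipe.

xargs -0 file will run the file command on the names from the piped input, and grep text keeps the output lines that mention text. Each output line starts with the filename, so a file whose name contains text will match as well.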
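A sketch of a variant that tests only the description, never the name, while still handing files over in batches. It assumes a POSIX sh; list_text_files is a hypothetical helper name introduced here, not something from the answers.

```shell
#!/bin/sh
# list_text_files is a hypothetical helper name for this sketch.
list_text_files() {
    # "{} +" hands many names to one sh invocation, which loops over them.
    find "$1" -type f -exec sh -c '
        for f do
            # -b drops the name, so only the description is tested.
            if file -b "$f" | grep -q text; then
                printf "%s\n" "$f"
            fi
        done
    ' sh {} +
}

# Usage: list_text_files /dir/to/search
```

This starts fewer shells than running sh once per file with \; and never passes filenames through a pipe, so unusual names do not need special handling.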

This should work fine, too:

# Read the description from file; it should contain "ASCII" or "Unicode" for a readable file.
# -b keeps the filename out of the string, so a name containing "ascii" cannot match.
file_info=$(file -b "$file_name")

# The <<< here-string requires bash.
if grep -q -i -e "ASCII" -e "Unicode" <<< "$file_info"; then
    echo "file is readable"
fi
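Instead of matching words in the description, the same check can be sketched with the --mime-encoding option of file, which reports an encoding such as us-ascii or utf-8 for text and binary otherwise. is_text_file is a hypothetical helper name, and treating every non-binary encoding as readable is an assumption of this sketch; empty files are reported as binary and therefore count as not readable.

```shell
#!/bin/sh
# is_text_file is a hypothetical helper name for this sketch.
is_text_file() {
    # Only regular files (or symlinks to them) are considered.
    [ -f "$1" ] || return 1
    # --mime-encoding prints e.g. us-ascii or utf-8 for text, binary otherwise.
    encoding=$(file -b --mime-encoding "$1")
    [ "$encoding" != "binary" ]
}

# Usage:
#   if is_text_file "$file_name"; then echo "file is readable"; fi
```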
Licensed under: CC-BY-SA with attribution