Question

Given a directory, how do I find all files within it (and any sub-directories) that are not hard-linked files? Or more specifically, that are not hard-linked files with more than one reference?

Basically I want to scan a folder and return a list of unique files within that directory, including directories and symbolic links (not their targets). If possible, it'd be nice to also ignore hard-linked directories on file-systems that support them (such as HFS+).


Solution 2

Hard-linked files have the same inode. You can use stat to print the inode and the filename, then use awk to print each file only the first time its inode appears:

stat -c '%i %n' *csv | awk '!seen[$1]++' | cut -d ' ' -f 2-
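If you need this to recurse into sub-directories as well, GNU find can print the inode itself; a minimal sketch assuming GNU find (the -printf directives %i and %p are GNU extensions):

find . -printf '%i %p\n' | awk '!seen[$1]++' | cut -d ' ' -f 2-

Like the one-liner above, this breaks if any file name contains a newline.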

Other tips

find has an option that should be useful:

find . -type f -links 1 -print

Files that are hard linked by definition have a link count of 2 or greater, so this will show all files that have no other links to them.

As I'm sure you know, all files have at least one hard link (in the parent directory).
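If you also want symlinks and directories themselves in the list, as the second paragraph of the question asks, you can widen the expression; a sketch using standard find tests:

find . \( -type f -links 1 \) -o -type l -o -type d

Symlinks are matched as the links themselves (-type l), not their targets, and directories are always included here; see the caveat about directory link counts below.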

To answer the question in your first paragraph (finding files that don't have additional hardlinks), you'll need to distinguish between directories and everything else. Assuming you have GNU Coreutils, you can use:

stat -c '%h' filename

to determine the number of hard links for a given file name. Otherwise you can parse the output of ls -ld filename -- which should work, but ls output isn't really meant to be machine-readable.

For anything other than a directory, if the number of links is greater than 1, there's a hard link to it somewhere.
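So for a non-directory, a check in a script might look like this (a sketch assuming GNU stat; somefile is just a placeholder name):

links=$(stat -c '%h' somefile)          # number of hard links to somefile
if [ "$links" -gt 1 ]; then
    echo "somefile has $links links; a hard link to it exists elsewhere"
fi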

A directory, on the other hand, will always have the usual one link from its parent, plus one for its own . entry, plus one for the .. entry of each of its immediate subdirectories. So you'll have to determine how many links it would have in the absence of any additional hard links, and compare that to the number it actually has.

You can avoid doing this if you happen to know that you're on a system that forbids hard links to directories. (I'm not sure whether that restriction is typically imposed by the OS or by each filesystem.)
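If you do need the comparison, the expected count for a directory is 2 plus the number of its immediate subdirectories (one link from the parent, one for its own . entry, and one .. per subdirectory); a sketch assuming GNU stat and find, with /some/dir as a placeholder:

dir=/some/dir                                                     # placeholder path
actual=$(stat -c '%h' "$dir")                                     # real link count
subdirs=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l)    # immediate subdirectories
expected=$((2 + subdirs))                                         # parent + "." + one ".." each
[ "$actual" -gt "$expected" ] && echo "$dir seems to have extra hard links"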

But that doesn't solve the problem in your second paragraph, creating a list of unique files within a directory. Knowing that the plain file foo has a link count greater than 1 doesn't tell you whether it's unique in the current directory; the other hard links could be in different directories (they merely have to be in the same filesystem).

To do that, you can do something like:

stat -c '%i %n' *

which prints the inode number and name for each file in the current directory. You can then filter out duplicate inode numbers to get unique entries. This is basically what glenn jackman's answer says. Of course * doesn't actually match everything in the current directory; it skips files whose names start with ., and it can cause problems if some files have special characters (like space) in their names. That may not matter to you, but if it does (assuming GNU find):

find . -maxdepth 1 -print0 | xargs -0 stat -c '%i %n'

(That will still cause problems if any file names contain newline characters, which is actually legal.)
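One way to stay safe even with newlines is to keep NUL separators all the way through and let find print the inode itself; a sketch assuming GNU find, sort, and cut (cut -z needs a reasonably recent coreutils):

find . -maxdepth 1 -printf '%i %p\0' | sort -z -k1,1n -u | cut -z -d ' ' -f 2-

The result is a NUL-separated list of unique paths, which you can feed to xargs -0 or a while read -d '' loop.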

So all you want is every file/link/dir/block/pipe/... but listed once per inode? Then it's easy: list them with their inode, do a numeric sort, and finally print only the first entry for each inode number. And remember that find has a lot of options for restricting the output if you want to filter:

find /PATH_to_SEARCH -ls | sort -n | awk '!seen[$1]++'
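If you only want the paths instead of the whole -ls listing, the same dedup trick works on find's own output; a sketch assuming GNU find:

find /PATH_to_SEARCH -printf '%i %p\n' | awk '!seen[$1]++ { sub(/^[0-9]+ /, ""); print }'

As with the other approaches, this still misbehaves if a file name contains a newline.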

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow