Question

I need to create hundreds to thousands of temporary hard or symbolic links that will be deleted shortly after creation. For my purposes both types of links will work (i.e. the target is not a directory and it always exists on the same file system).

As I understand it, a symbolic link is a small file that contains the path to the original file, whereas a hard link is another reference to the same inode. If I am going to create and delete thousands of these links, is it better to create and delete thousands of tiny files (symlinks) or thousands of these references (hard links)? It seems like one taxes the hard drive (maybe through fragmentation) while the other might tax the file system itself. Where are inode references stored? Do I risk corrupting the file system by making so many hard links? What about speed?

Thanks for your expertise!

This is a workaround that lets me use ffmpeg to encode a movie out of an arbitrary subset of images from a directory. Since ffmpeg requires that the files be named properly (e.g. frame%04d.jpg), I realized I can just create hard or symbolic links to the subset of files and name the links appropriately. This avoids renaming the original files or actually copying the data. It works great, but it requires creating and deleting many thousands of links, repeatedly.
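
A minimal sketch of the workaround in Python, assuming the subset is already available as an ordered list of paths. The helper name `link_frames` and the placeholder files are hypothetical, made up for illustration; only `os.link` and `os.symlink` are the real calls doing the work.

```python
import os
import tempfile

def link_frames(sources, link_dir, pattern="frame%04d.jpg", symbolic=False):
    """Create sequentially numbered links to `sources` inside `link_dir`."""
    make_link = os.symlink if symbolic else os.link
    created = []
    for index, source in enumerate(sources):
        link_path = os.path.join(link_dir, pattern % index)
        # A symlink stores the path text, so use an absolute path to keep it
        # valid no matter where link_dir lives.
        make_link(os.path.abspath(source), link_path)
        created.append(link_path)
    return created

# Demo with placeholder files standing in for real images.
with tempfile.TemporaryDirectory() as work:
    image_dir = os.path.join(work, "images")
    link_dir = os.path.join(work, "links")
    os.mkdir(image_dir)
    os.mkdir(link_dir)
    subset = []
    for name in ("b.jpg", "a.jpg", "z.jpg"):
        path = os.path.join(image_dir, name)
        with open(path, "wb") as handle:
            handle.write(name.encode())
        subset.append(path)
    links = link_frames(subset, link_dir)
    print([os.path.basename(p) for p in links])
    # prints ['frame0000.jpg', 'frame0001.jpg', 'frame0002.jpg']
```

The links keep the order of the list, not the alphabetical order of the originals, and they disappear with the temporary directory. In real use the encoder would be run on `link_dir` before the cleanup.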

I believe this question addresses a similar problem: convert image sequence using ffmpeg

Solution

If this activity breaks your file system, then your file system is at fault, not you. File systems are generally pretty reliable, so don't worry about that.

Both options require adding an entry to the directory. A symbolic link also requires creating a file of its own. When you open a hard link, the filesystem goes straight from the directory entry to the content. When you open a symlink, it has to find the symlink file, read the path stored in it, find the directory that holds the target, find where the content is, and only then access it. Symlinks are therefore more work for the filesystem all around.
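
The distinction can be observed from user space. A small sketch, assuming a POSIX system where the temporary directory sits on a filesystem that supports both link types; the file names are placeholders:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as work:
    target = os.path.join(work, "target.jpg")
    with open(target, "wb") as handle:
        handle.write(b"image bytes")

    hard = os.path.join(work, "hard.jpg")
    soft = os.path.join(work, "soft.jpg")
    os.link(target, hard)      # second directory entry for the same inode
    os.symlink(target, soft)   # new file whose content is the target's path

    target_info = os.stat(target)
    hard_info = os.stat(hard)
    soft_info = os.lstat(soft)  # lstat inspects the link itself, not its target

    same_inode_hard = hard_info.st_ino == target_info.st_ino
    same_inode_soft = soft_info.st_ino == target_info.st_ino
    link_count = target_info.st_nlink
    stored_path = os.readlink(soft)
    print(same_inode_hard, same_inode_soft, link_count, stored_path == target)
```

On a typical Linux filesystem the hard link reports the same inode number as the target and raises its link count to 2, while the symlink has an inode of its own and simply stores the target's path.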

But the difference is minute compared with the work of actually reading the data in the files. I would not worry about it; just go with whichever one gives you the semantics you want.
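
If you would rather measure than guess, a rough timing sketch like the following can be run on the filesystem in question. The helper `time_links` and the count of 2000 are arbitrary choices for illustration, and the numbers will vary with filesystem, caching, and hardware, so treat them as indicative only.

```python
import os
import tempfile
import time

def time_links(make_link, count=2000):
    """Return seconds spent creating and then deleting `count` links."""
    with tempfile.TemporaryDirectory() as work:
        target = os.path.join(work, "target.jpg")
        with open(target, "wb") as handle:
            handle.write(b"x")
        paths = [os.path.join(work, "frame%04d.jpg" % i) for i in range(count)]
        start = time.perf_counter()
        for path in paths:
            make_link(target, path)
        for path in paths:
            os.unlink(path)
        return time.perf_counter() - start

hard_seconds = time_links(os.link)
soft_seconds = time_links(os.symlink)
print("hard links: %.4f s, symlinks: %.4f s" % (hard_seconds, soft_seconds))
```

Note that this uses the default temporary directory, which may live on a different filesystem from your images; pass `dir=` to `tempfile.TemporaryDirectory` to test a specific location.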

Other tips

Since you are not trying to create hundreds of thousands of links to the same file, hard links perform marginally better.

However, symbolic links in /tmp perform better still if /tmp is a tmpfs.

Oh, and symlinks are too small to cause fragmentation issues.

Both options require adding an entry to the directory, and the directory may grow by allocating new blocks.

But a symbolic link also requires allocating an inode, and the filesystem has a limited number of inodes. Hundreds of thousands of symlinks may hit that limit, and you may then get a "No space left on device" error even with gigabytes free.

By default, the filesystem creation tool chooses the maximum number of inodes according to the partition size. For instance, on Linux ext2/3/4, mkfs.ext3 uses a bytes-per-inode ratio that you can find in /etc/mke2fs.conf.

For an existing filesystem, here is a command to get information about inodes:

# dumpe2fs /dev/sda1 | grep -i inode | less

Inode count:              979200
Free inodes:              742304
Inodes per group:         16320
Inode blocks per group:   510
First inode:              11
Inode size:               128
Journal inode:            8
First orphan inode:       441066
Journal backup:           inode blocks
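
For a check that needs neither root access nor an ext filesystem, the same totals are exposed through `statvfs`. A minimal sketch, assuming a Unix-like system; the helper name `inode_usage` is made up for this example:

```python
import os

def inode_usage(path="."):
    """Return (total, free) inode counts for the filesystem holding `path`."""
    stats = os.statvfs(path)
    return stats.f_files, stats.f_ffree

total, free = inode_usage(".")
print("inodes: %d total, %d free" % (total, free))
```

Some filesystems allocate inodes dynamically and may report 0 for both values, so a zero here does not necessarily mean the filesystem is full.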

In conclusion, you should prefer hard links, mainly because they consume fewer resources on disk and in memory (VFS structures in caches).

One more piece of advice: do not create too many files in the same directory. About 2,000 files is a reasonable limit to avoid performance issues.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow