Question

As per the title of this question: for extremely performance-critical situations, will storing a file's metadata (e.g. location, size, date downloaded) in a database give better performance than retrieving it from the file system itself? Have there been any case studies into this problem?

To provide a bit more detail on a specific situation: the application needs to mirror terabytes of data (hundreds of files) to a remote site on a continual basis, and the current architecture uses Unix commands (e.g. ls) to determine which files need to be updated. The files themselves are split between Isilon IQ clusters and Sun Thumper clusters, which I have been told have good throughput but poor metadata performance. As the application will be the only process with write permissions to the files, we aren't concerned about things getting out of sync, but we are concerned with performance, as it currently takes between six and ten hours to transfer the data.
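To make the database idea concrete, here is a rough sketch of the kind of thing I have in mind: scan the tree once, compare size/mtime against a cached SQLite table, and only transfer what changed. The paths, table name, and columns are placeholders for illustration, not our actual schema.

    import os
    import sqlite3

    DB_PATH = "file_index.db"   # hypothetical local metadata cache
    ROOT = "/data/mirror"       # hypothetical mirror root

    def scan_tree(root):
        """Walk the tree once with os.scandir and yield (path, size, mtime_ns)."""
        stack = [root]
        while stack:
            with os.scandir(stack.pop()) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        st = entry.stat(follow_symlinks=False)
                        yield entry.path, st.st_size, st.st_mtime_ns

    def files_needing_update(root, db_path):
        """Compare the live tree against the cached metadata; return changed paths."""
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER)"
        )
        changed = []
        for path, size, mtime_ns in scan_tree(root):
            row = conn.execute(
                "SELECT size, mtime_ns FROM files WHERE path = ?", (path,)
            ).fetchone()
            if row is None or row != (size, mtime_ns):
                changed.append(path)
                conn.execute(
                    "INSERT OR REPLACE INTO files (path, size, mtime_ns) "
                    "VALUES (?, ?, ?)",
                    (path, size, mtime_ns),
                )
        conn.commit()
        conn.close()
        return changed

    if __name__ == "__main__":
        for path in files_needing_update(ROOT, DB_PATH):
            print(path)

The open question is whether maintaining and querying such a cache actually beats stat-ing the files directly on these clusters, given their reportedly poor metadata performance.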

