Question

Some time ago I learned that rsync deletes files much faster than many other tools.

A few days ago I came across this wonderful answer on Serverfault which explains why rsync is so good at deleting files.

Quotation from that answer:

I revisited this today, because most filesystems store their directory structures in a btree format, the order of which you delete files is also important. One needs to avoid rebalancing the btree when you perform the unlink. As such I added a sort before deletes occur.

Could you explain how removing files in order prevents or reduces the number of B-tree rebalancings?


I expect the answer to show how deleting in order increases deletion speed, with details of what happens at the B-tree level. The people who wrote rsync and other programs (see links in the question) used this knowledge to create better programs. I think it's important for other programmers to have this understanding so they can write better software.


Solution

It is not important, nor is it a B-tree issue. It is just a coincidence.

First of all, this is very much implementation-dependent and very much ext3-specific. That's why I said it's not important (for general use); otherwise, add the ext3 tag or edit the summary line.

Second of all, ext3 does not use a B-tree for the directory entry index. It uses an HTree. The HTree is similar to a B-tree but different, and it does not require rebalancing. Search for "htree" in fs/ext3/dir.c.

Because of the HTree-based index, (a) ext3 has faster lookups compared to ext2, but (b) readdir() returns entries in hash-value order. The hash-value order is random relative to file creation time or the physical layout of the data, and, as we all know, random access is much slower than sequential access on rotating media.

A paper on ext3 presented at OLS 2005 by Mingming Cao et al. suggests (emphasis mine):

to sort the directory entries returned by readdir() by inode number.
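The paper's suggestion is easy to prototype from user space. Here is a minimal Python sketch (the function name is mine; this is not kernel or rsync code) that lists a directory with os.scandir(), which wraps readdir(), sorts the entries by inode number, and only then issues the unlink() calls:

    import os

    def unlink_in_inode_order(path):
        """Delete every regular file in `path`, unlinking in inode-number
        order instead of the hash order that readdir() returns on ext3."""
        entries = []
        with os.scandir(path) as it:              # scandir() wraps readdir()
            for entry in it:
                if entry.is_file(follow_symlinks=False):
                    # DirEntry.inode() avoids an extra stat() per file
                    entries.append((entry.inode(), entry.path))
        for _inode, filepath in sorted(entries):  # sort by inode number
            os.unlink(filepath)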

Now, onto rsync. Rsync sorts files by file name. See flist.c::fsort(), flist.c::file_compare(), and flist.c::f_name_cmp().

I did not test the following hypothesis, because I do not have the data sets from which @MIfe got 43 seconds. But I assume that sorted-by-name order was much closer to the optimal order than the random order returned by readdir(). That is why you saw much faster results with rsync on ext3. What if you generate 1,000,000 files with random file names and then delete them with rsync? Do you see the same result?
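If you want to try that experiment, a sketch along these lines (the directory and function names are my own) creates a directory full of randomly named empty files; you can then time the empty-directory rsync invocation described in the linked answer against a plain rm -rf:

    import os
    import uuid

    def make_random_files(path, count=1_000_000):
        """Create `count` empty files with random names, so that name order
        no longer correlates with creation order or on-disk layout."""
        os.makedirs(path, exist_ok=True)
        for _ in range(count):
            # uuid4().hex yields an effectively random, collision-free name
            open(os.path.join(path, uuid.uuid4().hex), "w").close()

    if __name__ == "__main__":
        make_random_files("testdir")
        # Then compare, for example:
        #   mkdir empty; time rsync -a --delete empty/ testdir/
        # against:
        #   time rm -rf testdir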

OTHER TIPS

Let's assume that the answer you posted is correct, and that the given file system does indeed store things in a balanced tree. Balancing a tree is a very expensive operation. Keeping a tree "partially" balanced is pretty simple: when you allow a tree to be slightly imbalanced, you only worry about moving things around the point of insertion/deletion. When talking about completely balanced trees, however, removing a given node may mean that suddenly the children of that node belong on the complete opposite side of the tree, or that a child node on the opposite side has become the root node and all of its children need to be rotated up the tree. This requires you to do either a long series of rotations, or to place all the items into an array and re-create the tree.

            5
    3               7
2       4       6       8

Now remove the 7. Easy, right?

            5
    3               8
2       4       6       

Now remove the 6, still easy, yes...?

            5
    3               8
2       4       

Now remove the 8, uh oh

            5
    3               
2       4

Getting this tree back into the proper balanced form, like:

        4
    3       5
2

Is quite expensive, at least compared to the other removals we have done, and it gets much worse as the depth of our tree increases. We could make this go much faster by removing the 2 and the 4 before removing the 8, particularly if our tree were more than 3 levels deep.

Without sorting, removal is on average O(K * log_I(N)^2), where N is the total number of elements, K is the number to be removed, I is the number of children a given node is permitted, and log_I(N) is therefore the depth; for each level of depth we increase the number of operations quadratically.

Removal with some help from ordering is on average O(K * log_I(N)), though sometimes the ordering cannot help you and you are stuck removing something that requires a rebalance. Still, minimizing those cases is the best you can do.

EDIT:

Another possible tree ordering scheme:

            8
    6               7   
1       2       3       4

Accomplishing optimal removal under this scheme would be easier, because we can take advantage of our knowledge of how things are sorted. It is possible under either scheme, and in fact both are identical; under this one the logic is just a little simpler to follow, because the ordering is more human-friendly for the given scenario. In either case, "in order" means "remove the farthest leaf first". In this case it just so happens that the farthest leaves are also the smallest numbers, a fact we could take advantage of to make it even a little more optimal, but this fact is not necessarily true for the file system example presented (though it may be).
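If you want to measure how much deletion order actually matters for rebalancing, a toy self-balancing tree with a rotation counter is enough. The sketch below uses an AVL tree (a balanced binary tree, not an on-disk B-tree), and all the names in it are my own; whether in-order deletion produces fewer rotations than random deletion is exactly the kind of empirical question it lets you answer:

    import random

    # Toy AVL tree with a rotation counter: every rotation performed while
    # deleting is counted, so different deletion orders can be compared.

    rotations = 0

    class Node:
        __slots__ = ("key", "left", "right", "height")
        def __init__(self, key):
            self.key, self.left, self.right, self.height = key, None, None, 1

    def height(n):
        return n.height if n else 0

    def update(n):
        n.height = 1 + max(height(n.left), height(n.right))

    def balance(n):
        return height(n.left) - height(n.right)

    def rotate_right(y):
        global rotations
        rotations += 1
        x = y.left
        y.left, x.right = x.right, y
        update(y)
        update(x)
        return x

    def rotate_left(x):
        global rotations
        rotations += 1
        y = x.right
        x.right, y.left = y.left, x
        update(x)
        update(y)
        return y

    def rebalance(n):
        update(n)
        if balance(n) > 1:                 # left-heavy
            if balance(n.left) < 0:
                n.left = rotate_left(n.left)
            return rotate_right(n)
        if balance(n) < -1:                # right-heavy
            if balance(n.right) > 0:
                n.right = rotate_right(n.right)
            return rotate_left(n)
        return n

    def insert(node, key):
        if node is None:
            return Node(key)
        if key < node.key:
            node.left = insert(node.left, key)
        else:
            node.right = insert(node.right, key)
        return rebalance(node)

    def delete(node, key):
        if node is None:
            return None
        if key < node.key:
            node.left = delete(node.left, key)
        elif key > node.key:
            node.right = delete(node.right, key)
        else:
            if node.left is None:
                return node.right
            if node.right is None:
                return node.left
            succ = node.right              # in-order successor
            while succ.left:
                succ = succ.left
            node.key = succ.key
            node.right = delete(node.right, succ.key)
        return rebalance(node)

    def rotations_for(order_fn, n=10_000):
        """Insert n keys in random order, then delete all of them in the
        order produced by order_fn; return the delete-phase rotation count."""
        global rotations
        keys = list(range(n))
        random.shuffle(keys)
        root = None
        for k in keys:
            root = insert(root, k)
        rotations = 0                      # count only the deletions
        for k in order_fn(list(range(n))):
            root = delete(root, k)
        return rotations

    print("in-order deletes:", rotations_for(lambda ks: ks))
    print("random deletes  :", rotations_for(lambda ks: random.sample(ks, len(ks))))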

I am not convinced that the number of B-tree rebalancings changes significantly if you delete the files in order. However, I do believe that the number of distinct seeks to external storage will be significantly smaller if you do so. At any time, the only nodes in the B-tree that need to be visited will be the far-right boundary of the tree, whereas with a random order each leaf block in the B-tree is visited with equal probability for each file.
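One cheap way to see this locality effect outside a filesystem is SQLite, which stores each table as a B-tree keyed by its rowid. The sketch below (the file and table names are mine, and how large the measured gap is will depend heavily on cache sizes) fills a table and then deletes every row, once in key order and once in random order:

    import os
    import random
    import sqlite3
    import time

    def timed_deletes(keys, label):
        """Create a fresh on-disk table, fill it with len(keys) rows, then
        delete the rows one by one in the order given by `keys`."""
        db = f"delete_order_{label}.db"
        if os.path.exists(db):
            os.remove(db)
        conn = sqlite3.connect(db)
        conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, payload BLOB)")
        conn.executemany("INSERT INTO t VALUES (?, ?)",
                         ((k, b"x" * 100) for k in range(len(keys))))
        conn.commit()
        start = time.perf_counter()
        for k in keys:                       # one DELETE per row, like unlink()
            conn.execute("DELETE FROM t WHERE k = ?", (k,))
        conn.commit()
        print(f"{label:>7}: {time.perf_counter() - start:.3f}s")
        conn.close()

    n = 200_000
    ordered = list(range(n))
    shuffled = ordered[:]
    random.shuffle(shuffled)

    timed_deletes(ordered, "sorted")         # walk the B-tree leaves in order
    timed_deletes(shuffled, "random")        # jump around the leaf pages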

Rebalancing B-trees is cheaper than in B+-tree implementations, which is why most filesystem and database index implementations use them.

There are many approaches to deletion, and depending on the approach, it can be more efficient in terms of time and in the need to rebalance the tree. You also have to consider the node size, since the number of keys a node can store affects how often the tree needs rebalancing. A large node size will usually just reorder keys inside the node, but a small one will probably make the tree rebalance many times.

A great resource for understanding this is the famous CLRS book, "Introduction to Algorithms" by Thomas Cormen et al.

On storage systems hosting huge directories, the buffer cache will be under stress and buffers may get recycled. So if the deletes are spaced apart in time, the number of disk reads needed to bring the B-tree back into the buffer cache between deletes may be high.

If you sort the files to be deleted, you are effectively delaying the deletes and bunching them together. This may have the side effect of performing more deletes per B-tree block paged in. If there are statistics showing the buffer cache hit rates for the two experiments, they may tell whether this hypothesis is wrong or not.

But if there is no stress on the buffer cache during the deletes, then the B-tree blocks could stay in core, and my hypothesis would not be valid.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow