Why is deletion of elements from a hash table using doubly-linked lists O(1)?

https://stackoverflow.com/questions/8105889

27-02-2021

Question

In the CLRS textbook "Introduction to Algorithms", there is the following paragraph on p. 258.

We can delete an element in O(1) time if the lists are doubly linked. (Note that CHAINED-HASH-DELETE takes as input an element x and not its key k, so that we don't have to search for x first. If the hash table supports deletion, then its linked list should be doubly linked so that we can delete an item quickly. If the lists were only singly linked, then to delete element x, we would first have to find x in the list so that we could update the next attribute of x's predecessor. With singly linked lists, both deletion and searching would have the same asymptotic running times).

What puzzles me is this big parenthesis; I fail to understand its logic. With a doubly linked list, one still has to find x in order to delete it, so how is this different from a singly linked list? Please help me understand it!

Solution

The problem presented here is: suppose you are looking at a particular element of a hash table. How costly is it to delete it?

Suppose you have a singly linked list:

v ----> w ----> x ----> y ----> z
                |
            you're here

Now if you remove x, you need to connect w to y to keep your list linked. You need to access w and tell it to point to y (you want to have w ----> y). But you can't access w from x because the list is singly linked! Thus you have to walk through the list from its head to find w, which takes O(n) operations, and then tell it to link to y. That's bad.

Now suppose the list is doubly linked:

v <---> w <---> x <---> y <---> z
                |
            you're here

Cool, you can access w and y from here, so you can connect the two (w <---> y) in O(1) operations!
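
As a rough illustration of the two pictures above, here is a minimal C++ sketch. The node types `SNode` and `DNode`, the function names, and the caller-owned `head` pointer are all assumptions made for this example; they are not taken from CLRS.

```cpp
#include <cassert>

// Hypothetical node types, for illustration only.
struct SNode { int key; SNode* next; };
struct DNode { int key; DNode* prev; DNode* next; };

// Singly linked: x has no link back to its predecessor, so we have to
// walk from the head to find it. O(n) in the length of the chain.
void remove_singly(SNode*& head, SNode* x) {
    if (head == x) { head = x->next; return; }
    SNode* w = head;
    while (w != nullptr && w->next != x) w = w->next;  // search for the predecessor
    assert(w != nullptr && "x must be in the list");
    w->next = x->next;
}

// Doubly linked: both neighbours are reachable from x itself. O(1).
void remove_doubly(DNode*& head, DNode* x) {
    if (x->prev != nullptr) x->prev->next = x->next;
    else                    head = x->next;            // x was the first node
    if (x->next != nullptr) x->next->prev = x->prev;
}
```

Neither function frees x; the caller is assumed to own the node's storage.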

Other tips

It seems to me that the hash table part of this is mostly a red herring. The real question is: "can we delete the current element from a linked list in constant time, and if so how?"

The answer is: it's a little tricky, but in effect yes, we can -- at least usually. We do not (normally) have to traverse the entire linked list to find the previous element. Instead, we can swap the data between the current element and the next element, then delete the next element.

The one exception is deleting the last item in the list. In that case there is no next element to swap with, and if you really have to do that, there's no real way to avoid finding the previous element. There are, however, ways that will generally work to avoid that -- one is to terminate the list with a sentinel node instead of a null pointer. Since we never delete the node holding the sentinel value, we never have to deal with deleting the last item in the list. That leaves us with relatively simple code, something like this:

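#include <utility>  // std::swap

template <class key, class data>
struct node {
    key k;
    data d;
    node *next;
};

// Precondition: item->next is not null (the list ends with a sentinel node).
template <class key, class data>
void delete_node(node<key, data> *item) {
    node<key, data> *temp = item->next;
    // Move the successor's contents into item, then unlink the successor.
    std::swap(item->k, temp->k);
    std::swap(item->d, temp->d);
    item->next = temp->next;
    delete temp;
}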
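
To see the sentinel idea in action, here is a small self-contained sketch. It assumes a simplified, non-template `Node` with an `int` key and a hypothetical `push_front` helper, neither of which is part of the answer's code.

```cpp
#include <cassert>
#include <utility>

// Hypothetical concrete node. The last node of every list is a sentinel
// whose key is never treated as real data.
struct Node { int key; Node* next; };

// Put a new key in front of the list and return the new first node.
Node* push_front(Node* head, int key) { return new Node{key, head}; }

// O(1) removal without a predecessor pointer: move the successor's key
// into *item, then unlink and free the successor instead.
// Precondition: item is not the sentinel, so item->next is non-null.
void delete_node(Node* item) {
    assert(item->next != nullptr);
    Node* victim = item->next;
    std::swap(item->key, victim->key);
    item->next = victim->next;
    delete victim;
}
```

One consequence worth keeping in mind: the storage that gets freed belongs to the successor, not to the node whose key was removed. Any pointer that some other part of the program holds to that successor dangles afterwards, which matters if a hash table hands out pointers to its elements.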

In general you are correct. However, the algorithm you posted takes the element itself as input, not just its key:

Note that CHAINED-HASH-DELETE takes as input an element x and not its key k, so that we don't have to search for x first.

You have the element x. Since it is in a doubly linked list, you have pointers to its predecessor and successor, so you can fix those elements in O(1). With a singly linked list only the successor would be available, so you would have to search for the predecessor in O(n).

Suppose you want to delete an element x. With a doubly linked list you can directly connect the element before x to the element after x, so there is no need to traverse the list and the deletion takes O(1).

Find(x) is, on average, O(1) for a chained hash table -- it is immaterial whether you use singly linked lists or doubly linked lists. They are identical in performance for searching.

If, after having run Find(x), you decide that you'd like to delete the object returned, you will find that, internally, a hash table might have to search for your object again. It's still usually going to be O(1) and not a big deal, but if you find that you delete an awful lot, you can do a little better. Instead of returning a user's element directly, return a pointer to the underlying hash node. You can then take advantage of some internal structures. So if in this case you chose a doubly linked list as the way to express your chaining, then during the delete process there is no need to recompute the hash and search the collection again -- you can omit this step. You have enough information to perform a delete right from where you are sitting. Additional care must be taken if the node you are submitting is the head node, so an integer might be used to mark the location of your node in the original array if it is the head of a linked list.

The trade-off is the guaranteed space taken by the extra pointer versus a possible faster delete (and slightly more complicated code). With modern desktops, space is usually very cheap, so this might be a reasonable trade-off.
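
A minimal sketch of that idea follows, assuming a hypothetical `ChainedSet` of `int` keys in which every node records its slot index (the "integer" mentioned above). All names here are made up for illustration, and the hash is simply `std::hash<int>` reduced modulo the table size.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical hash node: doubly linked, plus the index of its slot.
struct HNode {
    int key;
    HNode* prev;
    HNode* next;
    std::size_t bucket;  // needed when the node is the head of its chain
};

class ChainedSet {
public:
    explicit ChainedSet(std::size_t slots) : table_(slots, nullptr) {}
    ~ChainedSet() {
        for (HNode* n : table_)
            while (n) { HNode* dead = n; n = n->next; delete dead; }
    }
    ChainedSet(const ChainedSet&) = delete;
    ChainedSet& operator=(const ChainedSet&) = delete;

    // Insert at the head of the chain and return a handle to the node.
    HNode* insert(int key) {
        std::size_t b = std::hash<int>{}(key) % table_.size();
        HNode* n = new HNode{key, nullptr, table_[b], b};
        if (n->next) n->next->prev = n;
        table_[b] = n;
        return n;
    }

    // Hash once and walk one chain; returns a node handle or nullptr.
    HNode* find(int key) const {
        std::size_t b = std::hash<int>{}(key) % table_.size();
        for (HNode* n = table_[b]; n; n = n->next)
            if (n->key == key) return n;
        return nullptr;
    }

    // O(1): no hashing and no search, only pointer updates around x.
    void erase(HNode* x) {
        assert(x != nullptr);
        if (x->prev) x->prev->next = x->next;
        else         table_[x->bucket] = x->next;  // x heads its chain
        if (x->next) x->next->prev = x->prev;
        delete x;
    }

private:
    std::vector<HNode*> table_;
};
```

A caller would write something like `if (HNode* h = s.find(k)) s.erase(h);`, paying for the hash and the chain walk only once.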

From a coding point of view, one can use unordered_map in C++ to implement this.

unordered_map<value, node*> mp;

Here node* is a pointer to a structure storing the key and the left and right pointers.

How to use:

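If you have a value v and you want to delete its node, just do the following:

Access the node for that value with mp[v].
Now make the right pointer of its left neighbour point to its right neighbour, and the left pointer of its right neighbour point to its left neighbour.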

And voila, you are done.

(As a reminder, looking up a specific key in a C++ unordered_map takes O(1) time on average.)
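
The steps above might look roughly like this in full. This is only a sketch: `LNode`, `IndexedList`, and the assumption that the stored values are unique are all introduced here for illustration.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical node, following the description above: a key plus
// left and right pointers.
struct LNode { int key; LNode* left; LNode* right; };

struct IndexedList {
    LNode* head = nullptr;
    LNode* tail = nullptr;
    std::unordered_map<int, LNode*> mp;  // value -> its node (values assumed unique)

    IndexedList() = default;
    IndexedList(const IndexedList&) = delete;
    IndexedList& operator=(const IndexedList&) = delete;
    ~IndexedList() {
        while (head) { LNode* dead = head; head = head->right; delete dead; }
    }

    void push_back(int v) {
        LNode* n = new LNode{v, tail, nullptr};
        if (tail) tail->right = n; else head = n;
        tail = n;
        mp[v] = n;
    }

    // Average O(1): one map lookup, then a constant amount of pointer work.
    bool erase(int v) {
        auto it = mp.find(v);
        if (it == mp.end()) return false;
        LNode* n = it->second;
        if (n->left)  n->left->right = n->right; else head = n->right;
        if (n->right) n->right->left = n->left;  else tail = n->left;
        mp.erase(it);
        delete n;
        return true;
    }
};
```

The sketch uses `mp.find(v)` rather than `mp[v]` because `operator[]` inserts a null entry when the key is missing.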

While going through the textbook, I also got confused about the same topic (whether "x" is a pointer to an element or the element itself) and eventually landed on this question. After going through the above discussion and referring to the textbook again, I think that in the book "x" is implicitly assumed to be a "node" whose attributes include "key" and "next".

Some lines from the textbook:

1) CHAINED-HASH-INSERT(T, x): insert x at the head of list T[h(x.key)]

2) If the lists were only singly linked, then to delete element x, we would first have to find x in the list T[h(x.key)] so that we could update the next attribute of x’s predecessor.

Hence we can assume that a pointer to the element is given, and I think Fezvez has given a good explanation for the question asked.

The textbook is wrong. The first member of a list has no usable "previous" pointer, so additional code is needed to find and unlink the element if it happens to be the first in its chain. (When mapping N items into M slots, each slot having a separate chain, a large share of the elements are the head of their chain: with N = M and uniform hashing, about 1 - 1/e, roughly 63 %.)

EDIT:

A better way than using a backlink is to use a pointer to the link that points to us (typically the ->next link of the previous node in the list):

struct node {
   struct node **pppar;
   struct node *nxt;
   ...
   };

deletion then becomes:

*(p->pppar) = p->nxt;

A nice feature of this method is that it works equally well for the first node on the chain (whose pppar pointer points to some pointer that is not part of a node).

UPDATE 2011-11-11

Because people fail to see my point, I'll try to illustrate. As an example, there is a hash table named table (basically an array of pointers) and a bunch of nodes one, two, three, one of which has to be deleted.

    struct node *table[123];
    struct node *one, *two, *three;
    /* Initial situation: the chain {one,two,three}
    ** is located at slot#31 of the array */
    table[31] = one, one->next = two, two->next = three, three->next = NULL;
                one->prev = NULL, two->prev = one, three->prev = two;


    /* How to delete element one: */
    if (one->prev == NULL) {
            table[31] = one->next;
            }
    else    {
            one->prev->next = one->next;
            }
    if (one->next) {
            one->next->prev = one->prev;
            }

Now it is obvious that the above code is O(1), but there is something nasty: it still needs the array and the index 31. In most cases a node is "self-contained", and a pointer to the node is sufficient to delete it from its chain, except when it happens to be the first node in its chain; additional information is then needed to find table and 31.

Next, consider the equivalent structure with a pointer-to-pointer as a backlink.

    struct node {
            struct node *next;
            struct node **ppp;
            char payload[43];
            };

    struct node *table[123];
    struct node *one, *two,*three;
    /* Initial situation: the chain {one,two,three}
    ** is located at slot#31 of the array */
    table[31] = one, one->next = two, two->next = three, three->next = NULL;
                one->ppp = &table[31], two->ppp = &one->next, three->ppp = &two->next;

    /* How to delete element one */
    *(one->ppp) = one->next;
    if (one->next) one->next->ppp = one->ppp;

Note: no special cases, and no need to know the parent table. (Consider the case where there is more than one hash table, but with the same node type: in the {prev,next} version, the delete operation would still need to know from which table the node should be removed.)
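
For readers who want something they can compile, here is a small C++ sketch of the same pointer-to-pointer idea. The names `PNode`, `push_front` and `unlink_node` are hypothetical, and the `int` key merely stands in for the payload.

```cpp
#include <cassert>

// Hypothetical node with a pointer-to-pointer backlink: ppp holds the
// address of whichever pointer currently points at this node (a table
// slot or the previous node's next field).
struct PNode {
    PNode*  next;
    PNode** ppp;
    int     key;
};

// Link n in front of the chain rooted at *slot.
void push_front(PNode** slot, PNode* n) {
    n->next = *slot;
    n->ppp  = slot;
    if (n->next) n->next->ppp = &n->next;
    *slot = n;
}

// Unlink n in O(1), with no special case for the first node and no
// reference to the table. The caller still owns n's storage.
void unlink_node(PNode* n) {
    assert(n->ppp != nullptr);  // n must currently be linked
    *(n->ppp) = n->next;
    if (n->next) n->next->ppp = n->ppp;
    n->next = nullptr;
    n->ppp  = nullptr;
}
```

Because `unlink_node` only touches the node and its neighbours, the same function works no matter which table, or which slot, the node lives in.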

Often, in the {prev,next} scenario, the special cases are avoided by adding a dummy node at the start of the doubly linked list, but that node needs to be allocated and initialised, too.
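
One possible shape for the dummy-node variant is sketched below. It assumes the dummy heads are stored by value inside a hypothetical `DummyHeadTable`; the type and function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct DNode { int key; DNode* prev; DNode* next; };

// Hypothetical table whose slots are dummy head nodes stored by value.
// The vector must never be resized or copied while nodes are linked,
// because the nodes hold pointers into it.
struct DummyHeadTable {
    std::vector<DNode> heads;

    explicit DummyHeadTable(std::size_t slots)
        : heads(slots, DNode{0, nullptr, nullptr}) {}

    // Link n directly behind the dummy head of the given slot.
    void push_front(std::size_t slot, DNode* n) {
        DNode* h = &heads[slot];
        n->prev = h;
        n->next = h->next;
        if (n->next) n->next->prev = n;
        h->next = n;
    }
};

// Every real node has a non-null prev (at worst the dummy head), so
// there is no first-node special case and no need for the table.
void unlink_node(DNode* x) {
    assert(x->prev != nullptr);
    x->prev->next = x->next;
    if (x->next) x->next->prev = x->prev;
    x->prev = nullptr;
    x->next = nullptr;
}
```

The price is one extra node per slot, empty or not, which is the allocation and initialisation cost mentioned above.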

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow