Question

This refers to one of my previous questions, array_unique vs array_flip, which states that array_flip(array_flip()) is much quicker than array_unique() when dealing with simple strings and integers.

What I would like to know is why array_unique() creates a copy of the array, sorts it, and then removes the duplicates.

The source for both functions is available here.

Thanks in advance!


Solution

If you think about it algorithmically, the way to remove duplicates is to go through a list, keep track of the items you find, and get rid of anything that is already in that "found this" list. One easy way to accomplish this is to sort the list first: equal items end up next to each other, so a single pass that compares each item with its neighbour removes the duplicates efficiently. Forget the computer for a moment and think about doing it yourself; which one of these lists is easier to remove duplicates from?

apple
banana
cantaloupe
apple
durian
apple
banana
cantaloupe

or

apple
apple
apple
banana
banana
cantaloupe
cantaloupe
durian
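
The sort-then-scan idea can be sketched in PHP roughly as follows. This is only an illustration of the general technique, not a copy of what array_unique() does internally; the function name dedupe_sorted() is made up for this example, and it assumes the values can be compared as strings.

```php
<?php
// Hypothetical helper: deduplicate by sorting a copy, then comparing neighbours.
function dedupe_sorted(array $items): array
{
    $sorted = array_values($items); // work on a copy; the caller's array is untouched
    sort($sorted, SORT_STRING);     // after sorting, duplicates sit next to each other

    $result = [];
    $previous = null;
    foreach ($sorted as $i => $item) {
        // An item is a duplicate only if it equals the item just before it.
        if ($i === 0 || (string) $item !== (string) $previous) {
            $result[] = $item;
        }
        $previous = $item;
    }

    return $result;
}

$fruit = ['apple', 'banana', 'cantaloupe', 'apple', 'durian', 'apple', 'banana', 'cantaloupe'];
$unique = dedupe_sorted($fruit);
print_r($unique); // apple, banana, cantaloupe, durian
```

Note that this sketch returns the values in sorted order with fresh numeric keys, which is a simplification compared with array_unique().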

Edit: After looking into it a bit (and finding this article), it looks like, while both approaches get the job done, they are not always functionally equivalent. To paraphrase a couple of the points:

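  1. array_unique() sorts a copy of the values internally, as you noted, and keeps the first key for each duplicated value, whereas array_flip(array_flip()) keeps the last key. The two therefore don't always return the same array, although the flip behaviour might be what you want.
  2. array_flip() only accepts strings and integers as values, because those are the only types that can become keys. If the values are objects, for example, the flip method won't work out of the box, whereas the sort method only needs the values to be comparable.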
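
To make the comparison concrete, here is a small sketch that runs both approaches on a made-up array and adds a guard for the value-type restriction. The sample data and the helper name can_flip() are assumptions for illustration only.

```php
<?php
$colors = ['x' => 'red', 'y' => 'blue', 'z' => 'red'];

// array_unique() keeps the key of the first occurrence of each value.
$viaUnique = array_unique($colors);

// The double flip keeps the key of the last occurrence instead.
$viaFlip = array_flip(array_flip($colors));

// Hypothetical guard: array_flip() can only turn integers and strings into keys,
// so check the values before choosing the flip approach.
function can_flip(array $values): bool
{
    foreach ($values as $value) {
        if (!is_int($value) && !is_string($value)) {
            return false;
        }
    }
    return true;
}

var_dump(array_keys($viaUnique)); // keys 'x' and 'y'
var_dump(array_keys($viaFlip));   // keys 'z' and 'y'
var_dump(can_flip([1, 'two']));   // bool(true)
var_dump(can_flip([1.5, null]));  // bool(false)
```

If the keys matter to the rest of your code, this difference is worth checking before swapping one approach for the other.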

OTHER TIPS

I think Dan Fego gave a wonderful answer as to why one would sort an array prior to removing duplicates; however, I’d like to examine what array_flip() does. I’ll be using the following array to illustrate:

'a' => 'apple'
'b' => 'banana'
'c' => 'apple'
'd' => 'date'

array_flip() exchanges the keys and values, producing

'apple'  => 'a'
'banana' => 'b'
'apple'  => 'c'
'date'   => 'd'

However, keys must be unique. The manual describes how array_flip() handles this:

If a value has several occurrences, the latest key will be used as its value, and all others will be lost.

So we get something like this (the 'apple' entry keeps its original position; only its value is overwritten):

'apple' => 'c'
'banana' => 'b'
'date' => 'd'

So if we use array_flip(array_flip()) we get:

'c' => 'apple'
'b' => 'banana'
'd' => 'date'
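
The walkthrough can be reproduced with a few lines of PHP. This is just the example array from above run through the built-in array_flip(); the variable names are arbitrary.

```php
<?php
$fruit = [
    'a' => 'apple',
    'b' => 'banana',
    'c' => 'apple',
    'd' => 'date',
];

// First flip: values become keys, so the two 'apple' entries collapse into one.
$flipped = array_flip($fruit);

// Second flip: back to key => value, now without the duplicate.
$restored = array_flip($flipped);

print_r($flipped);  // 'apple' maps to 'c', the latest key that held it
print_r($restored); // 'apple' survives under key 'c'; key 'a' is gone
```

The duplicate is removed as a side effect of keys being unique, which is why no sorting or comparison pass is needed.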

As for the motivation behind array_unique(), we can only speculate unless Rasmus Lerdorf or someone currently working on PHP development cares to answer.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow