Anagram Index Calculation [duplicate]

https://stackoverflow.com/questions/18644470

27-06-2022
|

Question

Given an input string up to 25 characters long consisting of characters A-Z, output its index in the alphabetically sorted list of all possible anagrams of the input string. Input string is not case sensitive. Input characters can be repeated. The app must complete in less than 500ms and take less than 1GB of memory.

At first glance this looks impossible to do without an arbitrary precision math library. The worst case input is 25 unique characters, resulting in 25! possible anagrams. 25! is orders of magnitude larger than 2^64. Because the relationship between the indexes and the strings is not direct and must be calculated, there is no way to simply convert the string to a number.

This comes from an interview challenge I got the other day. I couldn't come up with a solution for them and they insisted that there is indeed a good solution...

Solution

Given the letter frequencies for a word, it is easy to count the number of anagrams of the word. It is the factorial of the total number of characters, divided by the factorials of the frequencies, these numbers are also known as the multinomial coefficients.

Using this fact you can get the index of any anagram by counting the number of anagrams for prefixes that precede it alphabetically. For example, take MISSISSIPPI: the letter frequencies are I: 4, M: 1, P: 2, S: 4, making a total of 11!/(4!1!2!4!) = 34650 anagrams.

The number of anagrams that start with I is 10!/(3!1!2!4!) = 12600
The number of anagrams that start with MII is 8!/(2!0!2!4!) = 420
The number of anagrams that start with MIP is 8!/(3!0!1!4!) = 280
The number of anagrams that start with MISI is 7!/(2!0!2!3!) = 210
The number of anagrams that start with MISP is 7!/(3!0!1!3!) = 140
The number of anagrams that start with MISSII is 5!/(1!0!2!2!) = 30
The number of anagrams that start with MISSIP is 5!/(2!0!1!2!) = 30
and so on...

Sum these numbers up and you get the index. But yes, you will likely need some arbitrary precision number library, because, as you say, in the worst case there are 25! anagrams and the index may go out of bounds for 64-bit integers.

This doesn't feel very elegant though, if there's a better solution I would love to hear about it.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow