Question

Given an input string up to 25 characters long consisting of characters A-Z, output its index in the alphabetically sorted list of all possible anagrams of the input string. Input string is not case sensitive. Input characters can be repeated. The app must complete in less than 500ms and take less than 1GB of memory.

At first glance this looks impossible to do without an arbitrary precision math library. The worst case input is 25 unique characters, resulting in 25! possible anagrams. 25! is orders of magnitude larger than 2^64. Because the relationship between the indexes and the strings is not direct and must be calculated, there is no way to simply convert the string to a number.

This comes from an interview challenge I got the other day. I couldn't come up with a solution for them and they insisted that there is indeed a good solution...

Was it helpful?

Solution

Given the letter frequencies for a word, it is easy to count the number of anagrams of the word. It is the factorial of the total number of characters, divided by the factorials of the frequencies, these numbers are also known as the multinomial coefficients.

Using this fact you can get the index of any anagram by counting the number of anagrams for prefixes that precede it alphabetically. For example, take MISSISSIPPI: the letter frequencies are I: 4, M: 1, P: 2, S: 4, making a total of 11!/(4!1!2!4!) = 34650 anagrams.

  • The number of anagrams that start with I is 10!/(3!1!2!4!) = 12600
  • The number of anagrams that start with MII is 8!/(2!0!2!4!) = 420
  • The number of anagrams that start with MIP is 8!/(3!0!1!4!) = 280
  • The number of anagrams that start with MISI is 7!/(2!0!2!3!) = 210
  • The number of anagrams that start with MISP is 7!/(3!0!1!3!) = 140
  • The number of anagrams that start with MISSII is 5!/(1!0!2!2!) = 30
  • The number of anagrams that start with MISSIP is 5!/(2!0!1!2!) = 30
  • and so on...

Sum these numbers up and you get the index. But yes, you will likely need some arbitrary precision number library, because, as you say, in the worst case there are 25! anagrams and the index may go out of bounds for 64-bit integers.

This doesn't feel very elegant though, if there's a better solution I would love to hear about it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top