Algorithm to get count of distinct vowels in a word [closed]

Question 1

Write a function to check if something is a vowel.
Create a map from vowels to counts, initialized to zeroes.
Iterate through the word one character at a time. If a character is a vowel (using function in 1), then increment the counter associated with that vowel.
Iterate through the map and if there are at least two nonzero values, return true.

Question 2

The question asked for the "best" algorithm without specifying what "best" means, so we are going to look at two likely possibilities. It is possible to do much better than the currently accepted answer for either of these definitions of "best."

Often people want to know the fastest algorithm to accomplish some task. That's fine, but remember that in a large application generally only a small proportion of the code contributes substantially to performance. You should first write code that is understandable and maintainable; then if it turns out that the application is not fast enough, you profile it and find out which pieces could make the app substantially faster if optimized. You leave the rest alone.

It is pretty difficult to imagine a realistic scenario in which an application has a performance problem that is traced back to the algorithm for counting the number of distinct vowels in a word, so let's first assume that "best" means "most straightforward and maintainable." That algorithm, for most modern languages, is simply to get the set of letters used in the word and take the intersection with the set of vowel letters, as shown here in Scala:

(word.toSet & vowels).size

If you were to write something much more complicated than that in a real software development job, you'd better have a good reason. So OK, let's suspend disbelief for a moment and pretend that we really do somehow have an application with a performance problem that turns out to be due to the way we compute the number of distinct vowels in an English word. Then we want to replace that code with something very fast. We can do much better than the above, and much better than the currently accepted answer.

We weren't given very much info, but we do know that the input is an English word. English only has 26 letters, a number which fits comfortably in a register on a modern CPU, which is typically 64 bits wide. Let's use the bits of the register to represent the presence of particular letters in the word. If we iterate through the letters of the word and set each's corresponding bit in the register, we now have a record of every letter used. AND that with a value that has only the vowel bits set, and you are left with the distinct vowels used. Finally, count the 1 bits; most modern CPUs have a special instruction to do this very quickly (e.g. POPCNT in the x86 SSE4 extension), but not all languages offer a binding to it.

Note that this is still the same algorithm: we are still getting the set of characters used in the word, taking the intersection of that set with the set of vowels, and getting the size of that intersection. The only difference is the implementation: we are now using bits in a word to represent the set.

In Scala the optimized version might look something like this:

def num1Bits(x: Int) = {  // sadly, no JVM binding for POPCNT that I know of
  var i = x
  i = (i & 0x55555555) + ((i>> 1) & 0x55555555)
  i = (i & 0x33333333) + ((i>> 2) & 0x33333333)
  i = (i & 0x0f0f0f0f) + ((i>> 4) & 0x0f0f0f0f)
  i = (i & 0x00ff00ff) + ((i>> 8) & 0x00ff00ff)
      (i & 0x0000ffff) + ((i>>16) & 0x0000ffff)
}
val vowels = "aeiou"
def bit(c: Char) = 1 << (c & 0x1F)
def charBits(s: String) = (0 /: s) { (soFar, c) => soFar | bit(c) }
val vowelBits = charBits("aeiou")
def numDistinctVowelsUsed(s: String) = num1Bits(charBits(s) & vowelBits)
numDistinctVowelsUsed("linguistics")  // evaluates to 2

although it might improve performance somewhat if you rewrite charBits as a while loop. If that still weren't fast enough, you might drop into C or even assembly language.

There is no map to initialize/update, no calling a function to find out if a character is a vowel, etc. Manipulating the bits of a register is very fast.

It often occurs that an optimized version of an algorithm takes much longer to get right, requires a lot more unit tests, is harder to understand and maintain, and is less flexible in the face of changing requirements. Those are all true here. If optimizing a routine solves a real performance problem with your application, then it might be worth doing; if not, then you should leave the more straightforward code alone.

Question 3

Put your vowels in a list & iterate through to see if its present in your word. Since no mention of size of data I am not optimising this. Below is a python implementation of the same. You can break out of the loop after you have seen at least 2 vowels.

>>> vowels = ['a','e','i','o','u']
>>> word = 'apple'
>>> for vowel in vowels:
...     if vowel in word:
...         print vowel
... 
a
e

Question 4

Might be an opportunity to use the "match" feature in Scala.

     def isVowel(ch: Char):Boolean = {
       ch match {
         case 'a' => true
         case 'e' => true
         case 'i' => true
         case 'o' => true
         case 'u' => true
         case _ => false
       }
     }