Question

During a coding interview I was asked the next question:

Write a program which counts the number of words and lines in the input stream.
Assume you have a reader with method nextChar().

At the first glance it looks simple. But then you realize, that you need to handle a lot of states, like:

  • a consecutive word/line delimeters
  • different end-of-word conditions - words separator or line separator or EOF
  • new string which starts with words separators
  • etc

On the interview I came up with a sortof spagetti code with many if-else and flags.

But I think there should be a formal approach for this kind of problems, which allows you to guarantee that you handled all possible cases, and which can give a structured solution.

I think it maybe something either from automata theory, or from compilers theory (I've never made a deep dive to any of these two areas before).

So, if you recognize a certain type of problem in the problem above, or you know which theory covers problems like this, please let me know.

Was it helpful?

Solution

Finite state machines.

This is essentially a small lexing task. The solution will never be pretty but if you write your code in the following way you can be quite confident of correctness:

curState <- NONE
while(c <- getChar)
    switch(curState) {
       case NONE:
           switch(c) {
               // ....
           }
           break;
       // .....
    }
}

You can also use a data structure to store the transition function (given a state and a character, what is the next state?) but for your case just writing the code is probably the best bet.

... don't forget about your text encoding! UTF-16, right? :)

You might want to look into implementations of the Unix wc utility:

These will be more polished and featured than what you will do in an interview, but its nice to see regardless.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top