Question

I'm having what I guess is a logic problem. I'm coding in C# but a general pseudo-code solution is welcome.

I have this text files that contain, for example, this text:

blah "hello john"
blah 'the code is "flower" '
blah "good night"

I would like to loop through the double quotes and do something to them, but I want to ignore the double quotes contained within single quotes. I'm getting the position of the opening double quote and the ending double quote as such (with string data containing the contents of the text file):

C#

// Start searching from beginning
int quotestart = 0, quoteend = 0;

while (data.IndexOf('"', quotestart) != -1)
{
  // Get opening double quote
  quotestart = data.IndexOf('"', quotestart);
  // Get ending double quote
  quoteend = data.IndexOf('"', quotestart + 1);

  string sub = data.Substring(quotestart + 1, quoteend - quotestart - 1);
  Console.WriteLine(sub);

  // Set the start position for the next round
  quotestart = quoteend + 1;
}

With my code, the output would be:

hello john
flower
good night

Because "flower" is within single quotes, I would like my output to be:

hello john
good night

EDIT

I'm currently working on an approach where I start by filling all the data between single quotes with, for example 'A'. This way any data between single quotes is ignored when I iterate through the double quotes. Not sure if it's the right approach.

Was it helpful?

Solution

I tried googling finite state machines but having no formal training in computer engineering I must admit I was a bit lost. Do you have any additional pointers?

A FSM is one of the simplest forms of computers. The idea is that you have a finite amount of "state" information and a steady stream of inputs. Each input causes the state to change in a predictable way, based only on the current state and the current input, and causes a predictable output to occur.

So imagine that your inputs are single characters and your output is a single character or a "null" character. Here's a FSM that does what you want:

  • states are OUTSIDE, INSIDEDOUBLE and INSIDESINGLE.
  • inputs are ", ' and x. (WOLOG let x stand for any other character.)

We have three states and three inputs so there are nine possible combinations.

  • if we are OUTSIDE and get x, stay OUTSIDE and emit null.
  • if we are OUTSIDE and get ", go INSIDEDOUBLE and emit null.
  • if we are OUTSIDE and get ', go INSIDESINGLE and emit null.
  • if we are INSIDEDOUBLE and get x, stay INSIDEDOUBLE and emit x
  • if we are INSIDEDOUBLE and get ", go OUTSIDE and emit null
  • if we are INSIDEDOUBLE and get ', stay INSIDEDOUBLE and emit '
  • if we are INSIDESINGLE and get x, stay INSIDESINGLE and emit null
  • if we are INSIDESINGLE and get ", stay INSIDESINGLE and emit null
  • if we are INSIDESINGLE and get ', go OUTSIDE and emit null

The only thing left is to say that the start state is OUTSIDE.

So let's suppose the inputs are 1 " 2 " 3 ' 4 " 5 " ' 6. The states and outputs are:

  • OUTSIDE gets 1, emits null, stays OUTSIDE.
  • OUTSIDE gets ", emits null, goes INSIDEDOUBLE.
  • INSIDEDOUBLE gets 2, emits 2, stays INSIDEDOUBLE
  • INSIDEDOUBLE gets ", emits null, goes OUTSIDE.
  • OUTSIDE gets 3, emits null, stays OUTSIDE.
  • OUTSIDE gets ', emits null, goes INSIDESINGLE

... fill in the rest yourself.

Is that enough of a sketch for you to write the code?

OTHER TIPS

Nice solution; using a switch statement is a traditional way to do this for small FSMs but it becomes unwieldy when the number of states and inputs gets large and complicated. Here is an alternate solution that is easier to scale: a table-driven solution. That is, put the facts about the transitions and actions into an array, and then the FSM is nothing more than a series of array lookups:

// States
const int Outside = 0;
const int InDouble = 1;
const int InSingle = 2;

// Inputs
const int Other = 0;
const int DoubleQuote = 1;
const int SingleQuote = 2;

static readonly int[,] stateTransitions =
{   /*               Other     DoubleQ   SingleQ */
    /* Outside */  { Outside,  InDouble, InSingle },
    /* InDouble */ { InDouble, Outside,  InDouble },
    /* InSingle */ { InSingle, InSingle, Outside }
};

// Do we emit the character or ignore it?
static readonly bool[,] actions =
{   /*              Other   DoubleQ SingleQ */
    /* Outside */ { false,  false,  false },
    /* InDouble */{ true,   false,  true  },
    /* InSingle */{ false,  false,  false }
};

static int Classify(char c)
{
    switch (c)
    {
        case '\'': return SingleQuote;
        case '\"': return DoubleQuote;
        default: return Other;
    }
}

static IEnumerable<char> FSM(IEnumerable<char> inputs)
{
    int state = Outside;
    foreach (char input in inputs)
    {
        int kind = Classify(input);
        if (actions[state, kind]) 
            yield return input;
        state = stateTransitions[state, kind];
    }
}

And now we can get the result with

string.Join("", FSM(@"1""2'3""4""5'6""7'8""9""A'B"))

Many thanks to Eric Lippert for providing the logic behind this solution. I'm providing my C# solution below in case anyone needs it. It has some unnecessary re-assignments that I left just for clarity.

string state = "outside";

for (int i = 0; i < data.Length; i++)
{
    c = data[i];
    switch (state)
    {
        case "outside":
            switch (c)
            {
                case '\'':
                    state = "insidesingle";
                    break;
                case '"':
                    state = "insidedouble";
                    break;
                default:
                    state = "outside";
                    break;
            }
            break;

        case "insidedouble":
            switch (c)
            {
                case '\'':
                    state = "insidedouble";
                    Console.Write(c);
                    break;
                case '"':
                    state = "outside";
                    break;
                default:
                    state = "insidedouble";
                    Console.Write(c);
                    break;
            }
            break;  

        case "insidesingle":
            switch (c)
            {
                case '\'':
                    state = "outside";
                    break;
                case '"':
                    state = "insidesingle";
                    break;
                default:
                    state = "insidesingle";
                    break;
            }
            break;
    }
}

Just for fun, I decided to solve this problem using a very lightweight FSM library called stateless.

Here's how the code would look like if you were to use this library.

Just like Eric's solution, below code can be changed easily to meet new requirements.

void Main()
{
    Console.WriteLine(string.Join("", GetCharacters(@"1""2'3""4""5'6""7'8""9""A'B")));  
}

public enum CharacterType
{
    Other,
    SingleQuote,
    DoubleQuote
}

public enum State
{
    OutsideQuote,
    InsideSingleQuote,
    InsideDoubleQuote
}

public IEnumerable<char> GetCharacters(string input)
{
    //Initial state of the machine is OutSideQuote.
    var sm = new StateMachine<State, CharacterType>(State.OutsideQuote);

    //Below, we configure state transitions.
    //For a given state, we configure how CharacterType 
    //transitions state machine to a new state.

    sm.Configure(State.OutsideQuote)
        .Ignore(CharacterType.Other)        
        //If you are outside quote and you receive a double quote, 
        //state transitions to InsideDoubleQuote.
        .Permit(CharacterType.DoubleQuote, State.InsideDoubleQuote)
        //If you are outside quote and you receive a single quote,
        //state transitions to InsideSingleQuote.
        //Same logic applies for other state transitions below.
        .Permit(CharacterType.SingleQuote, State.InsideSingleQuote);

    sm.Configure(State.InsideDoubleQuote)
        .Ignore(CharacterType.Other)
        .Ignore(CharacterType.SingleQuote)
        .Permit(CharacterType.DoubleQuote, State.OutsideQuote);

    sm.Configure(State.InsideSingleQuote)
        .Ignore(CharacterType.Other)
        .Ignore(CharacterType.DoubleQuote)
        .Permit(CharacterType.SingleQuote, State.OutsideQuote);

    foreach (var character in input)
    {
        var characterType = GetCharacterType(character);
        sm.Fire(characterType);
        if(sm.IsInState(State.InsideDoubleQuote) && characterType != CharacterType.DoubleQuote)
            yield return character;
    }       

}

public CharacterType GetCharacterType(char input)
{
    switch (input)
    {
        case '\'': return CharacterType.SingleQuote;
        case '\"': return CharacterType.DoubleQuote;
        default: return CharacterType.Other;
    }
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top