Question

I am writing a class that parses text from a given file. There are a few different "types" of text, and the parsing rules differ for each.

For example, one type of text, which we will call "Plain Text", is a simple string from which the parser strips extra whitespace. If I had "The quick brown fox\r\njumped over the lazy brown dogs\r\n", the parser would return "The quick brown fox jumped over the lazy brown dogs" (line breaks converted to single spaces).

Other text is representative of a table with a given delimiter, so it may look like "First Name,Last Name,DOB", and the parser's job is to return an array containing each comma-separated value.

(The actual implementation is more complex than this, but this is a good simplification.)

Originally I was going to go about this by creating an enum named something like TextType, with values PlainText and TableText. Then I could have a method that would look like

public string ParseText(string textToParse, TextType textType)

I quickly realized this doesn't work because when textType is PlainText the return value should be a string, but when textType is TableText the return value should be a string[].

One option would be to always return a string[] and just have it as a given that PlainText will always return an array of size 1. I'm not too thrilled with this though because it just doesn't seem semantically correct and could be confusing.

Another option would be to write a method for each TextType, so I could have

public string ParsePlainText(string textToParse)

and

public string[] ParseTableText(string textToParse)

I don't like this approach because it removes some of the flexibility provided by the original approach with the enum. For example, I expect to add more text types later; in the future I might have a type of text that the client wishes to identify as, say, HeadingText, but that will be parsed the same way as plain text. With the original approach, the public interface of the class containing the parsing method wouldn't have to change, because I could just add a new value to the TextType enum and modify the internals of the ParseText method. Additionally, I think the interface is much cleaner when there is only one method to call: the client simply passes the TextType (which he knows) and everything else is handled for him, versus having to pick from a list of similarly named methods that grows each time a new text type is added.

Finally I could just return an object that both string and string[] inherit from (since this is C#, I can just return object), and have the client cast to the appropriate type. I think this is the worst approach of all because it requires the client to know what should "actually" be returned, and has a huge potential for someone breaking every dependency by changing a type returned from the Parse class and not encountering resulting errors until runtime (since there's essentially no type checking to begin with).

Is there a "right" or optimal approach for this situation?


Solution 2

Let's try to clarify the problem a bit. You have two types of text (at least for now; that number will likely grow) that:

  1. Have the same input type, a string.
  2. Have different output types, a string and an array.
  3. Have different implementations.

Types added in the future may require yet other return types. So the question is: is it really logical to combine this functionality into one method? I see that your goal is to provide a uniform, generic interface to clients, but if the return types differ, I am not sure you can provide such an interface.

I don't think having many similar methods in an interface is a bad thing. There are many well-known libraries that are this way.

In my opinion, your second approach, having something like

public string ParsePlainText(string textToParse)
public string[] ParseTableText(string textToParse)

is, despite your reservations, much cleaner than the other approaches you listed, especially from the client's point of view (compared with returning an array of size 1, or requiring a cast).
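As a rough illustration of the two-method approach, here is a minimal sketch. The class name TextParser, the optional delimiter parameter, and the ParseHeadingText method are assumptions made for the example (the question only mentions HeadingText as a possible future type), and the real parsing rules are more involved than this.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class; the name is not from the question.
public static class TextParser
{
    // Collapses every run of whitespace (including \r\n line breaks)
    // into a single space and trims both ends.
    public static string ParsePlainText(string textToParse)
    {
        return Regex.Replace(textToParse, @"\s+", " ").Trim();
    }

    // Splits the text on a delimiter. The comma default is an assumption
    // based on the "First Name,Last Name,DOB" example.
    public static string[] ParseTableText(string textToParse, char delimiter = ',')
    {
        return textToParse.Trim().Split(delimiter);
    }

    // A later type that is parsed like plain text can simply delegate,
    // so existing methods and their callers do not change.
    public static string ParseHeadingText(string textToParse)
    {
        return ParsePlainText(textToParse);
    }
}
```

Each method keeps its natural return type, so the compiler checks every call site and no cast is needed.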

Other answers

There are a few ways, but the first that comes to mind is using interfaces:

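using System;

// Public so that a public factory method can return it.
public interface ITextParser {
    string Parse(string text);
}

public class TableTextParser : ITextParser {
    public string Parse(string text) {
        // specific table parsing stuff here
        throw new NotImplementedException();
    }
}

public class PlainTextParser : ITextParser {
    public string Parse(string text) {
        // specific plain text parsing stuff here
        throw new NotImplementedException();
    }
}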

Your main function could then become a factory of sorts, like this:

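public ITextParser CreateParserFor(string textToParse) {
    // logic here to determine the sort of parser you require;
    // typeOfTextIsTable and typeOfTextIsPlain are placeholders for that logic
    if (typeOfTextIsTable)
        return new TableTextParser();

    if (typeOfTextIsPlain)
        return new PlainTextParser();

    // every code path must return or throw, so reject unknown text types
    throw new System.ArgumentException("Unrecognized text type.", "textToParse");
}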

Then you can perhaps call it like so:

var parser = CreateParserFor(string_here);
var result = parser.Parse(string_here);

For any of the approaches suggested, you could return a ParseResults object from the Parse method representing a collection of results. This can then expose an iterator to iterate through the ParseResults.

This keeps your signature uniform and, IMO, is not confusing.
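One way the ParseResults idea might look is sketched below. Everything about ParseResults here (its constructor, Count, and indexer) is an assumption made for illustration, as are the simple parsing rules; the parser classes are redefined so that the example stands on its own.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical result type: wraps the parsed values and can be enumerated.
public sealed class ParseResults : IEnumerable<string>
{
    private readonly List<string> _values;

    public ParseResults(IEnumerable<string> values)
    {
        _values = new List<string>(values);
    }

    public int Count { get { return _values.Count; } }

    public string this[int index] { get { return _values[index]; } }

    public IEnumerator<string> GetEnumerator() { return _values.GetEnumerator(); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public interface ITextParser
{
    ParseResults Parse(string text);
}

public sealed class PlainTextParser : ITextParser
{
    public ParseResults Parse(string text)
    {
        // Collapse whitespace runs to single spaces; yields exactly one value.
        string collapsed = Regex.Replace(text, @"\s+", " ").Trim();
        return new ParseResults(new[] { collapsed });
    }
}

public sealed class TableTextParser : ITextParser
{
    public ParseResults Parse(string text)
    {
        // Comma delimiter assumed; yields one value per cell.
        return new ParseResults(text.Trim().Split(','));
    }
}
```

A caller can then write foreach (string value in parser.Parse(text)) without knowing which parser it holds. Note that a plain-text result is still a collection with a single element, so this shares the trade-off the question raised about always returning string[].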

Hope this helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow