Question

I'm designing a library, built on web scraping, that provides an API for a popular news site. I am representing each of its articles as a collection of 'elements' (IElement), such as images, videos, blocks of text, soundtracks, etc. The problem is, I can't think of anything that all 'elements' have in common.

  • Text? Nope, images (and videos, and soundtracks) don't have text.
  • A URL to access the file? Nope; while I will be visiting a remote URL to access an image or a video (since it's not practical to pass those around), I don't need to go to a URL to get the text (I can just pass it as a string).

The only thing I can think of is having a member that describes what type the element is, but that could potentially lead to a very unintuitive API. I don't want my API to end up looking like this:

// API code
enum ElementType
{
    Text, Hyperlink, Image, Video, Soundtrack
}

interface IArticle
{
    IEnumerable<IElement> Contents { get; }
}

interface IElement
{
    ElementType Type { get; }
}

class TextElement : IElement { /*details*/ }
class ImageElement : IElement { /*details*/ }

// .. and so on

// Usage in app code
foreach (var element in article.Contents)
{
    switch (element.Type)
    {
        case ElementType.Text:
            RenderText(((TextElement)element).Text);
            break;
        case ElementType.Image:
            DisplayImage(GetImageFromUri(((ImageElement)element).ImageUri));
            break;
        // and so on
    }
}

As you can see, a lot of verbosity is added in this scenario because the user has to 1) switch on the element's type, 2) check for each ElementType, and then 3) downcast it to use implementation-specific features.

Is there an alternative to messy runtime type checking / downcasting in this scenario? How do other client apps (e.g. Facebook / Twitter clients) handle this kind of problem?

Solution

Part of your problem is that you are trying to reinvent the HTML document object model. Unless the content needs some special processing, it would be better to supply HTML fragments.

The type switch can be eliminated by using the visitor pattern: we define a visitor interface IElementVisitor that lists all cases, and add a void Accept(IElementVisitor) method to the IElement interface that will be implemented in each concrete type as

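public void Accept(IElementVisitor v) {
  // 'this' has the concrete type here, so the matching Visit overload is chosen at compile time
  v.Visit(this);
}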

Then, a RenderVisitor would look like

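class RenderVisitor : IElementVisitor {
  // Entry point: the element calls back into the Visit overload for its own type
  public void Render(IElement e) { e.Accept(this); }
  public void Visit(TextElement e) {
    RenderText(e);
  }
  public void Visit(ImageElement e) {
    RenderImage(e);
  }
  // ...
}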

Instead of the switch, the body of the foreach loop is now a single call: new RenderVisitor().Render(element);
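
Putting the pieces above together, a minimal self-contained sketch might look like the following. It is only an illustration under some assumptions: the Text and ImageUri members are taken from the usage code in the question, while the Output list, the VisitorDemo class and the example.com URL are hypothetical stand-ins (a real RenderVisitor would draw to the UI rather than collect strings).

```csharp
using System;
using System.Collections.Generic;

// One Visit overload per concrete element type.
public interface IElementVisitor
{
    void Visit(TextElement e);
    void Visit(ImageElement e);
}

public interface IElement
{
    void Accept(IElementVisitor v);
}

public sealed class TextElement : IElement
{
    public string Text { get; }
    public TextElement(string text) { Text = text; }
    public void Accept(IElementVisitor v) { v.Visit(this); }
}

public sealed class ImageElement : IElement
{
    public Uri ImageUri { get; }
    public ImageElement(Uri imageUri) { ImageUri = imageUri; }
    public void Accept(IElementVisitor v) { v.Visit(this); }
}

// Hypothetical renderer: records a description of each element instead of drawing it.
public sealed class RenderVisitor : IElementVisitor
{
    public List<string> Output { get; } = new List<string>();
    public void Render(IElement e) { e.Accept(this); }
    public void Visit(TextElement e) { Output.Add("text: " + e.Text); }
    public void Visit(ImageElement e) { Output.Add("image: " + e.ImageUri); }
}

public static class VisitorDemo
{
    public static void Main()
    {
        var contents = new List<IElement>
        {
            new TextElement("Hello"),
            new ImageElement(new Uri("https://example.com/a.png")),
        };
        var renderer = new RenderVisitor();
        foreach (var element in contents)
        {
            renderer.Render(element); // no switch, no downcast
        }
        foreach (var line in renderer.Output)
        {
            Console.WriteLine(line);
        }
    }
}
```

One consequence of this design is that adding a new element type means adding a Visit overload to IElementVisitor, so the compiler flags every visitor that does not yet handle it. A switch with a forgotten case would instead fail silently at runtime.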

OTHER TIPS

Use polymorphism, if you can. When you call an interface method or an overridden base class method, the runtime dispatches to the implementation for the object's actual type, and inside that implementation this already has the concrete type. You won't need casting or a switch statement.

In this case, you might have a method like public void DoIt(), which each class implements differently. The code in your switch cases moves into the specific subtypes' implementations of DoIt().
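
As a rough sketch of that idea, assuming the same Text and ImageUri members used in the question: the bodies of the switch cases become the DoIt() implementations. The Console.WriteLine calls, the PolymorphismDemo class and the example.com URL are placeholders for whatever rendering the app actually performs.

```csharp
using System;
using System.Collections.Generic;

public interface IElement
{
    void DoIt();
}

public sealed class TextElement : IElement
{
    public string Text { get; }
    public TextElement(string text) { Text = text; }

    // Takes over the work of "case ElementType.Text:" in the original switch.
    public void DoIt() { Console.WriteLine("text: " + Text); }
}

public sealed class ImageElement : IElement
{
    public Uri ImageUri { get; }
    public ImageElement(Uri imageUri) { ImageUri = imageUri; }

    // Takes over the work of "case ElementType.Image:" in the original switch.
    public void DoIt() { Console.WriteLine("image: " + ImageUri); }
}

public static class PolymorphismDemo
{
    public static void Main()
    {
        var contents = new List<IElement>
        {
            new TextElement("Hello"),
            new ImageElement(new Uri("https://example.com/a.png")),
        };
        foreach (var element in contents)
        {
            element.DoIt(); // dispatched to the concrete type's implementation
        }
    }
}
```

The trade-off is that the behavior now lives inside the library's element classes. That works well when the library can own the behavior, but less well when each client app needs its own rendering, which is the case the visitor pattern addresses.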

Licensed under: CC-BY-SA with attribution