Question

I'm trying to fully understand the visitor pattern. What I've learnt so far (correct me if I'm wrong) is:

  • It's about adding operations to classes, without modifying the source code of those classes. Or put another way, to bend the OOP approach to have functions and data structures separated.
  • It's a common misunderstanding that it has to do with hierarchies of objects (although it can be very useful in that case).

I think I get it, but there is a thing that looks unnecessary to me, and that's the accept method in the classes "to be visited". Let's set up a small example in Java. First the class hierarchy to be enriched with operations, but it's not to be modified:

interface Animal {
    void accept(AnimalVisitor visitor);
}

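class Dog implements Animal {
    // Interface methods are implicitly public, so the implementation must be public too
    @Override
    public void accept(AnimalVisitor visitor) {
        visitor.visitDog(this);
    }
}

class Cat implements Animal {
    @Override
    public void accept(AnimalVisitor visitor) {
        visitor.visitCat(this);
    }
}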

Then the visitor interface and a dummy implementation of that interface, representing an operation to make some sound.

interface AnimalVisitor {
    // These methods could all be named "visit" and rely on overloading instead.
    void visitDog(Dog dog);
    void visitCat(Cat cat);
}

class MakeSoundVisitor implements AnimalVisitor {
    @Override
    public void visitDog(Dog dog) {
        // In a real case you'd obviously do something with the dog object
        System.out.println("bark! bark bark!!");
    }

    @Override
    public void visitCat(Cat cat) {
        System.out.println("meow meeeoooww!!");
    }
}

And then a usage of all of this would be:

var makeSoundVisitor = new MakeSoundVisitor();
var cat = new Cat();
var dog = new Dog();

cat.accept(makeSoundVisitor);
dog.accept(makeSoundVisitor);

But I really don't see the point of that accept call. If you've got the visitor and the objects to be visited, why not just pass these objects directly to the visitor and avoid the indirection? You could even get rid of the accept method on the Animal interface. Something like this:

var makeSoundVisitor = new MakeSoundVisitor();
var cat = new Cat();
var dog = new Dog();

makeSoundVisitor.visitCat(cat);
makeSoundVisitor.visitDog(dog);

Solution

In your simple example, you know the exact runtime type of the object on which you invoke the visitor, and can therefore choose the right visitor method yourself:

makeSoundVisitor.visitCat(cat);      // You know that cat is a Cat
makeSoundVisitor.visitDog(dog);      // You know that dog is a Dog

But what if you don't know the type of the object? For example

Animal pet = getRandomAnimal();  

How would you now invoke your simplified visitor without the accept() method? You'd need to find out the real type of pet first, and then call visitDog() or visitCat() with a downcast. This is cumbersome and error-prone.
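To make that concrete, here is a minimal sketch of what the accept-free version ends up doing. It assumes a stripped-down Animal hierarchy without accept(); the helper name dispatchByHand is invented for this illustration, and the visitor returns a String instead of printing so the result is easy to check.

```java
// Hypothetical accept-free hierarchy: Animal is only a marker interface here.
interface Animal {}
class Dog implements Animal {}
class Cat implements Animal {}

class MakeSoundVisitor {
    String visitDog(Dog dog) { return "bark"; }
    String visitCat(Cat cat) { return "meow"; }
}

class ManualDispatch {
    // Without accept(), the caller has to rediscover the runtime type itself.
    static String dispatchByHand(Animal pet, MakeSoundVisitor visitor) {
        if (pet instanceof Dog) {
            return visitor.visitDog((Dog) pet);
        } else if (pet instanceof Cat) {
            return visitor.visitCat((Cat) pet);
        }
        // A newly added Animal subtype still compiles and only fails here, at run time.
        throw new IllegalArgumentException("Unhandled animal: " + pet.getClass());
    }
}
```

Every call site that needs a new operation has to repeat this ladder, and nothing reminds you to extend it when a new Animal appears.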

With the classical visitor pattern, it's just the beauty of polymorphism that accept() allows:

pet.accept(makeSoundVisitor);

The underlying technique, double dispatch, is worth knowing even outside the visitor context.
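As a rough sketch of where the two dispatches happen, here is the animal example again, this time with overloaded visit methods (as the comment in the question hints at). The String return type and the DoubleDispatchDemo helper are assumptions added for illustration.

```java
interface AnimalVisitor {
    String visit(Dog dog);
    String visit(Cat cat);
}

interface Animal {
    String accept(AnimalVisitor visitor);
}

class Dog implements Animal {
    @Override
    public String accept(AnimalVisitor visitor) {
        // Inside Dog, 'this' has static type Dog, so the compiler picks visit(Dog).
        return visitor.visit(this);
    }
}

class Cat implements Animal {
    @Override
    public String accept(AnimalVisitor visitor) {
        return visitor.visit(this); // resolves to visit(Cat)
    }
}

class MakeSoundVisitor implements AnimalVisitor {
    @Override public String visit(Dog dog) { return "bark"; }
    @Override public String visit(Cat cat) { return "meow"; }
}

class DoubleDispatchDemo {
    static String soundOf(Animal pet) {
        // First dispatch: a virtual call on pet's runtime type (Dog.accept or Cat.accept).
        // Second dispatch: a virtual call on the visitor's runtime type, with the
        // overload already fixed at compile time inside that accept().
        // Note: new MakeSoundVisitor().visit(pet) would not compile here,
        // because there is no visit(Animal) overload.
        return pet.accept(new MakeSoundVisitor());
    }
}
```

Java picks overloads from static types only, which is why the call has to bounce through accept() before the visitor sees a Dog or a Cat.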

OTHER TIPS

But I really don't see the point of that accept call. If you've got the visitor and the objects to be visited, why not just pass these objects directly to the visitor and avoid the indirection?

Christophe's answer is on point; I just want to expand on it. Not knowing the runtime type of the object is actually an assumption of the Visitor pattern. You can understand the pattern in two ways. The first is that it's a trick to do multiple dispatch in a single-dispatch language. The other is that it's a way to do abstract data types in OOP languages. Let me explain.

You see, there are two major approaches to data abstraction [1]. OOP achieves it by abstracting away procedure calls. As in, you are actually specifying an abstract operation when you're making the call (you're specifying "the message"), and the actual function you're calling is being resolved by some underlying mechanism. This underlying mechanism allows objects to respond to a certain interface (a set of public methods/messages), which makes it easy to add new representations (by subclassing), but harder to add new operations. Note that, when utilizing this sort of polymorphism, while the code that creates the objects knows concrete types, other client code is written in terms of the abstract type (and in case of OOP, that specifically means in terms of the interface defined by that abstract type).

The other approach is abstract data types (ADTs), where a finite set of representations (concrete data types) is abstracted away and treated as a single data type. In contrast to OOP, you're now calling concrete functions, but you're passing in a data abstraction. That is, the parameter type is never concrete, and client code never works with or has knowledge of concrete representations (except at construction sites, but the same is true for OOP). There's an underlying mechanism that allows functions to identify (or match on) a concrete type, and each operation must support all representations (or, in terms of the Visitor pattern, each concrete Visitor must handle all Element types). In its simplest form this mechanism is something like a switch statement; in functional languages it manifests as pattern matching; and in the Visitor pattern it's encoded in the abstract Visitor interface (an abstract visit method for each possible element type) that each derivative must support in a meaningful way. The tradeoffs for this kind of data abstraction are the other way around: it's easy to add new operations, but hard to add new representations (new element types).
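For comparison, here is a minimal sketch of the "switch statement" flavour of an ADT in plain Java. The Shape and ShapeOps names are hypothetical and not part of the question's example; they are only there to show the shape of the tradeoff.

```java
// Hypothetical tagged union: one abstract type, a closed set of representations.
final class Shape {
    enum Kind { CIRCLE, RECTANGLE }

    final Kind kind;
    final double a; // radius for a circle, width for a rectangle
    final double b; // unused for a circle, height for a rectangle

    private Shape(Kind kind, double a, double b) {
        this.kind = kind;
        this.a = a;
        this.b = b;
    }

    // Construction sites are the only place that names a representation.
    static Shape circle(double radius) { return new Shape(Kind.CIRCLE, radius, 0.0); }
    static Shape rectangle(double width, double height) { return new Shape(Kind.RECTANGLE, width, height); }
}

final class ShapeOps {
    // Each operation is a concrete function that must cover every representation.
    static double area(Shape shape) {
        switch (shape.kind) {
            case CIRCLE:    return Math.PI * shape.a * shape.a;
            case RECTANGLE: return shape.a * shape.b;
            default:        throw new IllegalStateException("unknown kind: " + shape.kind);
        }
    }

    // Adding an operation is one new function; Shape itself is untouched.
    static double perimeter(Shape shape) {
        switch (shape.kind) {
            case CIRCLE:    return 2.0 * Math.PI * shape.a;
            case RECTANGLE: return 2.0 * (shape.a + shape.b);
            default:        throw new IllegalStateException("unknown kind: " + shape.kind);
        }
    }
}
```

Adding a third Kind, on the other hand, means revisiting every switch. The Visitor interface plays the role of these switches, with the advantage that a missing case is a compile error rather than a runtime exception.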

So, with that in mind, the Visitor pattern is good for scenarios where you can expect the operations to change more frequently compared to representations, i.e., scenarios where the number of different element types is expected to be finite and relatively stable.

I've noticed that you've linked to a page called "Crafting Interpreters: The visitor pattern". The use case there demonstrates this idea - the underlying data structure is an expression tree, which consists of nodes that can be represented in different ways (have different data types). There's a finite number of representations (defined by the rules of the language), but they are all rolled into an abstract data type representing an expression tree (Expr). You can then define a number of concrete visitors representing different generalized operations that can be applied to that tree. The external (client-facing) interface of each visitor only uses the abstract type, Expr, which then lets you write client code only in terms of this abstraction (i.e., client code doesn't have to know the concrete types of each node, just that it's an expression tree, and that there's a number of operations that can be applied to it). I know that the examples there construct the tree right before it's used, but a more realistic scenario is reading some code from a file and returning an abstract syntax tree.
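A minimal sketch of that idea, assuming a tiny expression tree with just two node types. Literal, Add, Evaluator and Printer are invented for this illustration and are not the code from the linked page; the generic return type is one possible way to get results out of a visitor.

```java
// Hypothetical, minimal expression tree.
interface Expr {
    <R> R accept(ExprVisitor<R> visitor);
}

interface ExprVisitor<R> {
    R visitLiteral(Literal literal);
    R visitAdd(Add add);
}

final class Literal implements Expr {
    final int value;
    Literal(int value) { this.value = value; }
    @Override public <R> R accept(ExprVisitor<R> visitor) { return visitor.visitLiteral(this); }
}

final class Add implements Expr {
    final Expr left;
    final Expr right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public <R> R accept(ExprVisitor<R> visitor) { return visitor.visitAdd(this); }
}

// Operation 1: evaluate the tree.
final class Evaluator implements ExprVisitor<Integer> {
    @Override public Integer visitLiteral(Literal literal) { return literal.value; }
    @Override public Integer visitAdd(Add add) {
        // Children are only known as Expr; accept() routes each to the right method.
        return add.left.accept(this) + add.right.accept(this);
    }
}

// Operation 2: render the tree as text. No node class had to change.
final class Printer implements ExprVisitor<String> {
    @Override public String visitLiteral(Literal literal) { return Integer.toString(literal.value); }
    @Override public String visitAdd(Add add) {
        return "(" + add.left.accept(this) + " + " + add.right.accept(this) + ")";
    }
}
```

Client code only ever holds an Expr: for `Expr e = new Add(new Literal(1), new Add(new Literal(2), new Literal(3)))`, `e.accept(new Evaluator())` yields 6 and `e.accept(new Printer())` yields "(1 + (2 + 3))".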

Interestingly, in that article, the Visitor pattern is kind of implemented backwards; their example of client code is:

new AstPrinter().print(expression)

whereas it should be:

expression.accept(new AstPrinter())

since AstPrinter is the "visiting" operation (but then the method of extracting the result from the AstPrinter would be different).

If you find the accept/visit naming confusing, you can mentally rename these methods:

element.accept(visitor)   

// can be seen as: 

abstractType.do(operation)

and

visitor.visit(this)

// can be seen as: 

operation.applyTo(concreteType)   

An important thing to realize is that the Visitor interface (the various visit overloads) is meant to be treated as internal to the type abstraction. In other words, its methods are there to (1) be called internally by concrete elements and (2) be implemented by Visitor derivatives; they are not meant to be used by client code.


[1] The two approaches involve different tradeoffs; this is known in the CS community as the "expression problem".

Like the other answers, I have to admit Christophe's answer is spot on, but there's some confusion around why one might want to getRandomAnimal().

The frustrating reality is that very few books that show the visitor pattern bother showing the most important reason you use it: often the code that constructs your objects knows the real type of the object, but the rest does not.

One very simple example:

var allObjects = new ArrayList<GameObject>(); // List is an interface, so instantiate a concrete list
populateObjects(allObjects); // some game configuration

while (true) {
    var updateVisitor = new ObjectUpdateVisitor();
    for (var object: allObjects) {
        object.accept(updateVisitor);
    }
}

In this case, some early configuration code knew the real types of the objects, but we forgot about them along the way because we wanted to simplify the code. We didn't want all of the rest of the code to have to know what all of the objects are. We just wanted to throw them in a pile and act on each of them!
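Here is a self-contained sketch of the same loop. The Player and Enemy types, their fields, and the bounded runFrames helper (in place of while (true)) are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

interface GameObjectVisitor {
    void visitPlayer(Player player);
    void visitEnemy(Enemy enemy);
}

interface GameObject {
    void accept(GameObjectVisitor visitor);
}

final class Player implements GameObject {
    int x = 0;
    @Override public void accept(GameObjectVisitor visitor) { visitor.visitPlayer(this); }
}

final class Enemy implements GameObject {
    int health = 10;
    @Override public void accept(GameObjectVisitor visitor) { visitor.visitEnemy(this); }
}

final class ObjectUpdateVisitor implements GameObjectVisitor {
    @Override public void visitPlayer(Player player) { player.x += 1; }
    @Override public void visitEnemy(Enemy enemy) { enemy.health -= 1; }
}

final class GameLoop {
    // Only this setup code names concrete types.
    static List<GameObject> populateObjects() {
        List<GameObject> all = new ArrayList<>();
        all.add(new Player());
        all.add(new Enemy());
        return all;
    }

    // The loop itself sees nothing but GameObject.
    static void runFrames(List<GameObject> allObjects, int frames) {
        for (int frame = 0; frame < frames; frame++) {
            ObjectUpdateVisitor updateVisitor = new ObjectUpdateVisitor();
            for (GameObject object : allObjects) {
                object.accept(updateVisitor);
            }
        }
    }
}
```

The update logic for each type lives in one visitor class, and the loop never needs a cast.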

It can be hard to see this in toy examples, like what most books show. However, in practice, this sort of abstraction occurs all the time.

While other answers focus mostly on polymorphism, I think it's important to answer one of the specific questions you've presented.

It's about adding operations to classes, without modifying the source code of those classes. Or put another way, to bend the OOP approach to have functions and data structures separated.

This isn't necessarily true. Visitors can be stateful, and therefore can track their internal state. You may have visitors which are used for data processing over a collection of visit-able classes.

Consider the following AnimalStatsVisitor implementation:

class AnimalStatsVisitor implements AnimalVisitor {
    private long catsCount = 0;
    private long dogsCount = 0;

    public void visitDog(Dog dog) {
        dogsCount++;
    }

    public void visitCat(Cat cat) {
        catsCount++;
    }

    public void printStats() {
        System.out.println(
            "Found " + dogsCount + " dog(s) and " + catsCount + " cat(s)."
        );
    }
}

which is then used as follows:

List<Animal> animals = Arrays.asList(
    new Cat(),
    new Cat(),
    new Dog(),
    new Dog(),
    new Dog()
);

AnimalStatsVisitor visitor = new AnimalStatsVisitor();

animals.forEach(animal -> animal.accept(visitor));

visitor.printStats();

As you can see, the logic and the data live together in the visitor; they are not separated. Sure, this isn't logic that belongs to either Cat or Dog, but aggregation logic shouldn't live there anyway. The visitor is a perfect place to put it.

Dear StackExchange readers:

You have called upon this post to provide an answer to the question. And I want this answer to be personalized to you.

It's clear that you're a StackExchangeReader – but, that's an abstract type with many sub-types. So, what kind of StackExchangeReader are you?

Instead of writing a generic answer based on only the information that you're a StackExchangeReader, I'll ask you to implement additional functionality described as an .Accept() method. When you .Accept(), you will again call back to this general answer but in a manner that reflects who you truly are, so that this answer may better implement appropriate behavior.


Source code for .Accept().

Language: English.

Reader: To .Accept(), please call back to the following method that best describes what specific kind of StackExchangeReader you are:

WARNING: If you refuse to call back to the most appropriate overload, then you have not implemented the .Accept() method required by the visitor pattern. As such, your calling this answer is a type error.


Explanation.

The point of the above is to demonstrate the visitor-pattern, including the .Accept()-method, in concrete terms here.

Specifically, when we write answers on StackExchange without knowing who'll read them, we have to write the answers to a generic StackExchangeReader. But if you want to personalize answers to specific sub-types of StackExchangeReader, then you can do so by asking readers to self-select which type best fits them.

And that's the point of the .Accept() method: it accepts the visitor, which calls back to the caller. The magic comes from the .Accept() method knowing the type.

For example, in this case, the .Accept() method takes the form of a StackExchangeReader agreeing to select the link to the answer that works best for them. So while it might be hard to know in advance what kind of reader will be calling this answer, the visitor pattern can address this when readers are willing to self-select from a list of options (which are the various overloads that they can call back to).

accept is a statically type-safe way to permit an if-ladder based on something's type.

if ( thing instanceof Foo ) {
    Foo foo = ( Foo )thing;
    BODY1
} else if ( thing instanceof Bar ) {
    Bar bar = ( Bar )thing;
    BODY2
} else if ...

becomes

new ThingVisitor() {
    void ifThingInstanceOfFoo( Foo foo ) {
        BODY1
    }
    void elseIfThingInstanceOfBar( Bar bar ) {
        BODY2
    }
    ...
}

The only way that can work without relying on casting is if the "implementation" of the if, that is, the selection of which visitor method to call, lives in a polymorphic accept( ThingVisitor visitor ) that each Thing subtype overrides.
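A minimal runnable sketch of that translation, with shorter method names than above. The generic result type R and the Describe helper are assumptions added so that each "branch" can return a value.

```java
interface ThingVisitor<R> {
    R ifFoo(Foo foo);
    R ifBar(Bar bar);
}

interface Thing {
    <R> R accept(ThingVisitor<R> visitor);
}

final class Foo implements Thing {
    // Foo knows it is a Foo, so it selects the Foo branch. No cast needed.
    @Override public <R> R accept(ThingVisitor<R> visitor) { return visitor.ifFoo(this); }
}

final class Bar implements Thing {
    @Override public <R> R accept(ThingVisitor<R> visitor) { return visitor.ifBar(this); }
}

final class Describe {
    static String describe(Thing thing) {
        // Reads like an if-ladder on the type, but without instanceof.
        return thing.accept(new ThingVisitor<String>() {
            @Override public String ifFoo(Foo foo) { return "a Foo"; }
            @Override public String ifBar(Bar bar) { return "a Bar"; }
        });
    }
}
```

If a third Thing subtype is added, it has to call some ThingVisitor method from its accept(), which forces a new method on the interface; every existing visitor then stops compiling until it handles the new case.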

Licensed under: CC-BY-SA with attribution