Deriving from a base class whose instances reside in a fixed format (database, MMF)…how to be safe?

https://stackoverflow.com/questions/11167483

16-06-2021
|

Question

(Note: I'm looking for really any suggestions on the right search terms to read up on this category of issue. "Object-relational-mapping" occurred to me as a place where I could find some good prior art...but I haven't seen anything quite fitting this scenario just yet.)

I have a very generic class Node, which for the moment you can think of as being a bit like an element in a DOM tree. This is not precisely what's going on--they're graph database objects in a memory mapped file. But the analogy is fairly close for all practical purposes, so I'll stick to DOM terms for simplicity.

The "tag" embedded in the node implies a certain set of operations you should (ideally) be able to do with it. Right now I'm using derived classes to do this. So for instance, if you were trying to represent something like an HTML list:

<ul>
   <li>Coffee</li>
   <li>Tea</li>
   <li>Milk</li>
</ul>

The underlying tree would be seven nodes:

+--UL                       // Node #1
   +--LI                    // Node #2
      +--String(Coffee)     // Node #3 (literal text)
   +--LI                    // Node #4
      +--String(Tea)        // Node #5 (literal text)
   +--LI                    // Node #6
      +--String(Milk)       // Node #7 (literal text)

Since getString() is already a primitive method on Nodes themselves, I'd probably only make class UnorderedListNode : public Node, class ListItemNode : public Node.

Continuing this hypothetical, let's imagine I wanted to help the programmer use less general functions when they know more about the Node "type"/tag they have in their hands. Perhaps I want to assist them with structural idioms on the tree, like adding a string item to an unordered list, or extracting things as a string. (This is just an analogy so don't take the routines too seriously.)

class UnorderedListNode : public Node {
private:
    // Any data members someone put here would be a mistake!

public:
    static boost::optional<UnorderedListNode&> maybeCastFromNode(Node& node) {
        if (node.tagString() == "ul") {
            return reinterpret_cast<UnorderedListNode&>(node);
        }
        return boost::none;
    }

    // a const helper method
    vector<string> getListAsStrings() const {
        vector<string> result;
        for (Node const* childNode : children()) {
            result.push_back(childNode->children()[0]->getText());
        }
        return result;
    }

    // helper method requiring mutable object
    void addStringToList(std::string listItemString) {
        unique_ptr<Node> liNode (new Node (Tag ("LI"));
        unique_ptr<Node> textNode (new Node (listItemString));
        liNode->addChild(std::move(textNode));
        addChild(std::move(liNode));
    }
};

Adding data members to these new derived classes is a bad idea. The only way to really persist any information is to use the foundational routines of Node (for instance, the addChild call above, or getText) to interact with the tree. Thus the real inheritance model--to the extent one exists--is outside of the C++ type system. What makes a <UL> node "maybeCast" into an UnorderedListNode has nothing to do with vtables/etc.

C++ inheritance looks right sometimes, but feels wrong usually. I feel like instead of inheritance I should have classes that exist independently of Node, and just collaborate with it somehow as "accessor helpers"...but I don't have a good grasp of what that would be like.

Solution 2

"C++ inheritance looks right sometimes, but feels wrong usually."

Indeed, and this statement is worrisome:

What makes a node "maybeCast" into an UnorderedListNode has nothing to do with vtables/etc.

As is this code:

static boost::optional<UnorderedListNode&> maybeCastFromNode(Node& node) {
    if (tagString() == "ul") {
        return reinterpret_cast<UnorderedListNode&>(node);
    }
    return boost::none;
}

(1) type-punning

If the Node& being passed in was allocated through a mechanism that did not legally and properly construct an UnorderedListNode on the inheritance path, this is what is called type punning. It's almost always a bad idea. Even if the memory layout on most compilers appears to work when there are no virtual functions and derived classes add no data members, they are free to break it in most all circumstances.

(2) strict-alias

Next there is the compiler's assumption that pointers to objects of fundamentally different types do not "alias" each other. This is the strict aliasing requirement. Although it can be disabled via non-standard extensions, that should only be applied in legacy situations...it hinders optimization.

It's not completely clear--from an academic standpoint--whether these two hindrances have workarounds permitted by the spec under special circumstances. Here's a question which investigates that, and is still an open discussion at time of writing:

Make interchangeable class types via pointer casting only, without having to allocate any new objects?

But to quote @MatthieuM.: "The closer you get to the edges of the specifications, the more likely you are to hit a compiler bug. So, as engineer, I advise to be pragmatic and avoid playing mind games with your compiler; whether you are right or wrong is irrelevant: when you get a crash in production code, you lose, not the compiler writers."

This is probably more the right track:

I feel like instead of inheritance I should have classes that exist independently of Node, and just collaborate with it somehow as "accessor helpers"...but I don't have a good grasp of what that would be like.

Using Design Pattern terms, this matches something like a Proxy. You would have a lightweight object that stores the pointer and is then passed around by value. In practice, it can be tricky to handle issues like what to do about the const-ness of the incoming pointers being wrapped!

Here's a sample of how it might be done relatively simply for this case. First, a definition for the Accessor base class:

template<class AccessorType> class Wrapper;

class Accessor {
private:
    mutable Node * nodePtrDoNotUseDirectly;
template<class AccessorType> friend class Wrapper;
    void setNodePtr(Node * newNodePtr) {
        nodePtrDoNotUseDirectly = newNodePtr;
    }
    void setNodePtr(Node const * newNodePtr) const {
        nodePtrDoNotUseDirectly = const_cast<Node *>(newNodePtr);
    }
    Node & getNode() { return *nodePtrDoNotUseDirectly; }
    Node const & getNode() const { return *nodePtrDoNotUseDirectly; }

protected:
    Accessor() {}

public:
    // These functions should match Node's public interface
    // Library maintainer must maintain these, but oh well
    inline void addChild(unique_ptr<Node>&& child)) { 
        getNode().addChild(std::move(child));
    }
    inline string getText() const { return getNode().getText(); }
    // ...
};

Then, a partial template specialization for handling the case of wrapping a "const Accessor", which is how to signal that it will be receiving a const Node &:

template<class AccessorType>
class Wrapper<AccessorType const> {    
protected:
    AccessorType accessorDoNotUseDirectly;
private:
    inline AccessorType const & getAccessor() const {
        return accessorDoNotUseDirectly;
    }

public:
    Wrapper () = delete;
    Wrapper (Node const & node) { getAccessor().setNodePtr(&node); }
    AccessorType const * operator-> const () { return &getAccessor(); }
    virtual ~Wrapper () { }
};

The Wrapper for the "mutable Accessor" case inherits from its own partial template specialization. This way the inheritance provides for the proper coercions and assignments...prohibiting the assignment of a const to a non-const, but working the other way around:

template<class AccessorType>
class Wrapper : public Wrapper<AccessorType const> {
private:
    inline AccessorType & getAccessor() {
        return Wrapper<AccessorType const>::accessorDoNotUseDirectly;
    }

public:
    Wrapper () = delete;
    Wrapper (Node & node) : Wrapper<AccessorType const> (node) { }
    AccessorType * operator-> () { return &Wrapper::getAccessor(); }
    virtual ~Wrapper() { }
};

A compiling implementation with test code and with comments documenting the weird parts is in a Gist here.

Sources: @MatthieuM., @PaulGroke

OTHER TIPS

I am not sure I have understood completely what you intend to do but here are some suggestions you might find useful.

You are definitely on the right track with inheritance. All the UL nodes, LI nodes, ... etc. are Node-s. Perfect "is_a" relationship, you should derive these classes from the Node class.

let's imagine I wanted to help the programmer use less general functions when they know more about the Node "type"/tag they have in their hands

...and this is what virtual functions are for.

Now for the maybeCastFromNode method. That's downcasting. Why would you do that? Maybe for deserializing? If yes, then I'd recommend using dynamic_cast<UnorderedListNode *> . Although most likely you won't need RTTI at all if the inheritance tree and the virtual methods are well-designed.

C++ inheritance looks right sometimes, but feels wrong usually.

This might not always be C++'s fault :-)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow