Вопрос

Here's the problem:

I am currently trying to create a simple stack-based programming language (Reverse Polish Notation, FORTH style) as a component of a larger project. I have hit a snag, though.

There is no problem with creating a stack in C++ (by using std::vector<>) that would contain one type of element (I could use the syntax std::vector<double> Stack, for instance).

However, a programming language needs to be able to hold multiple data types, such as ints, doubles, strings, and 3D vectors (as in physics vectors with X, Y, and Z components), just to name some simple things.

So, is there a construct in C++ that I could use as a stack that would be able to store more than one kind of primitive type/object/struct?

Это было полезно?

Решение

Sure, one way is to use a tagged union:

enum Type { INTEGER, DOUBLE, /* ... */ };

union Data {
    uint64_t as_integer;
    double as_double;
    // ...
};

struct Value {
    Type type;
    Data data;
};

The storage for as_integer, as_double, etc. will be overlapped, so a Value structure will take up two words of storage, and your stack will have type std::vector<Value>. Then you access members of data according to the value of type:

void sub(std::vector<Value>& stack) {
    // In reality you would probably factor this pattern into a function.
    auto b = stack.back();
    stack.pop_back();
    assert(b.type == INTEGER);

    auto a = stack.back();
    stack.pop_back();
    assert(a.type == INTEGER);

    Value result;
    result.type = INTEGER;
    result.data.as_integer = a.data.as_integer - b.data.as_integer;
    stack.push_back(result);
}

Of course, Forths are usually untyped, meaning that the stack consists of words only (std::vector<uint64_t>) and the interpretation of a data value is up to the word operating on it. In that case, you would pun via a union or reinterpret_cast to the appropriate type in the implementation of each word:

void subDouble(std::vector<Data>& stack) {
    // Note that this has no type safety guarantees anymore.
    double b = stack.back().as_double;
    stack.pop_back();

    double a = stack.back().as_double;
    stack.pop_back();

    Data result;
    result.as_double = a - b;
    stack.push_back(result);
}

void subDouble(std::vector<uint64_t>& stack) {
    double b = reinterpret_cast<double&>(stack.back());
    stack.pop_back();

    double a = reinterpret_cast<double&>(stack.back());
    stack.pop_back();

    double result = a - b;
    stack.push_back(reinterpret_cast<uint64_t&>(result));
}

Alternatively, you can store not values but pointers to instances of a class Value from which other value types such as Integer or Double would derive:

struct Value {};
struct Integer : Value { uint64_t value; };
struct Double : Value { double value; };
// ...

Your stack would have type std::vector<unique_ptr<Value>> or std::vector<Value*>. Then you needn’t worry about different value sizes, at the cost of making wrapper structures and allocating instances of them at runtime.

Другие советы

I would suggest to use the inheritance. Make common base class for the objects you need to store, and make a vector of base types. The store all the inheriting objects in this vector.

Since c++ is an object-orientated language, you might just use inheritance. Here is a quick example taken from http://www.cplusplus.com/forum/general/17754/ and extended:

#include <iostream>
#include <vector>
using namespace std;

// abstract base class
class Animal
{
public:
    // pure virtual method
    virtual void speak() = 0;
    // virtual destructor
    virtual ~Animal() {}
};

// derived class 1
class Dog : public Animal
{
public:
    // polymorphic implementation of speak
    virtual void speak() { cout << "Ruff!"; }
};

// derived class 2
class Cat : public Animal
{
public:
    // polymorphic implementation of speak
    virtual void speak() { cout << "Meow!"; }
};

int main( int argc, char* args[] )

    // container of base class pointers
    vector<Animal*> barn;

    // dynamically allocate an Animal instance and add it to the container
    barn.push_back( new Dog() );
    barn.push_back( new Cat() );

    // invoke the speak method of the first Animal in the container
    barn.front()->speak();

    // invoke all speak methods and free the allocated memory
    for( vector<Animal*>::iterator i = barn.begin(); i != barn.end(); ++i )
    {
        i->speak();
        delete *i;
    }
    // empty the container
    barn.clear();

    return 0;
}

The solution for storing different types is a tagged union

enum Type { INT, STRING, DOUBLE, POINT2D, VECTOR, OBJECT... };

union Data {
    int int_val;
    double double_val;
    struct point2D { int x, int y };
    struct { int v3, int v2, int v1, int v0 }; // you can even use unnamed structs
    // ...
};

struct StackElem {
    Type type;
    Data data;
};

In C++ it's even better to use std::variant (or boost::variant in older C++ standards), which might use a tagged union under the hood

However there's no need to use a single stack for all when using the reverse Polish notation. You can use a value stack and a separate operator stack. For every operator on the operator stack you pop the corresponding number of parameters from the value stack. That'll make things easier and save memory since you can use a small char array for operators (unless you need more than 255 operators), and no memory wasted for saving the type as well as the bigger-than-needed data field in the struct like above. That means you don't need a type OPERATOR in the Type enum

You can use a double type stack for all numeric types because a double can contain all int type's range without loss of precision. That's what implemented in Javascript and Lua. If the operator needs more than 1 parameter then just push/pop all of them just like what a compiler does when evaluating a function. You don't need to worry about int operations anymore, just do everything in double, unless there are specific int operators. But you may need different operators for different types, for example + for double addition, p or something like that for vector addition. However if you need 64-bit int then a separate integer type is needed

For example if you need to add 2 3D vectors, push 3 dimensions of the first vector, then the other. When you pop out a vector operator from the operator stack, pop 3 dimensions of the 2 vectors from value stack. After doing the math on it, push resulting 3 dimensions to stack. No need for a vector type.

If you don't want to store int as double then you can use NaN-boxing (or nunboxing/punboxing) like Firefox's JS engine, in which if the value is int then the upper 16 of 64 bits are 1s, otherwise it's double (or pointer, which you probable wouldn't use). Another way is type tag in 3 lower bits in old FFJS engines. In this case it's a little bit complicated but you can use the same operator for every type. For more information about this read Using the extra 16 bits in 64-bit pointers

You can even use a byte array for storing all data types and read the correct number of bytes specified by the operator. For example if the operator indicated that the next operand must be an int, just read 4 bytes. If it's a string, read the 4 bytes of string length first then the string content from the stack. If it's a 2D point of int read 4 bytes of x and 4 bytes of y. If it's a double read 8 bytes, etc. This is the most space efficient way, but obviously it must be traded by speed

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top