Conversion system for variants

https://stackoverflow.com/questions/5900740

29-10-2019
|

题

I have written a variant class, which will be used as the main type in a dynamic language, that will ultimately allow 256 different types of value (header is an unsigned byte, only 20 are actually used). I now want to implement casting/converting between types.

My initial thought was a lookup table, but the shear amount of memory that would need makes it impractical to implement.

What are the alternatives? Right now I am considering a further three methods from research and suggestions from other people:

Group the types into larger subsets, such as numeric or collection or other.
Make a conversion interface that has CanCast(from, to) and Cast(Variant) methods and allow classes that implement that interface to be added to a list, that can then be checked to see if any of the conversion classes can do the cast.
Similar to (1) but make several master types, and casting is a two step process from the original type to the master type and then again to the final type.

What would be the best system?

Edit: I have added the bounty as I am still unsure on the best system, the current answer is very good, and definitely got my +1 but there must be people out there who have done this and can say what the best method is.

解决方案

My system is very "heavy" (lots of code), but very fast, and very feature rich (cross-platform C++). I'm not sure how far you would want to go with your design, but here's the biggest parts of what I did:

DatumState - Class holding an "enum" for "type", plus native value, which is a "union" among all the primitive types, including void*. This class is uncoupled from all types, and can be used for any native/primitive types, and "reference to" void* type. Since the "enum" also has "VALUE_OF" and "REF_TO" context, this class can present as "wholly containing" a float (or some primitive type), or "referencing-but-not-owning" a float (or some primitive type). (I actually have "VALUE_OF", "REF_TO", and "PTR_TO" contexts so I can logically store a value, a reference-that-cannot-be-null, or a pointer-that-may-be-null-or-not, and which I know I need to delete-or-not.)

Datum - Class wholly containing a DatumState, but which expands its interface to accommodate various "well-known" types (like MyDate, MyColor, MyFileName, etc.) These well-known types are actually stored in the void* inside the DatumState member. However, because the "enum" portion of the DatumState has the "VALUE_OF" and "REF_TO" context, it can represent a "pointer-to-MyDate" or "value-of-MyDate".

DatumStateHandle - A helper template class parameterized with a (well-known) type (like MyDate, MyColor, MyFileName, etc.) This is the accessor used by Datum to extract state from the well-known type. The default implementation works for most classes, but any class with specific semantics for access merely overrides its specific template parameterization/implementation for one or more member functions in this template class.

Macros, helper functions, and some other supporting stuff - To simplify "adding" of well-known types to my Datum/Variant, I found it convenient to centralize logic into a few macros, provide some support functions like operator overloading, and establish some other conventions in my code.

As a "side-effect" of this implementation, I got tons of benefits, including reference and value semantics, options for "null" on all types, and support for heterogeneous containers for all types.

For example, you can create a set of integers and index them:

int my_ints[10];
Datum d(my_ints, 10/*count*/);
for(long i = 0; i < d.count(); ++i)
{
  d[i] = i;
}

Similarly, some data types are indexed by strings, or by enums:

MyDate my_date = MyDate::GetDateToday();
Datum d(my_date);
cout << d["DAY_OF_WEEK"] << endl;
cout << d[MyDate::DAY_OF_WEEK] << endl; // alternative

I can store sets-of-items (natively), or sets-of-Datums (wrapping each item). For either case, I can "unwrap" recursively:

MyDate my_dates[10];
Datum d(my_dates, 10/*count*/);
for(long i = 0; i < d.count(); ++i)
{
  cout << d[i][MyDate::DAY_OF_WEEK] << endl;
}

One might argue my "REF_TO" and "VALUE_OF" semantics are overkill, but they were essential for the "set-unwrapping".

I've done this "Variant" thing with nine different designs, and my current is the "heaviest" (most code), but the one I like the best (almost the fastest with a pretty small object footprint), and I've deprecated the other eight designs for my use.

The "downsides" to my design are:

Objects are accessed through static_cast<>() from a void* (type-safe and fairly fast, but indirection is required; but, side-effect is that design supports storage of "null".)
Compiles are longer because of the well-known types that are exposed through the Datum interface (but you can use DatumState if you do not want well-known type APIs).

No matter your design, I'd recommend the following:

Use an "enum" or something to tell you the "type", separate from the "value". (I know you can compress them into one "int" or something with bit packing, but that is slow-for-access and very tricky to maintain as new types are introduced.)
Lean on templates or something to centralize operations, with a mechanism for type-specific (override) processing (assuming you want to handle non-trivial types).

The name-of-the-game is "simplified maintenance when adding new types" (or at least, it was for me). Like a good Term Paper, it is a very good idea if you rewrite, rewrite, rewrite, to hold-or-increase your functionality as you constantly remove the code required to maintain the system (e.g., minimize the effort required to adapt new types to your existing Variant infrastructure).

Good luck!

其他提示

Done something similar.

You could add another byte to the "header", indicating the type its really storing.

Example in a C-style programming language:

typedef
enum VariantInternalType {
  vtUnassigned = 0;
  vtByte = 1;
  vtCharPtr = 2; // <-- "plain c" string
  vtBool = 3;
  // other supported data types
}

// --> real data
typedef
struct VariantHeader {
  void* Reserved; // <-- your data (byte or void*)
  VariantInternalType VariantInternalType;  
}

// --> hides real data
typedef
  byte[sizeof(VariantHeader)] Variant;

// allocates & assign a byte data type to a variant
Variant ByteToVar(byte value)
{
  VariantHeader MyVariantHeader;
  Variant MyVariant;

  MyVariantHeader.VariantInternalType = VariantInternalType.vtByte;
  MyVariantHeader.Reserved = value;  

  memcpy (&MyVariant, &MyVariantHeader, sizeof(Variant));

  return myVariant;
}

// allocates & assign a char array data type to a variant
Variant CharPtrToVar(char* value)
{
  VariantHeader MyVariantHeader;
  Variant MyVariant;

  MyVariantHeader.VariantInternalType = VariantInternalType.vtByte;
  MyVariantHeader.Reserved = strcpy(value);  

  // copy exposed struct type data to hidden array data
  memcpy(&MyVariant, &MyVariantHeader, sizeof(Variant));

  return myVariant;
}

// deallocs memory for any internal data type
void freeVar(Variant &myVariant)
{
  VariantHeader MyVariantHeader;

  // copy exposed struct type data to hidden array data
  memcpy(&MyVariantHeader, &MyVariant, sizeof(VariantHeader));

  switch (MyVariantHeader.VariantInternalType) {
    case vtCharPtr:
      strfree(MyVariantHeader.reserved);
    break;

    // other types

    default:
    break;
  }

  // copy exposed struct type data to hidden array data
  memcpy(&MyVariant, &MyVariantHeader, sizeof(Variant));
}

bool isVariantType(Variant &thisVariant, VariantInternalType thisType)
{
  VariantHeader MyVariantHeader;

  // copy exposed struct type data to hidden array data
  memcpy(&MyVariantHeader, &MyVariant, sizeof(VariantHeader));

  return (MyVariant.VariantInternalType == thisType);
}

// -------

void main()
{
  Variant myVariantStr = CharPtrToVar("Hello World");
  Variant myVariantByte = ByteToVar(42);

  char* myString = null;
  byte  myByte = 0;

  if isVariantType(myVariantStr, vtCharPtr) {
    myString = VarToCharPtr(myVariantStr);
    // print variant string into screen
  }

  // ...    
}

This is only a suggestion, and its not tested.

Maybe you've already done the calculation, but the amount of memory you need for the lookup table isn't that much.

If you just need to check if types are compatible, then you need (256*256)/2 bits. This requires 4k of memory.

If you also need a pointer to a conversion function, then you need (256*256)/2 pointers. This requires 128k of memory on a 32-bit machine and 256k on a 64-bit machine. If you're willing to do some low-level address layout, you can probably get that down to 64k on both 32-bit and 64-bit machines.

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow