Question

I would like to generate various data types in C++ with unique deterministic names. For example:

struct struct_int_double { int mem0; double mem1; };

At present my compiler synthesises names using a counter, which means the names don't agree when compiling the same data type in distinct translation units.

Here's what won't work:

  1. Using the ABI mangled_name function. Because it depends already on structs having unique names. Might work in C++11 compliant ABI by pretending struct is anonymous?

  2. Templates eg struct2 because templates don't work with recursive types.

  3. A complete mangling. Because it gives names which are way too long (hundreds of characters!)

Apart from a global registry (YUK!) the only thing I can think of is to first create a unique long mangled name, and then use a digest or hash function to shorten it (and hope there are no clashes).

Actual problem: to generate libraries which can be called where the types are anonymous, eg tuples, sum types, function types.

Any other ideas?

EDIT: Addition description of recursive type problem. Consider defining a linked list like this:

template<class T>
typedef pair<list<T>*, T> list;

This is actually what is required. It doesn't work for two reasons: first, you can't template a typedef. [NO, you can NOT use a template class with a typedef in it, it doesn't work] Second, you can't pass in list* as an argument because it isn't defined yet. In C without polymorphism you can do it:

struct list_int { struct list_int *next; int value; };

There are several work arounds. For this particular problem you can use a variant of the Barton-Nackman trick, but it doesn't generalise.

There is a general workaround, first shown me by Gabrielle des Rois, using a template with open recursion, and then a partial specialisation to close it. But this is extremely difficult to generate and would probably be unreadable even if I could figure out how to do it.

There's another problem doing variants properly too, but that's not directly related (it's just worse because of the stupid restriction against declaring unions with constructable types).

Therefore, my compiler simply uses ordinary C types. It has to handle polymorphism anyhow: one of the reasons for writing it was to bypass the problems of C++ type system including templates. This then leads to the naming problem.

Was it helpful?

Solution

Do you actually need the names to agree? Just define the structs separately, with different names, in the different translation units and reinterpret_cast<> where necessary to keep the C++ compiler happy. Of course that would be horrific in hand-written code, but this is code generated by your compiler, so you can (and I assume do) perform the necessary static type checks before the C++ code is generated.

If I've missed something and you really do need the type names to agree, then I think you already answered your own question: Unless the compiler can share information between the translation of multiple translation units (through some global registry), I can't see any way of generating unique, deterministic names from the type's structural form except the obvious one of name-mangling.

As for the length of names, I'm not sure why it matters? If you're considering using a hash function to shorten the names then clearly you don't need them to be human-readable, so why do they need to be short?

Personally I'd probably generate semi-human-readable names, in a similar style to existing name-mangling schemes, and not bother with the hash function. So, instead of generating struct_int_double you might generate sid (struct, int, double) or si32f64 (struct, 32-bit integer, 64-bit float) or whatever. Names like that have the advantage that they can still be parsed directly (which seems like it would be pretty much essential for debugging).

Edit

Some more thoughts:

  • Templates: I don't see any real advantage in generating template code to get around this problem, even if it were possible. If you're worried about hitting symbol name length limits in the linker, templates can't help you, because the linker has no concept of templates: any symbols it see will be mangled forms of the template structure generated by the C++ compiler and will have exactly the same problem as long mangled names generated directly by the felix compiler.
  • Any types that have been named in felix code should be retained and used directly (or nearly directly) in the generated C++ code. I would think there are practical (soft) readability/maintainability constraints on the complexity of anonymous types used in felix code, which are the only ones you need to generate names for. I assume your "variants" are discriminated unions, so each component part must have a name (the tag) defined in the felix code, and again these names can be retained. (I mentioned this in a comment, but since I'm editing my answer I might as well include it)
  • Reducing mangled-name length: Running a long mangled name through a hash function sounds like the easiest way to do it, and the chance of collisions should be acceptable as long as you use a good hash function and retain enough bits in your hashed name (and your alphabet for encoding the hashed name has 37 characters, so a full 160-bit sha1 hash could be written in about 31 characters). The hash function idea means that you won't be able to get directly back from a hashed name to the original name, but you might never need to do that. And you could dump out an auxiliary name-mapping table as part of the compilation process I guess (or re-generate the name from the C struct definition maybe, where it's available). Alternatively, if you still really don't like hash functions, you could probably define a reasonably compact bit-level encoding (then write that in the 37-character identifier alphabet), or even run some general purpose compression algorithm on that bit-level encoding. If you have enough felix code to analyse you could even pre-generate a fixed compression dictionary. That's stark raving bonkers of course: just use a hash.

Edit 2: Sorry, brain failure -- sha-1 digests are 160 bits, not 128.


PS. Not sure why this question was down-voted -- it seems reasonable to me, although some more context about this compiler you're working on might help.

OTHER TIPS

I don't really understand your problem.

template<typename T>
struct SListItem
{
    SListItem* m_prev;
    SListItem* m_next;
    T m_value;
};

int main()
{
    SListItem<int> sListItem;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top