Question

I am wondering whether there exists any declarative language for arbitrarily describing the format and semantics of a data structure, that can be compiled to a specific implementation of that structure in any of a set of target languages. That is, something like a generic data definition language but geared toward describing arbitrary data structures such as vectors, lists, trees, etc., and the semantics of operations on those structures. I ask because I had an idea for a feasible implementation of this concept, and I'm just wondering whether it's worth it, and, consequently, whether it's been done before.

Another, slightly more abstract question: is there any real difference between the normative specification of a data structure (what it does) and its implementation (how it does it)? More specifically, should separate implementations of the same requirements be considered different structures?

Was it helpful?

Solution 4

Cozy is “a tool that synthesizes data structure implementations from very high-level specifications” and seems to be essentially the language I was actually looking for (or considering writing) when I asked this question.

It can automatically generate an implementation (in Java or C++, at the time of this writing) from a data structure specification written in its proprietary language. A specification describes the abstract state, update operations, and query operations of a data structure, as well as invariants that must be maintained and assumptions that can be used by the solver to optimise the implementation. For example, here is a partial specification for a graph data structure:

Graph:
    handletype Node = { id : Int }
    handletype Edge = { src : Int, dst : Int }

    state nodes : Bag<Node>
    state edges : Bag<Edge>

    // Invariant: disallow self-edges.
    invariant (sum [ 1 | e <- edges, e.val.src == e.val.dst ]) == 0;

    op addNode(n : Node)
        nodes.add(n);

    op addEdge(e : Edge)
        assume e.val.src != e.val.dst;
        edges.add(e);

    query out_degree(nodeId : Int)
        sum [ 1 | e <- edges, e.val.src == nodeId ]

    // …

Its implementation is described in Fast Synthesis of Fast Collections and Generalized Data Structure Synthesis by Calvin Loncaric, Emina Torlak, and Michael D. Ernst.

OTHER TIPS

If you felt like it, a combination of XML with XSLT could describe a data structure, and provide a matching definition in essentially any language if your choice. I've never tried to prove it formally, but my first guess would be that S-expressions are a superset of XML (modulo syntactical differences).

At least in theory, yes there are (or at least can be) differences between a description of what a data structure does, and how it does it. For an obvious example, you could describe a generic mapping from keys to values, which could use an implementation based on hash tables, skip lists, binary search trees, etc. It's mostly a question of describing it at a high enough level of abstraction to allow differences in the implementation. If you include too many requirements (complexity, ordering, etc.) you can pretty quickly rule out many implementations.

You may be interested in messaging specification / data serialization languages such as Google's Protocol Buffers as well as ASN.1. It's a slightly different slant than you're looking for, but in the same vein.

Both are ways of declaring generic messages for communications. Protocol buffers message specs "compile" to different languages, but the central protocol is consistent. ASN.1 has multiple different compilation utilities as well as different protocol representations staying logically consistent with varying literal implementations. Look at XER, PER vs. BER for example.

I'd love a specification language that would just focus on simple packed binary layout against a logical memory structure. It may be that plain C structs are the simplest common way of expressing this. I had hoped ASN.1 had some way of getting to that, but after looking at it for a bit, ASN.1 PER is close, but not quite it.

Edit: Apache Thrift and Capn' Proto may also be interesting.

There are approaches to that sort of thing in dynamic logics, which attempt to capture the semantics of programs. However, the meaning in terms of the dynamic logic is in terms of preconditions and postconditions and is agnostic with regard to the actual implementation of the list.

These data structures are inherently tied to implementation, as the only difference between a linked list and array is specifically how it is laid out in memory.

For this, there is a generic data definition language --- any high level programming language -- C, C++, java -- that specifies this. Any of them is as generic as the other, since within this context any of them could be compiled to the other.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top