Lua's hybrid array and hash table; does it exist anywhere else?

https://stackoverflow.com/questions/2125654

22-09-2019
|

Question

Lua's implementation of tables keep its elements in two parts: an array part and a hash part.

Does such a thing exist in any other languages?

Take a look at section 4, Tables, in The Implementation of Lua 5.0.

Solution

This idea was original with Roberto Ierusalimschy and the rest of the Lua team. I heard Roberto give a talk about it at the MIT Lightweight Languages workshop in 2003, and in this talk he discussed prior work and argued convincingly that the idea was new. I don't know if other languages have copied it since.

The original Awk has a somewhat more restricted language model than Lua; either a number or a string can be used as a key in an array, but arrays themselves are not first-class values: an array must have a name, and an array cannot be used as a key in the array.

Regarding the implementation, I have checked the sources for the original Awk as maintained by Brian Kernighan, and the implementation of Awk uses a hash table, not Lua's hybrid array/table structure. The distinction is important because in Lua, when a table is used with consecutive integer keys, the space overhead is the same as for a C array. This is not true for original Awk.

I have not bothered to investigate all later implementations of awk, e.g., Gnu Awk, mawk, and so on.

OTHER TIPS

EDIT: This doesn't answer the question, which was about the implementation.

AWK also did it.

It's interesing how some languages conflate operations that are different in others:

List indexing - a[10]
Associative indexing - a['foo']
Object field access - a.foo
Function/Method calls - a('foo') / a.foo()

Very incomplete examples:

Perl is the rare language where sequential/associative indexing have separate syntax - a[10] / a{'foo'}. AFAIK, object fields map to one of the other operations, depending on which the implementor of the class felt like using.
In Python, all 4 are distinct; sequential/associative indexing use same syntax but separate data types are optimized for them.
In Ruby, object fields are methods with no arguments - a.foo.
In JavaScript, object fields a.foo are syntax sugar for associative indexing a['foo'].
In Lua and AWK, associative arrays are also used for sequential indexing - a[10].
In Arc, sequential and associative indexing looks like function calls - (a 10) / (a "foo"), and I think a.foo is syntax sugar for this too(?).

The closest thing I can think of is Javascript - you create an array with new Array(), and then proceed to index either by number or by string value. It could well be for performance reasons some Javascript implementations choose to do so using two arrays, for the reasons noted in the Lua documentation you linked to.

ArrayWithHash is a fast implementation of array-hashtable hybrid in C++.

Since C++ is a statically typed language, only integer keys are allowed in ArrayWithHash (no way to insert string or pointer key). In other words, it is something like an array with hash table backup for large indices. Also it uses different hash table implementation which is less memory-efficient than Lua table implementation.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow