Question

I'm new to rust (using 0.10) and exploring its use by implementing something like the rustc::middle::graph::Graph struct, but using strings as node indices and storing nodes in a HashMap.

Assuming non-static keys, what's a reasonable and efficient policy for ownership of the strings? Does the HashMap need to own its keys? Does each NodeIndex need to own its str? Is it possible for the node to own the string that defines its index and have everything else borrow that string? More generally, how should one share an immutable (but non-static) string amongst several data structures? If the answer is "it depends", what are the relevant issues?

If it is possible to have ownership of the string in one place and borrow it elsewhere, how is that accomplished? For example, if the Node struct were modified to store the node index as a string, how would the HashMap and NodeIndex use a borrowed version of it?

Was it helpful?

Solution

Is it possible for the node to own the string that defines its index and have everything else borrow that string?

[...]

If it is possible to have ownership of the string in one place and borrow it elsewhere, how is that accomplished? For example, if the Node struct were modified to store the node index as a string, how would the HashMap and NodeIndex use a borrowed version of it?

Not really: it's impossible (for the compiler) to verify that self-references don't get invalidated, i.e. many borrowing-internally situations (including this one specifically) end up allowing code similar to

struct Foo<'a> {
    things: Vec<~str>,
    borrows: Vec<&'a str>
}
let mut foo = Foo { ... };
foo.things.push(~"x");
foo.things.push(~"y");
// foo.things is [~"x", ~"y"]

// try to borrow the ~"y" to put a "y" into borrows
foo.borrows.push(foo.things.get(1).as_slice());

// ... time/SLoC passes ...

// we make the borrowed slice point to freed memory
// by popping/deallocating the ~"y"
foo.things.pop();

The compiler has a very hard time tell that an arbitrary modification won't be like the .pop call and invalidate the internal pointers. Basically, putting a self-reference to data that an object owns into the object itself would have to freeze that object, so no further modifications could occur (including moving out of the struct, e.g. to return it).

Tl;dr: you can't have one part of the Graph storing ~strs and another part storing &str slices into those ~strs.

That said, if you were to use some unsafe code, and only allow the Graph to be expanded, i.e. never removing nodes (i.e. never letting a ~str be deallocated until the whole graph is being destroyed), then some version of this could actually work.

More generally, how should one share an immutable (but non-static) string amongst several data structures?

You can use a Rc<~str>, with each data structure storing its own pointer. Something like

struct Graph<T> {
    nodes: HashMap<Rc<~str>, Node<T>>
}
struct Node<T> {
    name: Rc<~str>,
    data: T,
    outgoing_edges: Vec<Rc<~str>>
}

Another approach would be using a bidirectional map connecting strings with an "interned" index of a simple type, something like

struct Graph<T> {
    index: BiMap<~str, Index>,
    nodes: HashMap<Index, Node<T>>
}
#[deriving(Hash, Eq, TotalEq)]
struct Index { x: uint }

struct Node<T> {
    name: Index,
    data: T,
    outgoing_edges: Vec<Index>    
}

(Unfortunately this is hypothetical: Rust's stdlib doesn't have a BiMap type like this (yet...).)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top