Question

This is more a gotcha I wanted to share than a question: when printing with toString(), Java will detect direct cycles in a Collection (where the Collection refers to itself), but not indirect cycles (where a Collection refers to another Collection which refers to the first one - or with more steps).

import java.util.*;
public class ShonkyCycle {
  static public void main(String[] args) {
    List a = new LinkedList();
    a.add(a);                      // direct cycle
    System.out.println(a);         // works:  [(this Collection)]

    List b = new LinkedList();
    a.add(b);
    b.add(a);                      // indirect cycle
    System.out.println(a);         // shonky:  causes infinite loop!
  }
}

This was a real gotcha for me, because it occurred in debugging code to print out the Collection (I was surprised when it caught a direct cycle, so I assumed incorrectly that they had implemented the check in general). There is a question: why?

The explanation I can think of is that it is very inexpensive to check for a collection that refers to itself, as you only need to store the collection (which you have already), but for longer cycles, you need to store all the collections you encounter, starting from the root. Additionally, you might not be able to tell for sure what the root is, and so you'd have to store every collection in the system - which you do anyway - but you'd also have to do a hash lookup on every collection element. It's very expensive for the relatively rare case of cycles (in most programming). (I think) the only reason it checks for direct cycles is because it so cheap (one reference comparison).

OK... I've kinda answered my own question - but have I missed anything important? Anyone want to add anything?


Clarification: I now realize the problem I saw is specific to printing a Collection (i.e. the toString() method). There's no problem with cycles per se (I use them myself and need to have them); the problem is that Java can't print them. Edit Andrzej Doyle points out it's not just collections, but any object whose toString is called.

Given that it's constrained to this method, here's an algorithm to check for it:

  • the root is the object that the first toString() is invoked on (to determine this, you need to maintain state on whether a toString is currently in progress or not; so this is inconvenient).
    • as you traverse each object, you add it to an IdentityHashMap, along with a unique identifier (e.g. an incremented index).
    • but if this object is already in the Map, write out its identifier instead.

This approach also correctly renders multirefs (a node that is referred to more than once).

The memory cost is the IdentityHashMap (one reference and index per object); the complexity cost is a hash lookup for every node in the directed graph (i.e. each object that is printed).

Was it helpful?

Solution

I think fundamentally it's because while the language tries to stop you from shooting yourself in the foot, it shouldn't really do so in a way that's expensive. So while it's almost free to compare object pointers (e.g. does obj == this) anything beyond that involves invoking methods on the object you're passing in.

And at this point the library code doesn't know anything about the objects you're passing in. For one, the generics implementation doesn't know if they're instances of Collection (or Iterable) themselves, and while it could find this out via instanceof, who's to say whether it's a "collection-like" object that isn't actually a collection, but still contains a deferred circular reference? Secondly, even if it is a collection there's no telling what it's actual implementation and thus behaviour is like. Theoretically one could have a collection containing all the Longs which is going to be used lazily; but since the library doesn't know this it would be hideously expensive to iterate over every entry. Or in fact one could even design a collection with an Iterator that never terminated (though this would be difficult to use in practice because so many constructs/library classes assume that hasNext will eventually return false).

So it basically comes down to an unknown, possibly infinite cost in order to stop you from doing something that might not actually be an issue anyway.

OTHER TIPS

I'd just like to point out that this statement:

when printing with toString(), Java will detect direct cycles in a collection

is misleading.

Java (the JVM, the language itself, etc) is not detecting the self-reference. Rather this is a property of the toString() method/override of java.util.AbstractCollection.

If you were to create your own Collection implementation, the language/platform wouldn't automatically safe you from a self-reference like this - unless you extend AbstractCollection, you would have to make sure you cover this logic yourself.

I might be splitting hairs here but I think this is an important distinction to make. Just because one of the foundation classes in the JDK does something doesn't mean that "Java" as an overall umbrella does it.

Here is the relevant source code in AbstractCollection.toString(), with the key line commented:

public String toString() {
    Iterator<E> i = iterator();
    if (! i.hasNext())
        return "[]";

    StringBuilder sb = new StringBuilder();
    sb.append('[');
    for (;;) {
        E e = i.next();
        // self-reference check:
        sb.append(e == this ? "(this Collection)" : e); 
        if (! i.hasNext())
            return sb.append(']').toString();
        sb.append(", ");
    }
}

The problem with the algorithm that you propose is that you need to pass the IdentityHashMap to all Collections involved. This is not possible using the published Collection APIs. The Collection interface does not define a toString(IdentityHashMap) method.

I imagine that whoever at Sun put the self reference check into the AbstractCollection.toString() method thought of all of this, and (in conjunction with his colleagues) decided that a "total solution" is over the top. I think that the current design / implementation is correct.

It is not a requirement that Object.toString implementations be bomb-proof.

You are right, you already answered your own question. Checking for longer cycles (especially really long ones like period length 1000) would be too much overhead and is not needed in most cases. If someone wants it, he has to check it himself.

The direct cycle case, however, is easy to check and will occur more often, so it's done by Java.

You can't really detect indirect cycles; it's a typical example of the halting problem.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top