Question

I've written an n-ary tree ADT which works fine. However, I need to store its serialization in a variable in a calling class, e.g.:

    DomTree<String> a = Data.createTreeInstance("very_large_file.xml");
    String x = a.toString();

I've written a method that does exactly what I need, but on very large inputs it takes forever (20 minutes on a 100 MB XML file). I have timed the methods: building the tree from the XML file is quick, but calling toString() as shown above is very slow.

@Override
public String toString(){
    return printTree(this);
}

public String printTree(AbstractTree<E> tree){
    if (tree.isLeaf()){
        return tree.getNodeName();
    }else{
        String tStr = tree.getNodeName() + "(";

        int i = 0;
        Iterator<AbstractTree<E>> child = tree.getChildren().iterator();
        while (i < tree.getChildren().size() - 1){

            tStr += printTree(child.next()) + ", ";
            i++;
        }
        tStr += printTree(child.next()) + ")";

        return tStr;    
    }
}

I'm guessing it has to do with the way the string is built up rather than how the tree is traversed. Is there a better way to do this?

UPDATE: Following Skaffman's example, the following code gives an OutOfMemoryError for very large input.

@Override
public String toString(){
    StringBuilder buffer = new StringBuilder();
    printTree(this, buffer);
    return buffer.toString();

}

public String printTree(AbstractTree<E> tree, StringBuilder buffer){
    if (tree.isLeaf()){
        return tree.getNodeName();
    }else{
        buffer.append(tree.getNodeName());
        buffer.append("(");

        int i = 0;
        Iterator<AbstractTree<E>> child = tree.getChildren().iterator();
        while (i < tree.getChildren().size() - 1){

            buffer.append(printTree(child.next(), buffer));
            buffer.append(", ");
            i++;
        }
        buffer.append(printTree(child.next(), buffer)); 
        buffer.append(")");

        return buffer.toString();   
    }
}

UPDATE: Works perfectly now, using Skaffman's example.

Solution

String concats like that are punishingly slow. Use a StringBuilder.

@Override
public String toString(){
    StringBuilder buffer = new StringBuilder();
    printTree(this, buffer);
    return buffer.toString();
}

public void printTree(AbstractTree<E> tree, StringBuilder buffer){
    if (tree.isLeaf()){
        buffer.append(tree.getNodeName());
    } else {
        buffer.append(tree.getNodeName());
        buffer.append("(");

        int i = 0;
        Iterator<AbstractTree<E>> child = tree.getChildren().iterator();
        while (i < tree.getChildren().size() - 1){
            printTree(child.next(), buffer);
            buffer.append(", ");
            i++;
        }
        printTree(child.next(), buffer); 
        buffer.append(")");
    }
}
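
For readers without the asker's classes, here is a minimal self-contained sketch of the same approach. The `Node` class is a hypothetical stand-in for `AbstractTree<E>` (it is not from the original code), and the loop uses `hasNext()` instead of a counter to decide where the separators go. Treat it as an illustration under those assumptions rather than a drop-in replacement.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the asker's AbstractTree<E>; only the name and children matter here.
class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }

    Node add(Node child) {
        children.add(child);
        return this; // returning this lets calls be chained
    }

    // Appends name(child1, child2, ...) to the shared buffer; a leaf appends just its name.
    static void printTree(Node tree, StringBuilder buffer) {
        buffer.append(tree.name);
        if (tree.children.isEmpty()) {
            return;
        }
        buffer.append('(');
        Iterator<Node> it = tree.children.iterator();
        while (it.hasNext()) {
            printTree(it.next(), buffer);
            if (it.hasNext()) {
                buffer.append(", "); // separator only between siblings
            }
        }
        buffer.append(')');
    }

    @Override
    public String toString() {
        StringBuilder buffer = new StringBuilder();
        printTree(this, buffer);
        return buffer.toString(); // the only place a String is created
    }

    public static void main(String[] args) {
        Node root = new Node("a")
                .add(new Node("b"))
                .add(new Node("c").add(new Node("d")));
        System.out.println(root); // prints a(b, c(d))
    }
}
```

The key point is that every recursive call writes into the same buffer and returns nothing, so no intermediate strings are built along the way.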

OTHER TIPS

Don't use string concatenation in loops. It does not scale.

Use StringBuilder; unlike string concatenation, it does not create new objects all the time.

void print() {
    StringBuilder sb = new StringBuilder();
    sb.append("hello");
    sb.append(" World!");
    System.out.println(sb.toString());
}

Look at StringBuilder instead of simple concatenation, and pass the StringBuilder through your entire process (or make it a global).

String concatenation is slow because strings are immutable: every time you write "+=", a new String is created. This means that building up your string this way is, in the worst case, O(n^2). If you appended one character at a time with +=, the cost of building each new string would be 2 + 3 + 4 + ... + n, which is O(n^2).
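
To make the arithmetic concrete, the following sketch simply adds up that series. It is a cost model, not a benchmark: the class and method names are made up for illustration, and it assumes no compiler or JVM optimization, exactly as the worst-case analysis does.

```java
// Cost model for building an n-character string one character at a time with +=.
class ConcatCost {
    // Starting from a 1-character string, each += creates a new String and
    // copies every character into it, so the steps cost 2, 3, 4, ..., n.
    static long copiedByConcat(int n) {
        long copied = 0;
        for (int len = 2; len <= n; len++) {
            copied += len; // the String created by this step has len characters
        }
        return copied;
    }

    public static void main(String[] args) {
        System.out.println(copiedByConcat(1000)); // prints 500499
    }
}
```

Doubling n roughly quadruples the count, which is why the slowdown only becomes obvious on very large inputs.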

Use StringBuilder, as others suggest (rather than the slower but thread-safe StringBuffer).

I suppose I should add that StringBuilder gives you O(n) amortized time because it is mutable and works like a vector behind the scenes. So build up your string there, and then call toString().

StringBuilder builder = new StringBuilder();
builder.append("blah"); // append more as needed.
String text = builder.toString();
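
The amortized O(n) claim can be illustrated with a simplified model of a growable buffer. The sketch below assumes an initial capacity of 16 that doubles whenever the buffer is full; that is a stand-in for vector-style growth in general, not a statement of the exact policy any particular JDK uses, and the names are hypothetical.

```java
// Simplified model of a vector-like character buffer (assumed policy: start at 16, double when full).
class GrowthModel {
    // Returns the total number of character writes needed to append n characters,
    // counting both the appended characters and the copies made on each regrowth.
    static long charsWritten(int n) {
        int capacity = 16;
        int length = 0;
        long written = 0;
        for (int i = 0; i < n; i++) {
            if (length == capacity) {
                written += length; // copy existing contents into the larger array
                capacity *= 2;
            }
            length++;
            written++; // store the appended character
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(charsWritten(1000)); // prints 2008
    }
}
```

Under this model the total stays below 3n, so the work grows linearly with the length of the result rather than quadratically.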

I would also like to add that Python has a similar problem. The idiom in Python is to append all the strings you want to concatenate to a list and then join the list: "".join(the_list).
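
Java (8 and later) has a comparable collect-then-join idiom in `String.join`. A small sketch, with made-up class and method names, in case it is useful for flat lists of strings:

```java
import java.util.Arrays;
import java.util.List;

// Collect the pieces first, then join them once, similar to Python's "".join(the_list).
class JoinSketch {
    static String joinNames(List<String> names) {
        return String.join(", ", names); // one pass over the list, one result String
    }

    public static void main(String[] args) {
        System.out.println(joinNames(Arrays.asList("a", "b", "c"))); // prints a, b, c
    }
}
```

For a recursive structure like the tree in the question, passing a single StringBuilder down the recursion is still likely the simpler fit, since joining at every level would create intermediate strings again.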

UPDATE: As Bill points out, concatenation is not the root of all evil. One-off string concatenations are fine, and may even be optimized! (They are also worst-case linear.) But when you are concatenating in a loop, as you are above, performance degrades drastically as the number of iterations goes up. In that case my analysis above holds, as I specifically stated that it is the worst case, which assumes no optimizations. (The JVM also can't optimize concatenation inside loops as well as it can outside them.)

If a profiler confirms that the bottleneck is string concatenation, you have two choices:

  • StringBuilder/StringBuffer (the latter is better suited for threading)
  • Ropes for Java:

A rope is a high performance replacement for Strings. The datastructure, described in detail in "Ropes: an Alternative to Strings", provides asymptotically better performance than both String and StringBuffer for common string modifications like prepend, append, delete, and insert. Like Strings, ropes are immutable and therefore well-suited for use in multi-threaded programming.

You might want to look at String.intern() as a way to cut down on memory use. It returns the canonical copy of the string from the string pool, so if you have many duplicated strings they can share a single instance, and it might be faster.
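
A small sketch of what interning does, using made-up names. Two equal strings created separately are distinct objects, but their interned forms are the same pooled instance:

```java
// Demonstrates that intern() maps equal strings to one shared instance.
class InternSketch {
    static boolean sameInstanceAfterIntern(String a, String b) {
        return a.intern() == b.intern(); // reference comparison is intentional here
    }

    public static void main(String[] args) {
        String a = new String("node"); // new String(...) forces a separate object
        String b = new String("node");
        System.out.println(a == b);                        // prints false
        System.out.println(sameInstanceAfterIntern(a, b)); // prints true
    }
}
```

Whether this helps depends on how many node names are actually duplicated. It reduces the memory held by the tree, not the cost of building the final output string.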

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow