Question

I want to represent threaded comments in Java. This would look similar to the way comments are threaded on reddit.com

hello
   hello
      hello
      hello
   hello
   hello
      hello

As in the example above, responses are nested in the HTML with appropriate indentation to reflect their relationship to prior comments.

What would be an efficient way to represent this in Java?

I'm thinking some kind of tree data structure would be appropriate.

But is there one in particular which would be most efficient to minimize tree traversals?

This would be important if I have voting on each comment. Because then the tree would need to be reordered after each vote - a potentially expensive operation computationally.

By the way, if anyone knows of an open source existing implementation of this in Java, that would help too.

Was it helpful?

Solution

I would use levels of linked lists.

message1
    message2
        message3
        message4
    message5
    message6
        message7

Each node would have a pointer to its:

- forward sibling  (2->5, 3->4, 5->6,                   1/4/6/7->NULL).
- backward sibling (4->3, 5->2, 6->5,                   1/2/3/7->NULL).
- first child      (1->2, 2->3, 6->7,                   3/4/5/7->NULL).
- parent           (2->1, 3->2, 4->2, 5->1, 6->1, 7->6,       1->NULL).

Within each level, messages would be sorted in the list by vote count (or whatever other score you wanted to use).

That would give you maximum flexibility for moving things around and you could move whole sub-trees (e.g., message2) just by changing the links at the parent and that level.

For example, say message6 gets a influx of votes that makes it more popular than message5. The changes are (adjusting both the next and previous sibling pointers):

  • message2 -> message6
  • message6 -> message5
  • message5 -> NULL.

to get:

message1
    message2
        message3
        message4
    message6
        message7
    message5

If it continues until it garners more votes than message2, the following occurs:

  • message6 -> message2
  • message2 -> message5

AND the first-child pointer of message1 is set to message6 (it was message2), still relatively easy, to get:

message1
    message6
        message7
    message2
        message3
        message4
    message5

Re-ordering only needs to occur when a score change results in a message becoming more than its upper sibling or less than its lower sibling. You don't need to re-order after every score change.

OTHER TIPS

The tree is right (with getLastSibling and getNextSibling), but if you're storing/querying the data, you probably want to store a lineage for each entry, or number by a preorder traversal:

http://www.sitepoint.com/article/hierarchical-data-database/2/

For loss of the exact number of subnodes, you can leave gaps to minimise renumbering. Still, I'm not certain that this will be noticeably faster than traversing the tree each time. I guess it depends how deep your tree grows.

See also:

SQL - How to store and navigate hierarchies? http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html (this scheme is also call a Celko tree)

This would be important if I have voting on each comment. Because then the tree would need to be reordered after each vote - a potentially expensive operation computationally.

Sounds like a premature optimization to me, possibly even a faulty optimization.

Your tree data structure sounds logical for representing your data. I say stick with it. Optimize it later only if a performance problem is detected and measured, and can be compared with alternatives.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top