Question

When running the algorithm on the string 'AEKEAAEKEAAEKEA$' looking for the longest substring with at least 3 occurences all the nodes in the suffix tree have maximum 2 branches, how can that be?

The correct result should be the substring 'AEKEA'.

You can easily see the tree in the online suffix tree builder

I just followed the Wikipedi description:

"The problem of finding the longest substring with at least k occurrences can be found by first preprocessing the tree to count the number of leaf descendants for each internal node, and then finding the deepest node with at least k descendants"

What am I missing here?

Thank you.

Was it helpful?

Solution

I don't think that website is correct. When I run 'AEKEAAEKEAAEKEA' through my suffix tree, I get the following tree.

└── (0)
    ├── (27) $
    ├── (6) A
    │   ├── (26) $
    │   ├── (16) AEKEA
    │   │   ├── (17) $
    │   │   └── (7) AEKEA$
    │   └── (18) EKEA
    │       ├── (19) $
    │       └── (8) AEKEA
    │           ├── (9) $
    │           └── (1) AEKEA$
    ├── (4) E
    │   ├── (24) A
    │   │   ├── (25) $
    │   │   └── (14) AEKEA
    │   │       ├── (15) $
    │   │       └── (5) AEKEA$
    │   └── (20) KEA
    │       ├── (21) $
    │       └── (10) AEKEA
    │           ├── (11) $
    │           └── (2) AEKEA$
    └── (22) KEA
        ├── (23) $
        └── (12) AEKEA
            ├── (13) $
            └── (3) AEKEA$

As you can see from this branch, you've found your longest substring with 3 occurences.

└── (0)
    ├── (27) $
    ├── (6) A
    │   ├── (26) $
    │   ├── (16) AEKEA
    │   │   ├── (17) $
    │   │   └── (7) AEKEA$
    │   └── (18) EKEA
    │       ├── (19) $
    │       └── (8) AEKEA
    │           ├── (9) $
    │           └── (1) AEKEA$
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top