Вопрос

I'm looking to write an efficient n-order Markov chain method to generate random text strings given a set of example text. I currently have a Java implementation that uses several layers of Maps, but it's clunky. A suffix array would be perfect for my needs, but I'm not clear if that can be efficiently implemented in Java.

In C I might do something like:

char exampleText[MAX];
char *suffixArray[MAX];
...
while(n<MAX && suffixArray[n++] = &exampleText[n]);
sort(suffixArray);

This gets gnarly in Java since I'd have to take substrings of exampleText, or turn suffixArray into an array of indices, or something else.

Any suggestions for a good approach to this in Java?

Это было полезно?

Решение

String will [typically] do that for you. (Typical implementations share backing arrays when created with substring, although that is subject to change at any time.)

Другие советы

For anyone interested in more efficient ways of constructing the suffix array in Java, I once used a library called jsuffixarrays. The code is here. It offers a range of construction algorithms to choose from and I found it to work very well. E.g. to use the SKEW algorithm you do this:

import org.jsuffixarrays.Algorithm;
import org.jsuffixarrays.ISuffixArrayBuilder;
import org.jsuffixarrays.SuffixArrays;
import org.jsuffixarrays.SuffixData;

String              exampleText = "....";
ISuffixArrayBuilder suxBuilder  = Algorithm.SKEW.getDecoratedInstance();
SuffixData          sux         = SuffixArrays.createWithLCP(text,sux_builder);

/* Then, to access the suffix array: */
sux.getSuffixArray();
/* And, to access the LCP array: */
sux.getLCP();

You can build without the LCP array if don't need that.

You can make some variants form array of suffixes:

First:

public static String[] suffixes(String s)
{
int N = s.length();
String[] suffixes = new String[N];
for (int i = 0; i < N; i++)
suffixes[i] = s.substring(i, N);
return suffixes;
}

Second:

public static String[] suffixes(String s)
{
int N = s.length();
StringBuilder sb = new StringBuilder(s);
String[] suffixes = new String[N];
for (int i = 0; i < N; i++)
suffixes[i] = sb.substring(i, N);
return suffixes;
}
Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top