Question

I want to split a document into sentences and store each sentence in its own array, where each array element is one word of that sentence. But I can't get any further than this:

int count =0,len=0;
String sentence[];
String words[][];
sentence = name.split("\\.");
count = sentence.length;

System.out.print("total sentence: " );
System.out.println(count);
int h;  
words = new String[count][]; 

for (h = 0; h < count; h++) {
     String tmp[] = sentence[h].split(" ");
     words[h] = tmp;
     len = len + words[h].length;
     System.out.println("total words: " );
     System.out.print(len); 

     temp = sentence[h].split(delimiter);  

     for(int i = 0; i < temp.length; i++) {
        System.out.print(len);
        System.out.println(temp[i]);
        len++;
     }  
}

Solution

I can't understand your code, but here's how to achieve your stated intention with just 3 lines:

String document; // read from somewhere

List<List<String>> words = new ArrayList<>();
for (String sentence : document.split("[.?!]\\s*"))
    words.add(Arrays.asList(sentence.split("[ ,;:]+")));

If you want to convert the Lists to arrays, use List.toArray(), but I wouldn't recommend it. Lists are far easier to deal with than arrays. For one, they expand automatically (one reason why the above code is so dense).
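If arrays really are required, the conversion can be sketched roughly as follows. The class name SentenceArrays and the helpers splitToLists and toArrays are made up for this illustration, and the sample text is arbitrary; only the two regexes come from the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; not part of the original answer.
class SentenceArrays {

    // Splits a document into sentences, then each sentence into words,
    // using the same two regexes as the three-line solution.
    static List<List<String>> splitToLists(String document) {
        List<List<String>> words = new ArrayList<>();
        for (String sentence : document.split("[.?!]\\s*")) {
            words.add(Arrays.asList(sentence.split("[ ,;:]+")));
        }
        return words;
    }

    // Converts the nested lists into a jagged String[][] (one row per sentence).
    static String[][] toArrays(List<List<String>> lists) {
        String[][] result = new String[lists.size()][];
        for (int i = 0; i < lists.size(); i++) {
            result[i] = lists.get(i).toArray(new String[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        String document = "Hello there, world. How are you? Fine!"; // sample input
        String[][] words = toArrays(splitToLists(document));
        // prints [[Hello, there, world], [How, are, you], [Fine]]
        System.out.println(Arrays.deepToString(words));
    }
}
```

Passing new String[0] to toArray lets the list allocate an array of the right size and element type; the no-argument toArray() would return Object[] instead.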

Addendum: (most) characters don't need escaping inside a character class.
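A quick way to check the escaping point is to compare both spellings of the character class on the same input. This is only a small sketch; the class name CharClassEscaping and the method sameSplit are invented for the example.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are not from the original answers.
class CharClassEscaping {

    // Returns true if the escaped and unescaped character classes
    // split the given text into exactly the same pieces.
    static boolean sameSplit(String text) {
        String[] unescaped = text.split("[.?!]");
        String[] escaped = text.split("[\\.\\?\\!]");
        return Arrays.equals(unescaped, escaped);
    }

    public static void main(String[] args) {
        // Inside [...] the dot is a literal, so both patterns behave alike.
        System.out.println(sameSplit("One. Two? Three!")); // prints true

        // Outside a character class an unescaped dot matches any character,
        // so every piece is empty and split() drops them all.
        System.out.println("a.b".split(".").length);       // prints 0
        System.out.println("a.b".split("\\.").length);     // prints 2
    }
}
```

The "(most)" matters: a backslash, a closing bracket, a leading ^ and a - between two characters keep their special meaning inside a class and still need care.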

OTHER TIPS

It seems like your input string is stored in name. I do not understand what the inner for loop is supposed to do: it prints len and increments it for every token, although len was already increased by the sentence's word count just above.

String sentences[];
String words[][];

// End punctuation marks are ['.', '?', '!']
sentences = name.split("[\\.\\?\\!]"); 

System.out.println("num of sentences: " + sentences.length);

// Allocate storage for (sentences.length) new arrays of strings
words = new String[sentences.length][];

// For each sentence
for (int h = 0; h < sentences.length; h++) {
  // Remove spaces from beginning and end of sentence (to avoid 0-length words)
  // split by any white space character sequence (caution if using Unicode!)
  words[h] = sentences[h].trim().split("\\s+"); 

  // Print out length of sentence.
  System.out.println("words (in sentence " + (h+1) + "): " + words[h].length);
}
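On the Unicode caution in the comment above: by default \s in Java matches only the ASCII whitespace characters, so a non-breaking space (U+00A0), for example, does not separate words. One possible workaround, sketched here with made-up names (UnicodeWhitespaceSplit, splitAscii, splitUnicode), is to compile the pattern with Pattern.UNICODE_CHARACTER_CLASS:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are not from the original answers.
class UnicodeWhitespaceSplit {

    // \s+ with Unicode-aware character classes enabled.
    private static final Pattern UNICODE_WS =
            Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);

    // Default behaviour: \s covers only space, tab, newline and similar ASCII characters.
    static String[] splitAscii(String sentence) {
        return sentence.trim().split("\\s+");
    }

    // Unicode-aware behaviour: \s also covers e.g. the non-breaking space U+00A0.
    // Note that trim() itself only strips characters up to U+0020.
    static String[] splitUnicode(String sentence) {
        return UNICODE_WS.split(sentence.trim());
    }

    public static void main(String[] args) {
        String sentence = "one\u00A0two three"; // non-breaking space between "one" and "two"
        System.out.println(splitAscii(sentence).length);   // prints 2
        System.out.println(splitUnicode(sentence).length); // prints 3
    }
}
```

Whether this matters depends on where the text comes from; text copied from web pages or word processors often contains non-breaking spaces.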
Licensed under: CC-BY-SA with attribution