An `ArrayList` is not the best data structure for your outer list, and at least part of your difficulty stems from incorrect use of a list of lists.
In your implementation, `follows` is presumably an `ArrayList` of `LinkedList`s declared like this:

```java
ArrayList<LinkedList<String>> follows = new ArrayList<>();
```
The result of `follows.contains(firstWord)` will never be `true`, because `follows` contains elements of type `LinkedList`, not `String`. `firstWord` is a `String`, so it can never be an element of `follows`; at most it could be the first element of one of the `LinkedList`s stored in `follows`.
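To make the type mismatch concrete, here is a small sketch. It assumes (hypothetically) that each inner list holds a word followed by the words that come after it; the class name, `sample()` and `findFollowers()` are made up for illustration. It shows that `contains()` on the outer list is always `false`, and that finding a word in this design requires scanning the inner lists one by one, which is O(n) in the number of entries:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class ListOfListsDemo {
    // Hypothetical layout: each inner list is a word followed by its followers,
    // e.g. [the, cat, red].
    static ArrayList<LinkedList<String>> sample() {
        ArrayList<LinkedList<String>> follows = new ArrayList<>();
        follows.add(new LinkedList<>(Arrays.asList("the", "cat", "red")));
        follows.add(new LinkedList<>(Arrays.asList("cat", "walks")));
        return follows;
    }

    // What the list-of-lists design actually requires: scan every inner list
    // until one starts with the word. Worst case this visits all n entries, O(n).
    static List<String> findFollowers(List<LinkedList<String>> follows, String word) {
        for (LinkedList<String> inner : follows) {
            if (!inner.isEmpty() && inner.getFirst().equals(word)) {
                return inner.subList(1, inner.size()); // followers only
            }
        }
        return null; // word not seen yet
    }

    public static void main(String[] args) {
        ArrayList<LinkedList<String>> follows = sample();
        String firstWord = "cat";

        // contains() compares firstWord with each element via equals(); the elements
        // are LinkedLists, and a String is never equal to a LinkedList.
        System.out.println(follows.contains(firstWord));       // false
        System.out.println(findFollowers(follows, firstWord)); // [walks]
    }
}
```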
The solution below uses a `Map`, or more specifically a `HashMap`, in place of the outer list `follows`. A `Map` is preferable because looking up the first word takes O(1) time on average with a `HashMap`, versus O(n) for a search through a list.
```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// dataFile is the Scanner over the input text from your code
String firstWord = dataFile.next().toLowerCase();
Map<String, List<String>> follows = new HashMap<>();
int nWords = 0;
while (dataFile.hasNext())
{
    String secondWord = dataFile.next().toLowerCase();
    nWords++;
    if (nWords % 1000 == 0)
    {
        System.out.println(nWords + " words");
    }
    // check whether this word already has a list of following words
    if (follows.containsKey(firstWord))
    {
        // add the next word to its linked list, unless it is already there
        List<String> list = follows.get(firstWord);
        if (!list.contains(secondWord))
        {
            list.add(secondWord);
        }
    }
    else
    {
        // create a new linked list for this word, then add the next word
        List<String> list = new LinkedList<>();
        list.add(secondWord);
        follows.put(firstWord, list);
    }
    // go on to the next word
    firstWord = secondWord;
}
```
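For a sample input such as "The cat walks to the red tree", the map will look like this (the order of the keys is not guaranteed, because a `HashMap` is unordered):

```
the: [cat, red]
cat: [walks]
to: [the]
red: [tree]
walks: [to]
```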
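The same map can be built a little more compactly with `Map.computeIfAbsent` (Java 8+), which creates the inner `LinkedList` the first time a word is seen. The following is a sketch, assuming the input comes from a `java.util.Scanner` like `dataFile`; the class and method names are hypothetical, and unlike the loop above it returns an empty map instead of throwing `NoSuchElementException` on empty input. It also shows that fetching a word's followers is a single `get` call rather than a scan:

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class FollowsBuilder {
    // word -> distinct following words, same logic as the loop above, but
    // computeIfAbsent creates the LinkedList for a word on first use.
    static Map<String, List<String>> buildFollows(Scanner dataFile) {
        Map<String, List<String>> follows = new HashMap<>();
        if (!dataFile.hasNext()) {
            return follows; // empty input: nothing to pair up
        }
        String firstWord = dataFile.next().toLowerCase();
        while (dataFile.hasNext()) {
            String secondWord = dataFile.next().toLowerCase();
            List<String> list = follows.computeIfAbsent(firstWord, k -> new LinkedList<>());
            if (!list.contains(secondWord)) { // skip duplicates
                list.add(secondWord);
            }
            firstWord = secondWord;
        }
        return follows;
    }

    public static void main(String[] args) {
        Map<String, List<String>> follows =
                buildFollows(new Scanner("The cat walks to the red tree"));
        System.out.println(follows.get("the"));  // [cat, red]
        System.out.println(follows.get("tree")); // null: "tree" is never followed
    }
}
```

Both versions behave the same for non-empty input; `computeIfAbsent` simply folds the `containsKey`/`put` branch into one call.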
I also made the following changes to your implementation:

- Skip duplicates when adding to the list of following words. Note that a `Set` would be a more appropriate data structure for this task, but you clearly state that a requirement is to use `LinkedList`.
- Use `String.toLowerCase()` to convert all strings to lower case, so that "the" and "The" are treated as the same word. (Be sure to apply this to the initial value of `firstWord` as well, which doesn't appear in the code you provided.)
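For comparison only, in case the `LinkedList` requirement were ever relaxed: a `LinkedHashSet` rejects duplicates in O(1) average time (instead of the linear `list.contains()` check) while still keeping followers in first-seen order. This sketch also passes `Locale.ROOT` to `toLowerCase` so the result does not depend on the default locale; the class name is hypothetical:

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;

public class FollowsSetBuilder {
    // word -> distinct followers; the Set handles duplicate rejection itself.
    static Map<String, Set<String>> buildFollows(Scanner dataFile) {
        Map<String, Set<String>> follows = new HashMap<>();
        if (!dataFile.hasNext()) {
            return follows;
        }
        // Locale.ROOT avoids locale-specific rules (e.g. Turkish dotted/dotless i).
        String firstWord = dataFile.next().toLowerCase(Locale.ROOT);
        while (dataFile.hasNext()) {
            String secondWord = dataFile.next().toLowerCase(Locale.ROOT);
            // add() leaves the set unchanged if the word is already present
            follows.computeIfAbsent(firstWord, k -> new LinkedHashSet<>()).add(secondWord);
            firstWord = secondWord;
        }
        return follows;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> follows = buildFollows(new Scanner("The cat saw the cat"));
        System.out.println(follows.get("the")); // [cat]
        System.out.println(follows.get("cat")); // [saw]
    }
}
```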
Note that both this solution and your original attempt assume that punctuation has already been removed.
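If punctuation has not been removed beforehand, one option is to let the `Scanner` treat anything other than letters and apostrophes as a separator, so that "tree." and "tree" produce the same token. This is a sketch under that assumption (keeping apostrophes so that words like "don't" survive; quoted text using `'` as a quotation mark would keep those quotes); the class and the `words` helper are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class PunctuationDemo {
    // Splits text into lower-cased words, treating any run of characters that
    // are neither letters nor apostrophes as a delimiter.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Scanner dataFile = new Scanner(text).useDelimiter("[^\\p{L}']+");
        while (dataFile.hasNext()) {
            result.add(dataFile.next().toLowerCase(Locale.ROOT));
        }
        dataFile.close();
        return result;
    }

    public static void main(String[] args) {
        // prints [the, cat, walks, to, the, red, tree, the, tree]
        System.out.println(words("The cat walks to the red tree. The tree!"));
    }
}
```

The same `useDelimiter` call could be applied to `dataFile` before the map-building loop.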