Question

I have a programming exercise that involves a concordance. I am trying to read a .txt file, use a regex to split it into words, and then build a Hashtable whose keys are the words and whose values are the number of times each word appears in the document. We are supposed to handle both case-sensitive and case-insensitive counting by passing in a boolean.

Here is what I have:

    public Concordance( String pathName, boolean is_case_sensitive ) {
        Scanner file = new Scanner(pathName);
        try {
            file = new Scanner(new File(pathName));
        } catch (FileNotFoundException e) {
            System.out.println("No File Found");
        }

        String[] words;

        Pattern pattern = Pattern.compile("\\W+");

        words = pattern.split(file.nextLine());
        String[] wordsCopy = new String[words.length];
        for (int i = 0; i < words.length; i++){
            wordsCopy[i] = new String(words[i]);
        }

        int counter = 0;

        while (file.hasNext()){
            for (String w : words){
                counter = 0;
                for (String w2 : wordsCopy){
                    if (is_case_sensitive == false){
                        if (w.equalsIgnoreCase(w2)){
                            counter++;
                            //w2 = null;
                            tableOfWords.put(w, counter);
                            file.next();
                        }
                    }
                    if (is_case_sensitive == true){
                        if (w.equals(w2)){
                            counter++;
                            //w2 = null;
                            tableOfWords.put(w, counter);
                            file.next();
                        }
                    }
                }
            }
        }
    }

To walk you through where I am and where I believe my error is:

I use the Scanner to read the file, then use the regex \W+ to get all of the words. I create a String array and fill it by splitting with the Pattern. Then I create a deep copy of the array to use during comparison, so I now have two String arrays: words and wordsCopy. I use an int counter variable to keep track of how many times each word appears, and I handle case sensitivity with an if statement and the equals/equalsIgnoreCase methods. I have been going back and forth on assigning w2 to null (it is currently commented out). Intuitively I feel that if it is not set to null, the word will be counted twice, but I can't seem to think it through properly. I think I am counting items in duplicate, but I can't figure out a solution. Any insight? Thanks!

Solution

You don't need an extra String[] to handle case sensitivity:

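    // Needs java.util.HashMap, java.util.concurrent.atomic.AtomicInteger, java.util.regex.Pattern
    Pattern pattern = Pattern.compile("\\W+");
    HashMap<String, AtomicInteger> tableOfWords = new HashMap<String, AtomicInteger>();

    // Read the file line by line; `file` is the Scanner opened on the input file.
    while (file.hasNextLine()) {
        String[] words = pattern.split(file.nextLine());
        for (String w : words) {
            if (w.isEmpty()) {
                continue; // split() yields "" for a blank line or a line starting with a non-word character
            }
            // Normalize the key when the count should ignore case.
            String tmp = is_case_sensitive ? w : w.toLowerCase();

            AtomicInteger count = tableOfWords.get(tmp);
            if (count == null) {
                count = new AtomicInteger(0);
                tableOfWords.put(tmp, count);
            }
            count.incrementAndGet();
        }
    }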

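If case sensitivity is not required, convert each word to a single case (lower or upper) before using it as the key. Words that differ only in case then share one entry in the map, and each word is counted exactly once as it is read.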
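As a self-contained variant of the same idea, here is a minimal sketch that counts words from a list of lines instead of a file, so it can be run without any input file. The class name `ConcordanceSketch` and the method `countWords` are hypothetical names chosen for illustration, and `Map.merge` (Java 8+) is swapped in for the `AtomicInteger` bookkeeping above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical class name; not part of the original exercise.
public class ConcordanceSketch {
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    // Counts each word once per occurrence; keys are lower-cased when caseSensitive is false.
    static Map<String, Integer> countWords(Iterable<String> lines, boolean caseSensitive) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : NON_WORD.split(line)) {
                if (word.isEmpty()) {
                    continue; // skip the "" produced by blank lines or leading punctuation
                }
                String key = caseSensitive ? word : word.toLowerCase(Locale.ROOT);
                counts.merge(key, 1, Integer::sum); // insert 1, or add 1 to the existing count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("The cat saw the dog.", "the DOG ran");
        System.out.println(countWords(lines, false)); // "the" maps to 3, "dog" to 2
        System.out.println(countWords(lines, true));  // "The" maps to 1, "the" to 2, "DOG" to 1
    }
}
```

To use it with a file, the lines could come from the Scanner loop shown above; the counting logic stays the same.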

Other tips

As far as I can see, you are actually counting words multiple times (more than twice as well).

Here is a simple for-each loop to illustrate what you're doing. Some of the syntax might be off, as I'm not using an IDE to write this code.

    int[] ints = {1, 2, 3, 4, 5};
    int[] intCopy = ints.clone();

    // The inner loop runs in full for every element of the outer loop.
    for (int i : ints) {
        for (int j : intCopy) {
            System.out.println(j);
        }
    }

What you will end up printing (one number per line) is 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5, that is, the whole array once for each of its five elements.

So instead of counting 5 things, you are counting 25 things. Hope this helps.
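To make the 5-versus-25 difference concrete, here is a small sketch that counts how often the loop body runs in each case. The class and method names (`NestedLoopDemo`, `nestedVisits`, `singlePassVisits`) are made up for this illustration.

```java
// Hypothetical demo class; it only counts loop iterations.
public class NestedLoopDemo {
    // Every element is paired with every element, so the body runs n * n times.
    static int nestedVisits(int[] values) {
        int visits = 0;
        for (int outer : values) {
            for (int inner : values) {
                visits++;
            }
        }
        return visits;
    }

    // Each element is visited exactly once, so the body runs n times.
    static int singlePassVisits(int[] values) {
        int visits = 0;
        for (int value : values) {
            visits++;
        }
        return visits;
    }

    public static void main(String[] args) {
        int[] ints = {1, 2, 3, 4, 5};
        System.out.println("nested: " + nestedVisits(ints));          // prints "nested: 25"
        System.out.println("single pass: " + singlePassVisits(ints)); // prints "single pass: 5"
    }
}
```

A word count only needs the single pass: look each word up in the map as it is read and bump its count, rather than comparing it against every other word.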

Licensed under: CC-BY-SA with attribution