Question

I have 100 text files. 50 of them are named text_H and the other 50 are named text_T. What I would like to do is the following: open text_T_1 and text_H_1, find the number of common words, and write it to a text file; then open text_H_2 and text_T_2 and find the number of common words; and so on, up to text_H_50 and text_T_50.

I have written the following code, which opens two text files, finds the common words, and returns the number of common words between the two files. The results are written to a text file.

For whatever reason, instead of giving me the number of common words for just the two open text files, it gives me the number of common words for all files. For example, if the number of common words between fileA_1 and fileB_1 is 10 and the number of common words between fileA_2 and fileB_2 is 5, then the result I get for the second pair of files is 10+5=15. I'm hoping someone here can catch whatever it is that I'm missing, because I've been through this code many times now without success. Thanks ahead of time for any help!

The code:

package xml_test;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;

public class app {

    private static ArrayList<String> load(String f1) throws FileNotFoundException 
    {
        Scanner reader = new Scanner(new File(f1));
        ArrayList<String> out = new ArrayList<String>();
        while (reader.hasNext())
        {
            String temp = reader.nextLine();
            String[] sts = temp.split(" ");
            for (int i = 0;i<sts.length;i++)
            {
                if(sts[i] != "" && sts[i] != " " && sts[i] != "\n")
                    out.add(sts[i]);
            }
        }
        return out;
    }

    private static void write(ArrayList<String> out, String fname) throws IOException
    {
        FileWriter writer = new FileWriter(new File(fname));
        //int count=0;
        int temp1=0;
        for (int ss= 1;ss<=3;ss++)
        {
            int count=0;
            for (int i = 0;i<out.size();i++)
            {
                //writer.write(out.get(i) + "\n");
                //writer.write(new Integer(count).toString());
                count++;
            }
            writer.write("count ="+new Integer(temp1).toString()+"\n");
        }
        writer.close();
    }

    public static void main(String[] args) throws IOException 
    {
        ArrayList<String> file1;
        ArrayList<String> file2;
        ArrayList<String> out = new ArrayList<String>();
        //add for loop to loop through all T's and H's 
        for(int kk = 1;kk<=3;kk++)
        {
            int count=0;
            file1 = load("Training_H_"+kk+".txt");
            file2 = load("Training_T_"+kk+".txt");
            //int count=1;

            for(int i = 0;i<file1.size();i++)
            {
                String word1 = file1.get(i);
                count=0;
                //System.out.println(word1);
                for (int z = 0; z <file2.size(); z++)
                {
                    //if (file1.get(i).equalsIgnoreCase(file2.get(i)))
                    if (word1.equalsIgnoreCase(file2.get(z)))
                    {
                        boolean already = false;
                        for (int q = 0;q<out.size();q++)
                        {
                            if (out.get(q).equalsIgnoreCase(file1.get(i)))
                            {
                                count++;
                                //System.out.println("count is "+count);
                                already = true;
                            }
                        }
                        if (already==false)
                        {
                            out.add(file1.get(i));
                        }
                    }
                }
                //write(out,"output_"+kk+".txt");
            }
            //count=new Integer(count).toString();
            //write(out,"output_"+kk+".txt");
            //write(new Integer(count).toString(),"output_2.txt");
            //System.out.println("count is "+count);
        }//
    }
}

Solution

Let me show you what your code is doing and see if you can spot the problem.

List wordsInFile1 = getWordsFromFile();
List wordsInFile2 = getWordsFromFile();

List foundWords = empty;

//Does below for each compared file
for each word in file 1
    set count to 0
    compare to each word in file 2
        if the word matches see if it's also in foundWords
            if it is in foundWords, add 1 to count
        otherwise, add the word to foundWords

//Write the number of words
prints out the number of words in foundWords

Hint: The issue is with foundWords and where you are adding to count. arunmoezhi's comment is on the right track, as well as board_reader's point #3 in his answer.

As it stands now, your code does nothing meaningful with any of the count variables: main never uses its count, and write prints temp1, which is always 0.
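One possible way to act on the hint is sketched below. It keeps the nested-loop approach from the question but creates the list of found words fresh for every pair, so nothing carries over from one pair of files to the next. The class name PairwiseCommonCount and the helper methods countCommon and containsIgnoreCase are made up for this illustration, and the word lists are hard-coded stand-ins for the loaded files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this illustration.
class PairwiseCommonCount {

    // Counts the distinct words (ignoring case) that occur in both lists.
    // The "found" list is local to this method, so each pair starts empty.
    static int countCommon(List<String> words1, List<String> words2) {
        List<String> found = new ArrayList<>();
        for (String w1 : words1) {
            for (String w2 : words2) {
                if (w1.equalsIgnoreCase(w2) && !containsIgnoreCase(found, w1)) {
                    found.add(w1);
                }
            }
        }
        return found.size();
    }

    // Case-insensitive membership check, mirroring the "already" flag in the question.
    static boolean containsIgnoreCase(List<String> list, String word) {
        for (String s : list) {
            if (s.equalsIgnoreCase(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-ins for the contents of two H/T file pairs.
        List<String> h1 = Arrays.asList("the", "cat", "sat");
        List<String> t1 = Arrays.asList("The", "cat", "ran");
        List<String> h2 = Arrays.asList("a", "dog", "sat");
        List<String> t2 = Arrays.asList("the", "dog", "barked");

        System.out.println(countCommon(h1, t1)); // prints 2
        System.out.println(countCommon(h2, t2)); // prints 1, not 3
    }
}
```

In the question's main method, the equivalent change would be to create out inside the kk loop (or clear it at the start of each iteration) and to report out.size() for that pair.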

OTHER TIPS

  1. Use more meaningful variable names in loops; it makes the code more readable.
  2. Use HashMaps instead of ArrayLists; the code will be smaller, faster, and a lot easier to follow. It will also use less memory when words are repeated several times in the files.
  3. Shouldn't you increase count in the already==false case?
  4. I could not figure out the point of calculating count three times in the write method. Isn't count equal to out.size()?
  5. There are probably more...
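As a rough sketch of tip 2, the version below uses HashSet (the set counterpart of HashMap, swapped in here because only the words themselves are needed, not counts per word). The class name CommonWordSets and its methods are hypothetical, and lower-casing the words is an assumption standing in for the question's equalsIgnoreCase comparison. Scanner.next() already skips whitespace, so the empty-string checks from the question's load method are not needed.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Scanner;
import java.util.Set;

// Hypothetical class name, used only for this illustration.
class CommonWordSets {

    // Splits text on whitespace and returns the distinct words, lower-cased.
    static Set<String> wordSet(String text) {
        Set<String> words = new HashSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    // Same idea, but reads whitespace-separated words from a file.
    static Set<String> loadWords(String fileName) throws FileNotFoundException {
        Set<String> words = new HashSet<>();
        try (Scanner reader = new Scanner(new File(fileName))) {
            while (reader.hasNext()) {
                words.add(reader.next().toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    // Size of the intersection; the inputs are left unmodified.
    static int countCommon(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        Set<String> h = wordSet("The cat sat on the mat");
        Set<String> t = wordSet("the dog sat");
        System.out.println(countCommon(h, t)); // prints 2 ("the" and "sat")
    }
}
```

For the files in the question, each pair would be handled with something like countCommon(loadWords("Training_H_" + kk + ".txt"), loadWords("Training_T_" + kk + ".txt")) inside the kk loop. Because every call builds its own sets, no state is shared between pairs.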
Licensed under: CC-BY-SA with attribution