Question

I am using two dynamic arrays to read from a file. They keep track of each word and the number of times it appears. If a word has already appeared, I must increment its count in one array and not add it to the other array, since it already exists there. However, I am getting blank entries in my array whenever I meet a duplicate. I think it's because my pointer continues to advance when it shouldn't. I do not know how to combat this. The only workaround I have is to skip empty entries when I print the results: if (*(words + i) == "") continue;. This basically ignores the blanks in the array, but I think that is messy. I just want to figure out how to move the pointer back in this method. words and frequency are my dynamic arrays.

I would like guidance in what my problem is, rather than solutions.

I have now changed my outer loop to be a while loop, and only increment when I have found the word. Thank you WhozCraig and poljpocket.

Now this occurs: (screenshot not preserved)

Solution 2

First, to address your code, this is what it should probably look like. Note how we only increment i as we add words, and we only ever scan the words we've already added for duplicates. Note also how the first pass will skip the j-loop entirely and simply insert the first word with a frequency of 1.

void addWords(const std::string& fname, int count, std::string *words, int *frequency)
{
    std::ifstream file(fname);
    std::string hold;

    int i = 0;
    while (i < count && (file >> hold))
    {
        int j = 0;
        for (; j<i; ++j)
        {
            if (toLower(words[j]) == toLower(hold))
            {
                // found a duplicate at j
                ++frequency[j];
                break;
            }
        }

        if (j == i)
        {
            // didn't find a duplicate
            words[i] = hold;
            frequency[i] = 1;
            ++i;
        }
    }
}

Second, to really address your code, this is what it should actually look like:

#include <iostream>
#include <fstream>
#include <map>
#include <string>

//
// Your implementation of toLower() goes here.
//


typedef std::map<std::string, unsigned int> WordMap;

WordMap addWords(const std::string& fname)
{
    WordMap words;

    std::ifstream inf(fname);
    std::string word;

    while (inf >> word)
        ++words[toLower(word)];

    return words;
}

If it isn't obvious by now how a std::map<> makes this task easier, it never will be.

OTHER TIPS

Instead of incrementing your loop variable [i] every loop, you need to only increment it when a NEW word is found [i.e. not one already in the words array].

Also, you're wasting time in your inner loop by looping through your entire words array, since words will only exist up to index i.

 int idx = 0;
 bool isFound = false;
 // Check the capacity first so a word is never read and then dropped.
 while (idx < count && file >> hold) {
    if (!valid_word(hold)) {
        continue;
    }

    // You don't need to check past idx because you
    // only have <idx> words so far.
    for (int i = 0; i < idx; i++) {
        if (toLower(words[i]) == toLower(hold)) {
            frequency[i]++;
            isFound = true;
            break;
        }
    }

    if (!isFound) {
        words[idx] = hold;
        frequency[idx] = 1;
        idx++;
    }
    isFound = false;
 }

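Check out SEEK_CUR if you want to set the file cursor back. It is the origin argument to fseek(), not a function itself; with a std::ifstream the equivalent is seekg() with std::ios_base::cur.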

The problem is a logical one. Consider two situations:

  1. Your algorithm does not find the current word. It is inserted at position i of your arrays.
  2. Your algorithm does find the word. The frequency of the word is incremented along with i, which leaves you with blank entries in your arrays whenever there's a word which is already present.

To conclude, 1 works as expected but 2 doesn't.

My advice is that you don't rely on a for loop to traverse the input, but use a "get-next-until-end" approach built on a while loop. With this, you can track your next insertion point separately and thus get rid of the blank entries.

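int currentCount = 0;
// Read inside the condition so the loop stops as soon as a read fails;
// count is the capacity of the two arrays.
while (currentCount < count && file >> hold)
{
     bool found = false;
     // your inner for loop, scanning only indices 0 .. currentCount - 1
     // and setting found = true when hold matches an existing word
     if (!found)
     {
         *(words + currentCount) = hold;
         *(frequency + currentCount) = 1;
         currentCount++;
     }
}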

Why not use a std::map?

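void collect( const std::string & name, std::map<std::string,int> & freq ){
  std::ifstream file;
  file.open(name.c_str(), std::ifstream::in );
  std::string word;
  // Test the extraction itself. Checking eof() after the read would drop a
  // last word with no trailing whitespace, and loop forever if the file
  // failed to open.
  while( file >> word ){ // add toLower
    freq[word]++;
  }
  file.close();
}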

The problem with your solution is the use of count in the inner loop where you look for duplicates. You'll need another variable, say nocc, initially 0, used as limit in the inner loop and incremented whenever you add another word that hasn't been seen yet.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow