Question

I have a very large text file that I'm trying to run word analysis on. Besides word count, I may want other information as well, but I've left that out for simplicity. The file contains blocks of text separated by asterisks ('*'). The code below scans the file and prints the number of characters and words as it should, but I'd like to reset the counters each time an asterisk is encountered and store the results in a table of some sort. I'm less worried about how to build the table than about how to repeat the same counting code for each block of text between asterisks.

Maybe a for loop like this?

    for (arr = strstr(arr, "*"); arr; arr = strstr(arr + strlen("*"), "*"))

Example text file:

    =-=-=-=-=-=-=-=-=-=-=-=-=-=-
    I have a sentence. I have two sentences now.
    *
    I have another sentence. And another.
    *
    I'd like to count the amount of words and characters from the asterisk above this
    one until the next asterkisk, not including the count from the last one.
    *
    ...
    ...
    -=-=-=-=-=-=-=-=-=-=-=-=-=-=-
    (EOF)

Desired output:

    *#      #words     #alphaChar
    ----------------------------
    1        9           34  
    -----------------------------
    2        6           30
    -----------------------------
    3       28           124
    ...
    ...


I have tried

        #include <ctype.h>   // isalpha
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        int main(void)
          {
          int characterCount=0;
          int counterPosition, wordCount=0, alphaCount=0;
          int i=0;

          //input file
          FILE *file= fopen("test.txt", "r");
          if (file== NULL)
            {
            printf("Cannot find the file.\n");
            return 1;
            }

          //Count total number of characters in file
          while (1)
              {
              counterPosition = fgetc(file);
              if (counterPosition == EOF)
                break;
              ++characterCount;
              }

          rewind(file); // Sends the pointer to the beginning of the file

          //Dynamically allocate since array size cant be variable
          //(one extra byte for the terminating '\0')
          char *arr= malloc(characterCount + 1);
          if (arr == NULL)
            {
            fclose(file);
            return 1;
            }

          while(i < characterCount && fscanf(file, "%c", &arr[i]) == 1) //Scan until the end of file.
            i++;   //increment, storing each character in a unique position
          arr[i] = '\0';        //terminate so string functions can be used on arr
          characterCount = i;   //number of characters actually read

          for(i = 0; i <characterCount; i++)
              {
              if(arr[i] == ' ') //count words (approximated by counting spaces)
                wordCount++;

              if(isalpha((unsigned char)arr[i]))  //count letters only
                alphaCount++;

              }//end for loop

          printf("word count is %d and alpha count is %d\n", wordCount,alphaCount);

          free(arr);
          fclose(file);
          return 0;
          }

Solution

Since you already have the file's full text in the array arr, you need to split that string using '*' as the delimiter. You can use strtok() for this, then perform the word count and character count on each token it returns. Note that strtok() expects a null-terminated string and modifies it in place, so make sure arr ends with '\0' before calling it. See the strtok() documentation for details.
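
A minimal sketch of the strtok() approach, assuming the whole file has already been read into a writable, null-terminated buffer. The names block_stats, count_block, count_blocks and print_table are made up for illustration and do not come from the question. Unlike the code in the question, this sketch counts a word as a run of non-whitespace characters rather than counting spaces, and it skips blocks that contain no words (for example, a trailing newline after the last asterisk).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record holding the counts for one block of text. */
struct block_stats {
    int words;  /* runs of non-whitespace characters */
    int alpha;  /* alphabetic characters only */
};

/* Count words and alphabetic characters in one null-terminated block. */
struct block_stats count_block(const char *s)
{
    struct block_stats st = {0, 0};
    int in_word = 0;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (isalpha(c))
            st.alpha++;
        if (isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            st.words++;
        }
    }
    return st;
}

/* Split text on '*' with strtok() and store the counts of each block in out[].
 * text must be null-terminated and is modified in place.
 * Blocks without any words are skipped.
 * Returns the number of blocks stored (at most max_blocks). */
int count_blocks(char *text, struct block_stats *out, int max_blocks)
{
    int n = 0;
    char *tok;

    for (tok = strtok(text, "*"); tok != NULL && n < max_blocks;
         tok = strtok(NULL, "*")) {
        struct block_stats st = count_block(tok);
        if (st.words > 0)
            out[n++] = st;
    }
    return n;
}

/* Print the collected counts as a simple table. */
void print_table(const struct block_stats *stats, int n)
{
    int i;

    printf("*#      #words     #alphaChar\n");
    for (i = 0; i < n; i++)
        printf("%-8d%-11d%d\n", i + 1, stats[i].words, stats[i].alpha);
}
```

To use it, declare an array such as struct block_stats table[100], call count_blocks(arr, table, 100) on the buffer, and pass the result to print_table. Because strtok() treats every '*' as a delimiter, an asterisk that appears inside a block's text would also split that block.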

Licensed under: CC-BY-SA with attribution