Frage

I am trying to build a binary tree that will hold words from a file alphabetically as well as count the number of occurrences of the word in the file. Then later on I have to be able to replace words in the original text file. For now I'm just trying to setup my binary tree and getting the words in there. The string tokenizing works and it will print the words and punctuation every line. I will have to store the punctuation in a character array as well and count the occurrences of that. I'm having problems with my insert function but I'm not sure what I'm doing wrong. I am getting a segmentation fault.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
Name: Marcus Lorenzana
*/

//binary tree struct to hold left and right node
//as well as the word and number of occurrences
typedef struct node
{
    char* word;
    int count;
    struct node *left;
    struct node *right;
}
node;

node *insert(node *item, char *word);
char* readFile(char* filename);

int main()
{
    FILE *fin;
    char *word;
    fin = fopen("data.txt", "r");


    char* filecontents = readFile("data.txt");

    //create dictionary node
    node *dictionary; 
    dictionary = NULL;

    //read words and punctuation in from the text file
    word = strtok (filecontents, " \n");
    int i = 0;
    while (word != NULL)
    {

        printf("%s\n",word);
        insert(dictionary,word);
        printf("%s",dictionary->word); 
        word = strtok (NULL, " \n");
        i++;
    }




    return 0;
}

//not sure if works
node *insert(node *item, char *word)
{
    if(item==NULL)
    {
        item= (node*) malloc(sizeof(node));
        strcpy(item->word, word);
        item->left=NULL;
        item->right=NULL;
        item->count++;
    }
    else
    {
        if(strcmp(word, item->word)<0)
        {
            item->left=insert(item->left, word); 
            item->count++;
        }
        else if(strcmp(word, item->word)>0)
        {
            item->right=insert(item->right, word);
            item->count++;
        }
        else
        {
            item->count++;
        }
    }
    return item;
}


char* readFile(char* filename)
{
    FILE* file = fopen(filename,"r");
    if(file == NULL)
    {
        return NULL;
    }

    fseek(file, 0, SEEK_END);
    long int size = ftell(file);
    rewind(file);

    char* content = calloc(size + 1, 1);

    fread(content,1,size,file);

    return content;
}
War es hilfreich?

Lösung

There are two problems with your insert function.

  1. It should be passed a double pointer to a struct node if you plan to mutate the pointer, otherwise it should be returning on each recursive call if you plan to use a single pointer to a struct node.
  2. You are not mallocing memory for the word when creating a new node.

To find problems in other parts of your code, use valgrind (here). It's a fantastic tool for debugging memory leaks or segmentation fault errors.


To solve problem 1, I will show the example of passing a single pointer and returning (still mutating). Your insert function (with problem 2 solved) should look as follows:

node *insert( node *item, char *word ) {
  if ( item == NULL ) {
    node *new_item = malloc( sizeof( struct node ) );

    new_item->word = malloc( sizeof( char ) * ( strlen( word ) + 1 ) ); // Note, this line (p2).

    strcpy( new_item->word, word );

    new_item->count = 1; // << Note change here.
    new_item->left = NULL;
    new_item->right = NULL;

    return new_item;
  } else {
    int cmp_result = strcmp( word, item->word );

    if ( cmp_result < 0 ) {
      item->left = insert( item->left, word );
      item->count++;
    } else if ( cmp_result > 0 ) {
      item->right = insert( item->right, word );
      item->count++;
    } else { 
      // Node already exists, do what you see fit here.
    }
  }

  return item;
}

To solve problem 2, the error lies within the block of code that creates a new node. Looking here:

item = ( node* )malloc( sizeof( node ) );
strcpy( item->word, word ); // << Here, invalid (error).

... you are not mallocing a block of memory for the word. What you are doing is overwriting memory inside your structure and possibly to other memory addresses you have not allocated (depending on when the garbage value will be 0 to simulate a NULL terminator). This is undefined behaviour.

The solution is to do the following:

item = ( node* ) malloc( sizeof( node ) );
item->word = malloc( sizeof( char ) * ( strlen( word ) + 1 ) ); // << Fix.
strcpy( item->word, word ); // << Now, valid.

... note the + 1 to ensure that there is room for the NULL terminator since strlen returns the string length of the array of characters passed to it.

Remark:

  • It's also not a good idea to cast the result of malloc, but it's entirely up to you as it is not causing an error (but could make an error message not appear when it should).
  • It's also important that your main function has a void parameter type, main( void ) not main( ) unless you intend to use the feature.
Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top