Question

I'm playing around with C strings and streams to get a better understanding of them. I have this test program to read a fixed size block of data from an input file to a buffer, store the buffer contents in an intermediate storage (in this case, I want the storage to be able to store three different "reads") and then write the read string and one of the strings in intermediate storage to an output file.

A note on this: In each iteration I just use the two first positions of the intermediate storage and just write the second "stored string" to the file.

THE CODE:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 3
#define BUFFER_SIZE 5

int main(int argc, char** argv) {
  FILE* local_stream_test = fopen("LOCAL_INPUT_FILE","r");
  FILE* local_output_test = fopen("LOCAL_OUTPUT_TEST","w");

  if(!local_stream_test) {
    puts("!INPUT FILE");
    return EXIT_FAILURE;
  }
  if(!local_output_test) {
    puts("!OUTPUT FILE");
    return EXIT_FAILURE;
  }
  char my_buffer[BUFFER_SIZE];
  char test[SIZE];
  char* test2[SIZE];
  memset(my_buffer,0,sizeof(my_buffer));
  memset(test,0,sizeof(test));
  memset(test2,0,sizeof(test2));

  int read = fread( my_buffer, sizeof(my_buffer[0]), sizeof(my_buffer)/sizeof(my_buffer[0]), local_stream_test );

   printf("FIRST READ TEST: %d\n",read);
   printf("\tMY_BUFFER, SIZEOF: %lu, STRLEN: %lu\n",sizeof(my_buffer),strlen(my_buffer));

   fwrite(my_buffer,sizeof(my_buffer[0]),/*strlen(aux)*/ read,local_output_test);
   char* aux_test = strdup(my_buffer);
   printf("\tAUX_TEST STRLEN: %lu, ## %s\n",strlen(aux_test), aux_test);
   free(aux_test);
   aux_test = NULL;

   while(read > 0) {
     if(feof(local_stream_test)) {
       puts("BYE");
       break;
     }
     read = fread( my_buffer, sizeof(my_buffer[0]), sizeof(my_buffer)/sizeof(my_buffer[0]), local_stream_test );
     aux_test = strdup(my_buffer);

     if(!aux_test) {
       puts("!AUX_TEST");
       break;
     }


     printf("READ TEST: %d\n",read);
     printf("\tMY_BUFFER, SIZEOF: %lu, STRLEN: %lu\n",sizeof(my_buffer),strlen(my_buffer));
     printf("\tAUX_TEST, SIZEOF: %lu, STRLEN: %lu ** SIZEOF *AUX_TEST: %lu, SIZEOF AUX_TEST[0]: %lu\n",sizeof(aux_test),strlen(aux_test),sizeof(*aux_test),sizeof(aux_test[0]));

     fwrite(aux_test,sizeof(aux_test[0]),/*strlen(aux_test)*/ read,local_output_test);

     printf("** AUX_TEST: %s\n",aux_test);
     test2[0] = aux_test;
     test2[1] = aux_test;
     test2[1][3] = toupper(test2[1][3]);

     fwrite(test2[1],sizeof(test2[1][0]),read,local_output_test);

     printf("\n** TEST2[0] SIZEOF: %lu, STRLEN: %lu, TEST2[0]: %s\n",sizeof(test2[0]),strlen(test2[0]),test2[0]);
     printf("\n** TEST2[1] SIZEOF: %lu, STRLEN: %lu, TEST2[1]: %s\n",sizeof(test2[1]),strlen(test2[1]),test2[1]);

     strcpy(test2[1],aux_test);
     printf("** COPIED TEST2[1]: %s\n",test2[1]);
     free(aux_test);
     aux_test = NULL;
     puts("*******************************************");
  }
  return EXIT_SUCCESS;
}

THE INPUT FILE:

converts a byte string to a floating point value
converts a byte string to an integer value
converts a byte string to an integer value

When printing the strings, I get extra junk values at the end of them after the second read. Here's the output in stdout for the first, second and third reads from the file:

FIRST READ TEST: 5
    MY_BUFFER, SIZEOF: 5, STRLEN: 5
    AUX_TEST STRLEN: 5, ## conve
READ TEST: 5
    MY_BUFFER, SIZEOF: 5, STRLEN: 5
    AUX_TEST, SIZEOF: 4, STRLEN: 5 ** SIZEOF *AUX_TEST: 1, SIZEOF AUX_TEST[0]: 1

** AUX_TEST: rts a

** TEST2[0] SIZEOF: 4, STRLEN: 5, TEST2[0]: rts a

** TEST2[1] SIZEOF: 4, STRLEN: 5, TEST2[1]: rts a
** COPIED TEST2[1]: rts a

*******************************************
READ TEST: 5
    MY_BUFFER, SIZEOF: 5, STRLEN: 13
    AUX_TEST, SIZEOF: 4, STRLEN: 13 ** SIZEOF *AUX_TEST: 1, SIZEOF AUX_TEST[0]: 1

** AUX_TEST:  byte▒▒▒▒

** TEST2[0] SIZEOF: 4, STRLEN: 13, TEST2[0]:  byTe▒▒▒▒


** TEST2[1] SIZEOF: 4, STRLEN: 13, TEST2[1]:  byTe▒▒▒▒

** COPIED TEST2[1]:  byTe▒▒▒▒

What troubles me is the fact that when the junk values start to appear, the length of the string is greater than the read bytes from the file: 13 versus 5. I have played around with the BUFFER_SIZE but I always get the junk values when printing to stdout unless the size is big enough to read the file in one go.

For example, with BUFFER_SIZE equal to 500, this is the output in stdout:

FIRST READ TEST: 135
    MY_BUFFER, SIZEOF: 300, STRLEN: 135
    AUX_TEST STRLEN: 135, ## converts a byte string to a floating point value
       converts a byte string to an integer value
        converts a byte string to an integer value

 BYE

And the output files generated:

BUFFER_SIZE = 5

converts arts a byte byTe stri stRing tong To a fl a FloatinoatIng poig pOint vant Value
clue
converonvErts a ts A byte bytE strinstrIng to g tO an inan IntegertegEr valu vaLue
cone
cOnvertsverTs a by a Byte stte String rinG to anto An inte inTeger vger value
aluE

BUFFER_SIZE = 500: The same as the input file.

So, I'm accessing out of bounds memory, right? But, where? I can't find the source of this problem (and most likely I have a misunderstanding in how to work with C strings).

PS:

I read here that maybe my problem is that I forgot to add the NULL mark at the end of the string. Doing:

 test2[0] = aux_test;
 test2[0][ strlen(aux_test)+1 ] = '\0';

 /* OR THIS */
 test2[0][read+1] = '\0';

produces the same result.


Solution

Part of your problem is that you are reading outside the bounds of your arrays, and fread() certainly doesn't null terminate anything.

For example:

printf("\tMY_BUFFER, SIZEOF: %lu, STRLEN: %lu\n",sizeof(my_buffer),strlen(my_buffer));

You read 5 bytes of data into an array of size 5 bytes. The strlen() reports 5; you were lucky that the first byte beyond the end of the array happened to be a zero byte, but since it was outside the array, you invoked undefined behaviour at that point (even though you got the answer you were expecting).

In the loop, in the first iteration, the toupper() case-converts a blank, which doesn't change it. test2[0] and test2[1] both point to the same string, so if the toupper() did anything, it would affect the value pointed at by both those pointers.
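If the goal is for test2[0] and test2[1] to hold independent strings, each slot needs its own allocation. Here is a minimal sketch, assuming two hypothetical helpers, dup_chunk and upper_at, neither of which is in the original program. dup_chunk copies exactly n bytes and terminates the copy itself, so it never depends on the source buffer being null terminated:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated, null-terminated copy of
 * the first n bytes of src. The caller owns the result and must free() it. */
char *dup_chunk(const char *src, size_t n)
{
    assert(src != NULL);
    char *copy = malloc(n + 1);      /* +1 leaves room for the terminator */
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, n);            /* copies exactly n bytes from src */
    copy[n] = '\0';
    return copy;
}

/* Hypothetical helper: uppercase one character of a terminated string,
 * but only if the index lies inside the string. */
void upper_at(char *s, size_t i)
{
    if (i < strlen(s))
        s[i] = (char)toupper((unsigned char)s[i]);
}
```

With this, test2[0] = dup_chunk(my_buffer, read); test2[1] = dup_chunk(my_buffer, read); gives two separate strings, so changing one leaves the other alone. Each one then needs its own free().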

When the junk values 'appear', you've put non-zero bytes into the data after the end of my_buffer, and strlen() reads through those non-zero bytes until it reaches a zero byte. So, the problem is all due to not ensuring that your character buffers are null terminated within the allocated length. When you invoke undefined behaviour, weird stuff can happen.

Note that if you use printf("<<%.*s>>\n", read, my_buffer); you will only print the bytes of data that were read.
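To make the precision trick concrete, here is a small sketch. The helper name format_chunk is hypothetical, and it uses snprintf() rather than printf() only so that the result is easy to inspect. The buffer passed in does not need a terminator, because the precision caps how many bytes are read:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: format the first n bytes of buf as "<<...>>" into dst.
 * buf need not be null-terminated as long as it holds at least n bytes,
 * because the precision passed through ".*" limits how much is read. */
int format_chunk(char *dst, size_t dst_size, const char *buf, size_t n)
{
    assert(n <= INT_MAX);            /* the precision argument must be an int */
    return snprintf(dst, dst_size, "<<%.*s>>", (int)n, buf);
}
```

Note the cast to int: fread() returns a size_t, but the argument consumed by .* has to be an int.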


You ask about:

test2[0] = aux_test;
test2[0][ strlen(aux_test)+1 ] = '\0';
/* OR THIS */
test2[0][read+1] = '\0';

You are accessing one byte beyond the end of what was provided. By definition, strlen(str) returns the first number len such that str[len] == '\0'. When you write test2[0][strlen(aux_test)+1] = '\0'; therefore, you are writing one byte beyond the first null in the string. The test2[0][read+1] = '\0'; assignment, assuming you've just read 5 bytes, overwrites test2[0][6], but the last byte of data that was read is in test2[0][4], so you've not changed test2[0][5] (and it isn't clear whether you're allowed to). The safe indexes are:

test2[0][strlen(aux_test)] = '\0';  // No-op, but safe
test2[0][read] = '\0';              // If you left enough space, null terminates the input
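
"Leaving enough space" can be sketched as follows, assuming a hypothetical helper read_chunk that is not part of the original program. It asks fread() for one byte less than the buffer holds, so the terminator written at index n always lands inside the array:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read at most cap - 1 bytes from fp into buf and
 * null-terminate what was read. Returns the number of bytes read. */
size_t read_chunk(FILE *fp, char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL && cap > 0);
    size_t n = fread(buf, 1, cap - 1, fp);   /* keep one byte spare */
    buf[n] = '\0';                           /* n <= cap - 1, so in bounds */
    return n;
}
```

With char my_buffer[BUFFER_SIZE + 1]; and read_chunk(local_stream_test, my_buffer, sizeof my_buffer), each call still reads up to BUFFER_SIZE bytes, and strlen(my_buffer) or strdup(my_buffer) then stops where the data stops (provided the data itself contains no zero bytes).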

Other tips

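In every case, the garbage starts after the 5th byte, as should be expected since #define BUFFER_SIZE 5: fread() fills all five bytes and leaves no room for a terminator. If, after you read in the value, a '\0' were written inside the array to null terminate it, like this:

my_buffer[BUFFER_SIZE-1]=0; // i.e. my_buffer[4]=0; note that this overwrites the last byte read

That would make the contents of my_buffer a legal string, at the cost of losing the fifth byte. (Using strlen(my_buffer) to find the position is not safe here, because the buffer is not terminated yet.) To actually fix the problem, create my_buffer with one byte more than you read in the first place, then always terminate with '\0' at the index given by the number of bytes read.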
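
Putting that advice into a whole read loop might look like the sketch below. The helper name copy_chunks and the CHUNK macro are hypothetical, and the loop is driven by fread()'s return value rather than by a feof() test before the read, which is a substitution on my part and not something the original program does:

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK 5   /* bytes per read, mirroring BUFFER_SIZE in the question */

/* Hypothetical helper: copy in to out in CHUNK-byte pieces.
 * Returns the total number of bytes written, or -1 on a write error. */
long copy_chunks(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);
    char buf[CHUNK + 1];             /* one extra byte for the terminator */
    long total = 0;
    size_t n;

    /* A short or empty final read ends the loop naturally. */
    while ((n = fread(buf, 1, CHUNK, in)) > 0) {
        buf[n] = '\0';               /* buf now ends at the data just read */
        if (fwrite(buf, 1, n, out) != n)
            return -1;
        total += (long)n;
    }
    return total;
}
```

Because buf is terminated right after every read, any strlen(), strdup() or printf("%s") call placed inside the loop only sees the bytes from the current chunk.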

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow