C: edit file to remove null characters:
Question
I thought this would be an easy task. After a couple of failed attempts, I fell back on the tried-and-true approach: write to a temp file, then reopen it and copy the contents back:
#include <stdlib.h>
#include <stdio.h>
int main()
{
    FILE *f = fopen("main2.c","r");
    FILE *t = fopen("temp","w");
    int c;
    int count = 0;
    while((c = fgetc(f))!=EOF)
    {
        if(c)
        {
            fputc(c,t);
        }
        else
        {
            printf("null found\n");
        }
    }
    fclose(f);
    fclose(t);
    FILE *n = fopen("main2.c","w");
    FILE *w = fopen("temp","r");
    while((c=fgetc(w))!=EOF)
    {
        fputc(c,n);
    }
    fclose(n);
    fclose(w);
    return 0;
}
This just spits out a bunch of Chinese characters. Could the underlying character encoding be the issue? Or am I just a total noob here?
My hex editor won't let me copy and paste. I don't know how to post the file here in its original condition, so I have put a zipped copy on Google Docs. Let me know immediately if you can't get it:
https://docs.google.com/open?id=0B4UPOuCR5uRGZzJQZUpVaktKYlk
EDIT: Wait, here it is via HxE Edit:
FF FE 23 00 69 00 6E 00 63 00 6C 00 75 00 64 00 65 00 20 00 3C 00 73 00 74 00 64 00
6C 00 69 00 62 00 2E 00 68 00 3E 00 0D 00 0A 00 23 00 69 00 6E 00 63 00 6C 00 75 00 64 00
65 00 20 00 3C 00 61 00 6C 00 6C 00 65 00 67 00 72 00 6F 00 2E 00 68 00 3E 00 0D 00 0A 00
23 00 69 00 6E 00 63 00 6C 00 75 00 64 00 65 00 20 00 22 00 6D 00 6F 00
Solution
Odds are that you are removing null bytes because the input is UTF-16 encoded. If so, you must also remove the byte-order mark (BOM) at the start of the file. If the first two bytes are 0xFF, 0xFE, you have a little-endian UTF-16 file, and the hex dump in the question does start with FF FE. Discard them! If you leave them in, any program that reads the result will still detect UTF-16 and treat every pair of ASCII characters in your source as a single 16-bit character code, which is where the Chinese characters come from. Strangeness will ensue.
Likewise, if the first two bytes are 0xFE, 0xFF, the file is big-endian UTF-16 and you must delete those two bytes as well; otherwise the file will again be treated as 16-bit codes, only with the high byte first.
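The advice above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the helper names `strip_bom_and_nulls` and `put_if_nonzero` are made up for this example, and the approach assumes the file contains nothing but ASCII text stored as UTF-16 (for any other text, dropping zero bytes would corrupt it, and a real UTF-16 to UTF-8 conversion would be needed instead).

```c
#include <stdio.h>

/* Hypothetical helper: writes c to out unless it is EOF or a zero byte.
 * Returns -1 on a write error, 0 otherwise. */
static int put_if_nonzero(int c, FILE *out, long *written)
{
    if (c == EOF || c == 0)
        return 0;
    if (fputc(c, out) == EOF)
        return -1;
    ++*written;
    return 0;
}

/* Hypothetical helper: copies in to out, skipping a leading UTF-16 BOM
 * (FF FE or FE FF) and dropping every zero byte.
 * Assumes ASCII-only text stored as UTF-16.
 * Returns the number of bytes written, or -1 on a write error. */
long strip_bom_and_nulls(FILE *in, FILE *out)
{
    long written = 0;
    int first = fgetc(in);
    int second = (first == EOF) ? EOF : fgetc(in);
    int c;

    int is_bom = (first == 0xFF && second == 0xFE) ||
                 (first == 0xFE && second == 0xFF);

    if (!is_bom) {
        /* No BOM: the first two bytes are ordinary data, so keep them
         * (minus any zero bytes). */
        if (put_if_nonzero(first, out, &written) < 0 ||
            put_if_nonzero(second, out, &written) < 0)
            return -1;
    }

    while ((c = fgetc(in)) != EOF) {
        if (put_if_nonzero(c, out, &written) < 0)
            return -1;
    }
    return written;
}
```

To use it on the file from the question, one would open "main2.c" with mode "rb" and "temp" with mode "wb", check both fopen results for NULL, call strip_bom_and_nulls(f, t), close both streams, and then copy "temp" back over the original as before.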
Other tips
Open the files in binary mode:
FILE *f = fopen("main2.c","rb");
FILE *t = fopen("temp","wb");