Question

Basically I'm trying to write to a binary file so that when it is opened with a text editor it will display all ASCII characters. I noticed this works with notepad but doesn't work with notepad++ or open office and gives strange results. Why?

#include <fstream>
using namespace std;
int main () {
    ofstream file ("file.bin", ios::binary);
    for(int num = 0; num < 128; num++)
        file.write (reinterpret_cast<const char *>(&num), sizeof(num));
    file.close ();
    return 0;
}

So I expect the file when opened with a text editor to roughly reproduce this ASCII chart. When I open it with notepad I get this

   !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /   0 

1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~

When I open it with notepad++ I get this screen shot of notepad++

When I open it with OpenOffice.org Writer I select the default option to open with "Western Europe (Windows 1252/WinLatin 1)" and get a bunch of ###. Does this have to do with the byte-order-marker?

I tried modifying the program to use file.write (reinterpret_cast<const char *>(&num), sizeof(char)); since the int is being type casted to a char but then the program crashes.

Out of curiosity anyone got an explanation as to why OpenOffice writer comes up with # and spaces?

Was it helpful?

Solution

for(int num = 0; num < 128; num++)
    file.write (reinterpret_cast<const char *>(&num), sizeof(num));

In these lines, you are writing the number num in binary, as a 4 byte integer. (sizeof(num), where num is an int). Since all of the values you are writing are less than 128, the first three bytes of num are always 0x000000. So for each value you write, you're getting three nulls and then the ASCII character you intended.

Microsoft Notepad is absolutely stupid, and when it reaches an ASCII character that isn't printable, such as NULL, it simply displays a space. Notice how your ASCII values have "spaces" between them. Also, I bet it looks more like this, where I've replaced some spaces with underscores. Note that the 10th value (new line) causes a new line.

_________
__            !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   

Notepad++ is much more intelligent, and instead of converting all the strange things to spaces, shows you their name or number. Notepad++ also seems to have gotten confused by the character 13 (carriage return) by itself, since 13 is also usually part of a new line. It (reasonably) decided to make that a new line as well. Notepad++'s handling of this file is arguably more correct. This can be verified by looking at the first eight characters: "NUL NUL NUL NUL - SOH NUL NUL NUL" First is the zero, then the "SOH" character. If we look at your ASCII chart, it says the first ASCII character is "SOH - Start of Heading".

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top