Question

I have looked around everywhere, and while people can teach me how to write single digits, letters, symbols, etcetera in binary code, I haven't actually seen what it would look like syntactically. Things like addition, subtraction, spelling out a word, or other things like that. Is there a space between each combination of 1s and 0s, with combinations for plus and minus, or is it line by line? It just seems like something that shouldn't be this difficult to find an answer for.

Solution

Character data is, in most modern machines, managed as 8-bit bytes. (In some cases the characters are 16 or 32 bits, but that's just confusion at this juncture.)

If you look at an ASCII table you will see the basic "Latin" character set:

[image: ASCII table]

The individual characters are identified by an 8-bit byte where (for the basic ASCII chars) the high-order bit is zero. So values run between 0 and 127, or between 00 and 7F hex (or between 00000000 and 01111111 binary).

I should inject here that the first 32 codes are non-printing codes for "control characters". E.g., the code at decimal 10 (hex 0A) is the "line feed" code, which is known in C and Java as "newline". And the 00 code is the "NUL" character, as mentioned below.

The characters in a sentence are laid out in order in memory, in successive bytes. Hence, "Hello" will be 48 65 6C 6C 6F in hex. For C and C++ a simple "C string" is always ended with a byte of all zeros (the "NUL" character in the chart). For Java the length of the string is kept in a separate variable somewhere else. A few character coding schemes "prefix" the string with its length as an 8-bit or 16-bit value.

As you can see above, the ASCII character set includes non-alphabetic characters such as ! and + and ?. For "non-Latin" characters (e.g., the character £ or Ç) one of several techniques is used to "extend" the character set. Sometimes the 8-bit values 128 to 255 are used to represent the non-Latin characters of a given language (though one must know which language in order to know which set of characters is being represented). In other cases "Unicode" is used, with 16-bit or 32-bit characters instead of 8-bit characters, so that virtually every character in every language has its own unique code.

OTHER TIPS

Binary is just a different way of representing numbers. It's base 2, where decimal is base 10 and hex is base 16. When people refer to "binary code", they usually just mean compiled program code, a.k.a. machine code.

Machine code is only binary in that, at a low level, it's stored as a series of binary digits (bits). But when a human looks at it, they usually view it in hex using a hex editor, which is much easier than reading binary.

Even easier would be to disassemble it into assembly language, which replaces the numbers with the names of the instructions they represent.

Here's a good example from Wikipedia, showing how these binary numbers:

10110000 01100001

which can be represented in hex as

B0 61

can be translated to this assembly instruction:

MOV AL, 61h       ; Load AL with 97 decimal (61 hex)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow