Question

I'm implementing suffix array construction, and the algorithm appends a sentinel character to the original string. This character must not occur in the original string.

Since the algorithm will process the bytes of binary files, is there a special byte value that is guaranteed never to appear in any binary file? If one exists, how do I represent it in C++?

I'm on Linux, in case that makes a difference.

Solution

No, there is not. Binary files can contain every combination of byte values. I wouldn't call them 'characters' though, because they are binary data, not (necessarily) representing characters. But whatever the name, they can have any value.
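
Since no byte value is free, one common workaround is to stop using bytes as the alphabet: widen each byte to a larger integer type and pick a sentinel outside the range 0..255. The sketch below is only an illustration of that idea, not part of the original answer. The helper name `with_sentinel` is hypothetical, and it assumes the suffix array code can work on a sequence of `int` rather than `char`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: shift every byte b to b + 1 (range 1..256) and append 0.
// The sentinel 0 is then strictly smaller than every real symbol and cannot
// collide with any byte value from the file.
std::vector<int> with_sentinel(const std::vector<std::uint8_t>& data) {
    std::vector<int> text;
    text.reserve(data.size() + 1);
    for (std::uint8_t b : data) {
        text.push_back(static_cast<int>(b) + 1);
    }
    text.push_back(0);  // sentinel outside the shifted byte range
    return text;
}
```

The cost is memory: each input byte now occupies one `int`, so this trades space for a guaranteed-unique sentinel.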

OTHER TIPS

Only you can answer this, because we do not know what binary data you have or which byte values can occur in it. If you mean generic binary data, any combination of bits and bytes can appear, so no such character exists.

On the other hand, you mention strings. What kind of strings? If they are ASCII strings, the codes cover only 0 to 127, so you could use 128 as the sentinel. Some old protocols use SOH (\1) for similar purposes. So there may be a way around the problem if you know exactly what strings you are processing.
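
If the input is available up front, one way to find out whether any byte value is free is to scan it once and record which values occur. This is a minimal sketch under that assumption. The function name `find_unused_byte` is hypothetical, and `std::optional` requires C++17.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: return the smallest byte value that never occurs in
// `data`, or std::nullopt if all 256 values are present.
std::optional<std::uint8_t> find_unused_byte(const std::vector<std::uint8_t>& data) {
    std::array<bool, 256> seen{};  // value-initialized: all false
    for (std::uint8_t b : data) {
        seen[b] = true;
    }
    for (int v = 0; v < 256; ++v) {
        if (!seen[v]) {
            return static_cast<std::uint8_t>(v);
        }
    }
    return std::nullopt;  // every byte value is used, so no sentinel is available
}
```

For text that contains only printable ASCII this returns 0, which also answers the representation question: a byte is just a small integer in C++, so a sentinel such as 128 can be written as `static_cast<unsigned char>(128)` or `'\x80'`.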

To the best of my knowledge, suffix array cannot be applied to arbitrary binary data (well, it can, but it won't make any sense).

A file contains only bits. Groups of bits can be interpreted as an ASCII character, a floating-point number, a photo in JPEG format, or anything else you can imagine. The interpretation depends on the coding scheme (such as ASCII or BCD) you choose. If your coding scheme does not use every possible code, you can pick an unused one for your special purpose (for example, decimal digits encoded naively in 4 bits use only 10 of the 2^4 = 16 codewords, leaving 6 spare).
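
The 4-bit digit example can be sketched as follows. The names `encode_digits` and `kSentinelCode` are hypothetical, the choice of 0xF among the six spare codes is arbitrary, and for simplicity each 4-bit code is stored in its own byte rather than packed two per byte.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: 0xF is used as the sentinel; any of the unused codes 10..15 would do.
constexpr std::uint8_t kSentinelCode = 0xF;

// Encode a string of decimal digits as 4-bit codes (0..9) and append the sentinel.
std::vector<std::uint8_t> encode_digits(const std::string& digits) {
    std::vector<std::uint8_t> codes;
    codes.reserve(digits.size() + 1);
    for (char c : digits) {
        if (c < '0' || c > '9') {
            throw std::invalid_argument("expected decimal digits only");
        }
        codes.push_back(static_cast<std::uint8_t>(c - '0'));
    }
    codes.push_back(kSentinelCode);  // can never be confused with a digit code
    return codes;
}
```

Note that 0xF sorts after every digit. If the suffix array construction expects the sentinel to be the smallest symbol, the digit codes would need to be shifted up by one and 0 used instead.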

Licensed under: CC-BY-SA with attribution