ASCII to NSData

https://stackoverflow.com/questions/4269094

28-09-2019
Question

This is another crack at my MD5 problem. I know the issue is with the ASCII character © (0xa9, 169). Either it is the way I am inserting the character into the string, or it's a high-byte vs. low-byte problem.

If I run:

    NSString *source = [NSString stringWithFormat:@"%c", 0xa9];
    NSData *data = [source dataUsingEncoding:NSASCIIStringEncoding];
    NSLog(@"\n\n ############### source %@ \ndata desc %@", source, [data description]);

    // Requires #import <CommonCrypto/CommonDigest.h>
    unsigned char result[CC_MD5_DIGEST_LENGTH];
    CC_MD5([data bytes], (CC_LONG)[data length], result);

    return [NSString stringWithFormat:
        @"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
        result[0], result[1], result[2], result[3],
        result[4], result[5], result[6], result[7],
        result[8], result[9], result[10], result[11],
        result[12], result[13], result[14], result[15]];

Result:

######### source ©

[data description] = (null)
md5: d41d8cd98f00b204e9800998ecf8427e

values: int 169 char ©

When I change the encoding to

NSData *data = [NSData dataWithBytes:[source UTF8String] length:[source length]];

The result is

######### source ©

[data description] = <c2>
md5: 6465dad1d31752be3f3283e8f70feef7

When I change the encoding to

NSData *data = [NSData dataWithBytes:[source UTF8String] length:[source lengthOfBytesUsingEncoding:NSUTF8StringEncoding]];

The result is

############### source © len 2

[data description] = <c2a9>
md5: a541ecda3d4c67f1151cad5075633423

When I run the same function in Java I get

>>>>> msg## \251 \251
md5 a252c2c85a9e7756d5ba5da9949d57ed

The question is what is the best way to get the same byte in objC as I get in Java?

Solution 2

Thanks to GBegan's explanation in another post I was able to cobble this together.

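    NSMutableData *data = [NSMutableData dataWithCapacity:[s length]];
    for (NSUInteger i = 0; i < [s length]; i++) {
        // Keep only the low byte of each UTF-16 code unit.
        // This is lossy for any character above U+00FF.
        unsigned char byte = (unsigned char)[s characterAtIndex:i];
        [data appendBytes:&byte length:1];
    }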
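For comparison with the Java side, here is a minimal sketch of the same low-byte idea. It assumes every character is at or below U+00FF, in which case keeping the low byte gives the same bytes as encoding with ISO 8859-1. The class name `LowByteDemo` and the helper `lowBytes` are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LowByteDemo {
    // Keep only the low 8 bits of each UTF-16 code unit (hypothetical helper).
    static byte[] lowBytes(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 \u00a9"; // every character is <= U+00FF
        byte[] truncated = lowBytes(s);
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(truncated, latin1)); // true
    }
}
```

The two approaches diverge above U+00FF: truncation silently keeps an unrelated low byte, while `getBytes` substitutes `?`. Naming the encoding explicitly is usually the safer choice.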

OTHER TIPS

“ASCII to NSData” makes no sense, because ASCII is an encoding; if you have encoded characters, then you have data.

An encoding is a transformation of ideal Unicode characters (code points) into one-or-more-byte units (code units), possibly in sequences such as UTF-16's surrogate pairs.

An NSString is more or less an ideal Unicode object. It contains the characters of the string, in Unicode, irrespective of any encoding*.

ASCII is an encoding. UTF-8 is also an encoding. When you ask the string for its UTF8String, you are asking it to encode its characters as UTF-8.

NSData *data = [NSData dataWithBytes:[source UTF8String] length:[source length]];

The result is

 ######### source ©
 [data description] = <c2>

That's because you passed the wrong length. The string's length (in characters) is not the same as the number of code units (bytes, in this case) in some encoding.

The correct length is strlen([source UTF8String]), but it's easier for you and faster at run time to use dataUsingEncoding: to ask the string to create the NSData object for you.
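The same mismatch between character count and byte count can be reproduced in Java, whose strings are also UTF-16 internally. This is only a sketch; `LengthDemo` and `utf8Length` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LengthDemo {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String source = "\u00a9"; // the copyright sign
        byte[] utf8 = source.getBytes(StandardCharsets.UTF_8);

        System.out.println(source.length());    // 1 (UTF-16 code units)
        System.out.println(utf8Length(source)); // 2 (UTF-8 bytes)

        // Copying only source.length() bytes reproduces the truncated <c2> result.
        byte[] truncated = Arrays.copyOf(utf8, source.length());
        System.out.println(Integer.toHexString(truncated[0] & 0xff)); // c2
    }
}
```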

When I change the encoding to

NSData *data = [NSData dataWithBytes:[source UTF8String] length:[source lengthOfBytesUsingEncoding:NSUTF8StringEncoding]];

You didn't change the encoding. You're still encoding it as UTF-8.

Use dataUsingEncoding:.

The question is what is the best way to get the same byte in objC as I get in Java?

Use the same encoding.

There is no such thing as “extended ASCII”. There are several different encodings that are based on (or at least compatible with) ASCII, including ISO 8859-1, ISO 8859-9, MacRoman, Windows codepage 1252, and UTF-8. You need to decide which one you mean and tell the string to encode its characters with that.
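To see how much the choice matters, the following Java sketch prints the bytes that © (U+00A9) becomes under a few of the JDK's standard charsets. `EncodingBytes` and `hex` are hypothetical names, and the charsets shown are just the ones guaranteed to exist on every Java platform, not a recommendation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingBytes {
    // Encode the string with the given charset and render the bytes as hex.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String c = "\u00a9";
        System.out.println(hex(c, StandardCharsets.ISO_8859_1)); // a9
        System.out.println(hex(c, StandardCharsets.UTF_8));      // c2a9
        System.out.println(hex(c, StandardCharsets.UTF_16BE));   // 00a9
        // getBytes replaces unmappable characters with '?' (0x3f) for US-ASCII.
        System.out.println(hex(c, StandardCharsets.US_ASCII));   // 3f
    }
}
```

Each of these byte sequences hashes to a different MD5 digest, which is why both sides have to agree on one encoding before comparing results.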

Better yet, continue using UTF-8—it is almost always the right choice for mostly-ASCII text—and change your Java code instead.
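On the Java side, that change might look like the sketch below. It assumes the original Java code called `getBytes()` without naming a charset, which the question does not actually show. `Md5Demo` and `md5Hex` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Demo {
    // Hash the string's bytes in an explicitly chosen encoding.
    static String md5Hex(String s, Charset cs) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(cs));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String source = "\u00a9";
        // Hashes the bytes c2 a9, the same bytes the UTF-8 NSData holds.
        System.out.println(md5Hex(source, StandardCharsets.UTF_8));
        // Hashes the single byte a9, so the digest differs.
        System.out.println(md5Hex(source, StandardCharsets.ISO_8859_1));
    }
}
```

If both programs encode as UTF-8, the Java digest should agree with the a541ecda3d4c67f1151cad5075633423 that the question reports for the bytes c2 a9.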

NSData *data = [source dataUsingEncoding:NSASCIIStringEncoding];

Result:

[data description] = (null)

True ASCII can only encode 128 possible characters. Unicode includes all of ASCII unchanged, so the first 128 code points in Unicode are what ASCII can encode. Anything else, ASCII cannot encode.
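A strict ASCII encoder in Java rejects © in the same way. The sketch below returns `null` for unencodable input purely to mirror the nil that dataUsingEncoding: produced here; `AsciiCheck`, `isPureAscii` and `encodeAsciiOrNull` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class AsciiCheck {
    // True only if every character is one of ASCII's 128 code points.
    static boolean isPureAscii(String s) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    // Strict encode: a fresh encoder reports unmappable characters
    // instead of replacing them, so we can return null in that case.
    static byte[] encodeAsciiOrNull(String s) {
        try {
            ByteBuffer buf = StandardCharsets.US_ASCII.newEncoder().encode(CharBuffer.wrap(s));
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPureAscii("MD5"));          // true
        System.out.println(isPureAscii("\u00a9"));       // false
        System.out.println(encodeAsciiOrNull("\u00a9")); // null
    }
}
```

With no data, the hash function receives zero bytes, and d41d8cd98f00b204e9800998ecf8427e is the MD5 digest of empty input. That is the value the question printed.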

I've seen NSASCIIStringEncoding behave as equivalent to NSISOLatin1StringEncoding before; it sounds like they might have changed it to be a pure ASCII encoding, and if that's the case, that's a good thing. There is no copyright symbol in ASCII. What you see here is the correct result.

*This is not quite true; the characters are exposed as UTF-16, so any characters outside the Basic Multilingual Plane are exposed as surrogate pairs, not whole characters as they would be in a truly ideal string object. This is a trade-off. In Swift, the built-in String type is a perfect ideal Unicode object; characters are characters, never divided until encoded. But when working with NSString (whether in Swift or in Objective-C), as far as you are concerned, you should treat it as an ideal string.
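Java's String exposes UTF-16 code units in the same way, so it can illustrate the surrogate-pair caveat. This is a sketch only; `SurrogateDemo`, `codeUnits` and `codePoints` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class SurrogateDemo {
    // Length as the string API reports it: UTF-16 code units.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of whole Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00"; // U+1F600, outside the Basic Multilingual Plane
        System.out.println(codeUnits(s));  // 2
        System.out.println(codePoints(s)); // 1
        System.out.println(Character.isHighSurrogate(s.charAt(0)));  // true
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```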

Licensed under: CC-BY-SA with attribution
