Question

Let me start off by saying that I am not particularly trying to find a solution, just the root cause of the problem. I am trying to retrieve JSON from a URL. In the browser, the URL loads fine and I can see the entire JSON without issue. However, in Xcode, when simply using NSURLConnection, I get data bytes back, but my NSString is nil.

    theString = [[NSString alloc] initWithData:urlData encoding:NSUTF8StringEncoding];

After doing some research, I found that I am probably using the wrong encoding. I am not sure which encoding the URL uses, so on first instinct I just tried some random encoding types.

    NSString* myString = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];
    NSString* myString2 = [[NSString alloc] initWithData:data encoding:NSUTF16StringEncoding];
    NSString* myString3 = [[NSString alloc] initWithData:data encoding:NSWindowsCP1252StringEncoding];

NSASCIIStringEncoding and NSWindowsCP1252StringEncoding are able to bring back partially correct JSON. It is not the entire JSON that I am able to view in the browser, and some characters are a little messed up, but it is something. To better determine which encoding was used, I decided to use the following method and look at the encoding it reports.

    NSError *error = nil;
    NSStringEncoding encoding;
    NSString *my_string = [[NSString alloc] initWithContentsOfURL:url
                                                     usedEncoding:&encoding
                                                            error:&error];

My NSStringEncoding value is 3221214344, and this number is consistent every time I run the app. I cannot find any NSStringEncoding values that even come close to matching this.

My final question is: Is the encoding used for this URL not consumable by iOS, is it possible that multiple encodings were used for this URL, or is there something else that I could be doing wrong on my end?

Solution

It's best not to rely on Cocoa to figure out the string encoding if possible, especially if the data might be corrupted. A better approach would be to check if the value indicated by the HTTP Content-Type header specifies a character set like in this example:

    Content-Type: text/html; charset=ISO-8859-4

Once you're able to parse and retrieve a character set name from the Content-Type header, you need to convert it to an NSStringEncoding, first by passing it to CFStringConvertIANACharSetNameToEncoding, and then passing the returned CF string encoding to CFStringConvertEncodingToNSStringEncoding. After that, you can initialize your string using -[NSString initWithData:encoding:].

    NSData *HTTPResponseBody = …; // Get the HTTP response body
    NSString *charSetName = …;  // Get a charset name from the Content-Type HTTP header

    // Get the Core Foundation string encoding
    // (the __bridge cast is required under ARC)
    CFStringEncoding cfencoding = CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)charSetName);

    // Confirm this is a known encoding
    if (cfencoding != kCFStringEncodingInvalidId) {
        // Convert to the Foundation encoding constant and initialize the string;
        // JSON is nil if the bytes are not valid in that encoding
        NSStringEncoding nsencoding = CFStringConvertEncodingToNSStringEncoding(cfencoding);
        NSString *JSON = [[NSString alloc] initWithData:HTTPResponseBody
                                               encoding:nsencoding];
    }
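
If you only have the raw header value, the charset parameter can be pulled out with a little string handling. The following is a minimal sketch in plain C, which compiles unchanged inside an Objective-C (.m) file. The names copy_charset_name and starts_with_nocase are made up for illustration and are not part of any Apple API, and the parser assumes that no other parameter contains a semicolon inside a quoted value. Note that NSURLResponse also has a textEncodingName property that reports the charset of the response, which may save you the parsing altogether.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if s begins with prefix, ignoring ASCII case. */
static int starts_with_nocase(const char *s, const char *prefix) {
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix)) return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Hypothetical helper: copies the charset parameter of a Content-Type
   value into out. Returns 1 on success, 0 if there is no charset
   parameter or out is too small. */
int copy_charset_name(const char *content_type, char *out, size_t out_size) {
    assert(content_type != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    const char *p = strchr(content_type, ';');
    while (p != NULL) {
        p++;                                   /* skip the ';' */
        while (*p == ' ' || *p == '\t') p++;   /* skip optional whitespace */
        if (starts_with_nocase(p, "charset=")) {
            p += strlen("charset=");
            int quoted = (*p == '"');
            if (quoted) p++;
            size_t n = 0;
            while (*p != '\0' &&
                   (quoted ? *p != '"' : (*p != ';' && *p != ' ' && *p != '\t'))) {
                if (n + 1 >= out_size) { out[0] = '\0'; return 0; }
                out[n++] = *p++;
            }
            out[n] = '\0';
            return n > 0;
        }
        p = strchr(p, ';');                    /* move on to the next parameter */
    }
    return 0;
}
```

The resulting C string can be wrapped with +[NSString stringWithUTF8String:] and then passed to CFStringConvertIANACharSetNameToEncoding as in the snippet above.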

You still may run into problems if the string data you're working with is corrupted. For example, in the above code snippet, perhaps charSetName is UTF-8, but HTTPResponseBody can't be parsed as UTF-8 because there's an invalid byte sequence. In this situation, Cocoa will return nil when you try to instantiate your string, and short of sanitizing the data so that it conforms to the reported string encoding (perhaps by stripping out invalid byte sequences), you may want to report an error back to the end user.
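
To find out whether, and where, the body breaks the reported encoding, it can help to scan the bytes yourself. Here is a minimal sketch in plain C (usable as-is from Objective-C), assuming the reported charset is UTF-8. first_invalid_utf8_offset is a hypothetical helper name; it returns the offset of the first ill-formed sequence, or -1 if the buffer is well-formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first byte that starts
   an ill-formed UTF-8 sequence, or -1 if the buffer is well-formed. */
long first_invalid_utf8_offset(const unsigned char *bytes, size_t len) {
    assert(bytes != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char b = bytes[i];
        size_t extra;        /* continuation bytes expected after the lead byte */
        unsigned long min;   /* smallest code point allowed for this length */
        unsigned long cp;    /* decoded code point */

        if (b < 0x80)                { i++; continue; }   /* plain ASCII */
        else if ((b & 0xE0) == 0xC0) { extra = 1; min = 0x80;    cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; min = 0x800;   cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; min = 0x10000; cp = b & 0x07; }
        else return (long)i;   /* stray continuation byte or invalid lead byte */

        if (extra > len - i - 1) return (long)i;           /* truncated sequence */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = bytes[i + k];
            if ((c & 0xC0) != 0x80) return (long)i;        /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF. */
        if (cp < min || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return (long)i;
        i += extra + 1;
    }
    return -1;
}
```

With an NSData you would pass its bytes and length. A single ISO-8859-1 byte such as 0xE9 (é) followed by ordinary ASCII is enough to make the whole buffer invalid as UTF-8, which is exactly the case where initWithData:encoding: returns nil.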

As a last-ditch effort, rather than reporting an error, you could initialize a string using a single-byte encoding that can decode almost any byte sequence, such as NSMacOSRomanStringEncoding. The one caveat here is that multi-byte Unicode text or corrupted data will show up as wrong symbols or unexpected alphanumerics wherever the real encoding differs.
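
To see why a single-byte encoding does not fail the way UTF-8 does, here is a sketch in plain C that treats every input byte as one character and re-encodes it as UTF-8. It uses ISO-8859-1 rather than Mac OS Roman, because Latin-1 bytes map directly to the Unicode code points of the same value and no lookup table is needed. latin1_to_utf8 is a hypothetical name, not an existing API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: re-encodes ISO-8859-1 bytes as NUL-terminated UTF-8.
   Returns the number of bytes written (not counting the NUL),
   or (size_t)-1 if out is too small. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, char *out, size_t out_size) {
    assert(out != NULL && (in != NULL || len == 0));
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        size_t need = (b < 0x80) ? 1 : 2;
        if (n + need + 1 > out_size) return (size_t)-1;   /* keep room for the NUL */
        if (b < 0x80) {
            out[n++] = (char)b;                    /* ASCII is unchanged */
        } else {
            out[n++] = (char)(0xC0 | (b >> 6));    /* U+0080..U+00FF take two bytes */
            out[n++] = (char)(0x80 | (b & 0x3F));
        }
    }
    if (n + 1 > out_size) return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

Every possible byte value has a mapping, so the conversion can only fail for lack of buffer space. The price is that bytes which were really UTF-8 or some other encoding come out as the wrong characters.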

OTHER TIPS

Even though it seems that the answer has been provided in the comments (using ISO-8859-1 as the correct encoding), I thought it worthwhile to discuss how I would go about debugging this problem.

You said that the Desktop Browser (Chrome) can digest the data correctly, so let's use that:

  1. Enable Developer Tools https://developers.google.com/chrome-developer-tools/
  2. When the Dev Tools window is open, switch to the "Network" tab and execute your call in that browser tab.
  3. Check the output by clicking on the request URL. The response headers, in particular Content-Type, should give you some clue about the encoding.

If that doesn't work, tools like Postman can help you recreate the call before you implement it on the device.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow