I assume this is from the Twitter API, and you are trying to use the entities dictionary they return. I have just been writing code to support handling those ranges along with `NSString`'s version of the range of a string.
My approach was to "fix" the entities dictionary that Twitter return to cope with the extra characters. I can't share code, for various reasons, but this is what I did:
- Make a deep mutable copy of the entities dictionary.
- Loop through the entire range of the string, `unichar` by `unichar`, doing this:
  - Check if the `unichar` is in the surrogate pair range (`0xd800` to `0xdfff`).
  - If it is a surrogate pair codepoint, then go through all the entries in the entities dictionary and shift the indices by 1 if they are greater than the current location in the string (in terms of `unichar`s). Then increment the loop counter by 1 to skip the partner of this surrogate pair, as it has been handled now.
  - If it's not a surrogate pair, do nothing.
- Loop through all entities and check that none of them overrun the end of the string. They shouldn't, but just in case. I found some cases where Twitter returned duff data.
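To make the steps above concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). It is not the code I wrote, just an illustration of the same idea. `EntityRange` and `fix_entity_ranges` are hypothetical names standing in for the `indices` pairs in the entities dictionary, and clamping overrunning ranges to the string length is only one possible way of handling the final check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entity's "indices" pair. */
typedef struct {
    size_t start; /* first position covered by the entity */
    size_t end;   /* one past the last position covered */
} EntityRange;

/* True if the UTF-16 code unit is in the surrogate range 0xD800..0xDFFF. */
static int is_surrogate(uint16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDFFF;
}

/*
 * Shifts code-point-based ranges so that they index UTF-16 code units
 * (what NSString uses). `text` holds `length` UTF-16 code units and
 * `ranges` holds `count` entries that are modified in place.
 */
static void fix_entity_ranges(const uint16_t *text, size_t length,
                              EntityRange *ranges, size_t count)
{
    assert(text != NULL || length == 0);

    for (size_t i = 0; i < length; i++) {
        if (!is_surrogate(text[i])) {
            continue; /* not part of a surrogate pair: nothing to do */
        }
        /* Shift every index that lies after the current location. */
        for (size_t r = 0; r < count; r++) {
            if (ranges[r].start > i) ranges[r].start++;
            if (ranges[r].end > i) ranges[r].end++;
        }
        i++; /* skip the partner of this surrogate pair */
    }

    /* Assumption: ranges that overrun the string are clamped to it. */
    for (size_t r = 0; r < count; r++) {
        if (ranges[r].end > length) ranges[r].end = length;
        if (ranges[r].start > ranges[r].end) ranges[r].start = ranges[r].end;
    }
}
```

In an app, the `text` buffer could be filled with `-[NSString getCharacters:range:]`, and each fixed pair turned into an `NSRange` with location `start` and length `end - start`.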
I hope that helps! I also hope that one day I can open source this code as I think it would be incredibly useful!