What open source C or C++ libraries can convert arbitrary UTF-32 to NFC? [closed]
07-03-2021
Question
What open source C or C++ libraries can convert arbitrary UTF-32 to NFC?
Libraries that I think can do this so far: ICU, Qt, GLib (not sure?).
I don't need any other complex Unicode support; just conversion from arbitrary but known-correct UTF-32 to UTF-32 that is in NFC form.
I'm most interested in a library that can do this directly. For example, Qt and ICU (as far as I can tell) both do everything via an intermediate conversion stage to and from UTF-16.
Solution
ICU or Boost.Locale (which wraps ICU) will be your best bet by a very, very long way. The normalisation mappings will match those used by most other software, which I assume is the point of this conversion.
Other suggestions
Here is the main part of the code I ended up using after deciding on ICU. I figured I should put it here in case it helps someone who tries this same thing.
#include <cassert>
#include <string>
#include <unicode/normalizer2.h>
#include <unicode/unistr.h>

std::string normalize(const std::string &unnormalized_utf8) {
    // FIXME: until ICU supports doing normalization over a UText
    // interface directly on our UTF-8, we'll use the insanely less
    // efficient approach of converting to UTF-16, normalizing, and
    // converting back to UTF-8.

    // Convert to UTF-16 string
    auto unnormalized_utf16 = icu::UnicodeString::fromUTF8(unnormalized_utf8);

    // Get a pointer to the global NFC normalizer
    // (owned by ICU; it must not be deleted)
    UErrorCode icu_error = U_ZERO_ERROR;
    const auto *normalizer = icu::Normalizer2::getInstance(nullptr, "nfc", UNORM2_COMPOSE, icu_error);
    assert(U_SUCCESS(icu_error));

    // Normalize our string
    icu::UnicodeString normalized_utf16;
    normalizer->normalize(unnormalized_utf16, normalized_utf16, icu_error);
    assert(U_SUCCESS(icu_error));

    // Convert back to UTF-8
    std::string normalized_utf8;
    normalized_utf16.toUTF8String(normalized_utf8);
    return normalized_utf8;
}
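The function above takes UTF-8, while the question asks about UTF-32. With ICU the only change should be the conversion step on either side of the normalizer: icu::UnicodeString::fromUTF32 on the way in and icu::UnicodeString::toUTF32 on the way out. To show what that intermediate UTF-16 stage actually does, here is a rough sketch in plain C++ with no ICU dependency. The helper names utf32_to_utf16 and utf16_to_utf32 are made up for illustration and are not ICU functions, and the code assumes the input is already valid, as the question states.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an ICU API): encode known-valid UTF-32 as UTF-16.
// Code points above U+FFFF are split into a lead/trail surrogate pair.
std::u16string utf32_to_utf16(const std::u32string &in) {
    std::u16string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper (not an ICU API): decode UTF-16 back to UTF-32.
// A lead surrogate followed by a trail surrogate becomes one code point;
// anything else is copied through unchanged.
std::u32string utf16_to_utf32(const std::u16string &in) {
    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t unit = in[i];
        const bool is_lead = unit >= 0xD800 && unit <= 0xDBFF;
        if (is_lead && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            const char32_t trail = in[++i];
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (trail - 0xDC00));
        } else {
            out.push_back(unit);
        }
    }
    return out;
}
```

Both directions are a single linear pass, so the round trip adds copying but no change in complexity. Text in the Basic Multilingual Plane maps one code unit to one code point, and only supplementary characters need the surrogate arithmetic.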