Question

What open source C or C++ libraries can convert arbitrary UTF-32 to NFC?

Libraries that I think can do this so far: ICU, Qt, GLib (not sure?).

I don't need any other complex Unicode support; just conversion from arbitrary but known-correct UTF-32 to UTF-32 that is in NFC form.

I'm most interested in a library that can do this directly. For example, Qt and ICU (as far as I can tell) both do everything via an intermediate conversion stage to and from UTF-16.


Solution

ICU or Boost.Locale (which wraps ICU) will be your best bet by a very, very long way. The normalisation mappings will match those used by most other software, which I assume is the point of this conversion.

Other suggestions

Here is the main part of the code I ended up using after deciding on ICU. I figured I should put it here in case it helps someone who tries this same thing.

#include <cassert>
#include <string>

#include <unicode/normalizer2.h>
#include <unicode/unistr.h>

std::string normalize(const std::string &unnormalized_utf8) {
    // FIXME: until ICU supports doing normalization over a UText
    // interface directly on our UTF-8, we'll use the insanely less
    // efficient approach of converting to UTF-16, normalizing, and
    // converting back to UTF-8.

    // Convert to UTF-16 string
    auto unnormalized_utf16 = icu::UnicodeString::fromUTF8(unnormalized_utf8);

    // Get a pointer to the global NFC normalizer
    UErrorCode icu_error = U_ZERO_ERROR;
    const auto *normalizer = icu::Normalizer2::getInstance(nullptr, "nfc", UNORM2_COMPOSE, icu_error);
    assert(U_SUCCESS(icu_error));

    // Normalize our string
    icu::UnicodeString normalized_utf16;
    normalizer->normalize(unnormalized_utf16, normalized_utf16, icu_error);
    assert(U_SUCCESS(icu_error));

    // Convert back to UTF-8
    std::string normalized_utf8;
    normalized_utf16.toUTF8String(normalized_utf8);

    return normalized_utf8;
}
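
The question starts from UTF-32 rather than UTF-8. With ICU the same UTF-16 detour applies, and icu::UnicodeString offers fromUTF32() and toUTF32() for the two conversion steps. To show what that intermediate stage does, here is a minimal standard-library-only sketch of the UTF-32 to UTF-16 round trip. The helper names utf32_to_utf16 and utf16_to_utf32 are made up for illustration and are not ICU functions. The sketch assumes the input is already valid, as the question states, and it performs no normalization itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not an ICU API): encode known-valid UTF-32 as UTF-16.
std::u16string utf32_to_utf16(const std::u32string &in) {
    std::u16string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        // Precondition taken from the question: input is known-correct UTF-32,
        // so no code point is a surrogate or above U+10FFFF.
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x10000) {
            // BMP code points map to a single UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary code points become a high/low surrogate pair.
            char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper (not an ICU API): decode well-formed UTF-16 to UTF-32.
std::u32string utf16_to_utf32(const std::u16string &in) {
    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t unit = in[i];
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < in.size()) {
            // High surrogate: combine it with the following low surrogate.
            char32_t low = in[i + 1];
            assert(low >= 0xDC00 && low <= 0xDFFF);
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            ++i;
        } else {
            out.push_back(unit);
        }
    }
    return out;
}
```

Only code points above U+FFFF need two UTF-16 code units, so for mostly-BMP text the cost of the detour is dominated by the two extra copies rather than by the surrogate arithmetic.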
Licensed under: CC-BY-SA with attribution