Question

From the hiragana and katakana charts, it looks like it should be possible to "normalize" Japanese text into either hiragana or katakana. It's pretty straightforward to build a lookup table and implement a dictionary- or regex-based search/replace. Does anyone know whether this work has already been done?


Solution

Why would you want to do this, though? Katakana is traditionally used for words borrowed from other languages, while hiragana is used for native Japanese words. By normalizing Japanese text to one form or the other, you could actually make it harder to read (at least for me it would be harder, since I would lose the context that the choice of script provides).

But to answer your question, this seems to do what you're asking: JCONV

Other tips

You could do what you want to do very quickly using str.translate.
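As a minimal sketch of the str.translate approach (not a full solution): in Unicode, the katakana ァ..ヶ (U+30A1..U+30F6) sit at a fixed offset of 0x60 from the hiragana ぁ..ゖ (U+3041..U+3096), so the tables can be generated rather than typed by hand. The names KATA_TO_HIRA, HIRA_TO_KATA, to_hiragana and to_katakana are made up for this example.

```python
# Translation tables map code points to code points, which is the form
# str.translate expects. The 0x60 offset holds for U+30A1..U+30F6 only.
KATA_TO_HIRA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}
HIRA_TO_KATA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def to_hiragana(text):
    """Replace full-width katakana with hiragana; leave everything else alone."""
    return text.translate(KATA_TO_HIRA)


def to_katakana(text):
    """Replace hiragana with full-width katakana; leave everything else alone."""
    return text.translate(HIRA_TO_KATA)


print(to_hiragana("カタカナ"))
print(to_katakana("ひらがな"))
```

Kanji, the long-vowel mark ー, half-width katakana, and the few katakana with no hiragana counterpart in that range (such as ヷ) fall outside the tables and pass through unchanged, so whether this is enough depends on the input.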

However, it is not readily apparent why you would want to do that.

For a language written in a Latin-based alphabet, what I would call normalising includes lowercasing, normalising whitespace, and stripping accents and the like, so that the result is ASCII. The purpose is not display but comparison of user-entered text in some kind of fuzzy search/match/lookup scenario. The point is that mistakes with accents and so on are quite common, even among native writers of the languages in question.
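For comparison, the Latin-alphabet normalisation described above might look roughly like this. It is a sketch using only the standard library; normalize_latin is a name invented here, and the result is only ASCII for letters that decompose into a base letter plus combining marks (characters such as ß or ø are left as they are).

```python
import unicodedata


def normalize_latin(text):
    """Lowercase, strip accents, and collapse whitespace for fuzzy matching."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # split() with no arguments also trims and collapses runs of whitespace.
    return " ".join(stripped.lower().split())


print(normalize_latin("  Crème   Brûlée "))
```

Two user entries can then be compared by normalising both sides first, for example normalize_latin(a) == normalize_latin(b).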

Given the role that Hiragana plays in the Japanese writing system (words often have a Kanji stem and Hiragana suffixes), I can't imagine any use for changing Hiragana characters to Katakana ... please enlighten me.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow