Question

I'm using NMeCab, a .NET port of MeCab, to try to convert Japanese text (hiragana, katakana, and kanji) to romaji.

Here's my code:

using NMeCab;    
MeCabTagger _tagger;

public string Parse(string input)
{
    _tagger = MeCabTagger.Create();
    _tagger.OutPutFormatType = "lattice";
    _tagger.LatticeLevel = MeCabLatticeLevel.Two;


    var output = _tagger.Parse(input);

    return output;
}

When I call Parse(input) with the Japanese text "ども", I get this output:

"ども助詞,接続助詞,,,,,ども,ドモ,ドモ EOS"

What I'm looking for is the romaji of "ども", which would be "domo".

I've also tried using MeCab directly, as discussed in this SO answer, but I get the same output.


Solution

To my knowledge, none of the dictionaries used by MeCab (IPA, Jumandic, or Unidic) includes a romaji transcription of words. There is actually no need for one, for two reasons:

  1. Several different transcription schemes exist (e.g. Hepburn, Kunrei, 99 siki), so no single romaji field would suit every user;

  2. Information on the pronunciation of each lexical unit is already available in the output, as katakana (e.g. ドモ).
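In the output quoted in the question, the katakana reading is the eighth comma-separated field of the feature string (助詞,接続助詞,,,,,ども,ドモ,ドモ). A minimal sketch of pulling it out follows. It assumes an IPA-dictionary-style field layout, which other dictionaries may not share, and `MeCabFeature` and `GetReading` are hypothetical names for illustration, not part of NMeCab.

```csharp
using System;

public static class MeCabFeature
{
    // Assumed IPA-dictionary-style layout (index 7 = reading, index 8 = pronunciation):
    // POS, POS1, POS2, POS3, conjugation type, conjugation form, base form, reading, pronunciation
    private const int ReadingIndex = 7;

    // Returns the katakana reading from a feature string,
    // or null when the entry carries no reading (e.g. an unknown word).
    public static string GetReading(string feature)
    {
        // Naive split: assumes no field contains a literal comma.
        string[] fields = feature.Split(',');
        if (fields.Length <= ReadingIndex || fields[ReadingIndex].Length == 0)
        {
            return null;
        }
        return fields[ReadingIndex];
    }
}
```

For the feature string shown in the question, `MeCabFeature.GetReading("助詞,接続助詞,,,,,ども,ドモ,ドモ")` returns "ドモ".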

You have to write your own transcription routine that converts the katakana reading to romaji, or look for an existing katakana-to-romaji module that supports your chosen transcription scheme.
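As a rough illustration of such a routine, here is a minimal sketch that maps a katakana reading to Hepburn-style romaji with a lookup table. The names `KatakanaRomanizer` and `ToRomaji` are hypothetical, and the choice of Hepburn is an assumption. The sketch also simplifies: long vowels (ー) repeat the previous vowel instead of using a macron, ン is always written "n", and extended kana such as ファ or ティ are not covered.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class KatakanaRomanizer
{
    private static readonly Dictionary<string, string> Table = BuildTable();

    private static Dictionary<string, string> BuildTable()
    {
        // Space-separated kana and their Hepburn spellings, in matching order.
        string kana =
            "ア イ ウ エ オ カ キ ク ケ コ サ シ ス セ ソ タ チ ツ テ ト " +
            "ナ ニ ヌ ネ ノ ハ ヒ フ ヘ ホ マ ミ ム メ モ ヤ ユ ヨ " +
            "ラ リ ル レ ロ ワ ヲ ン ガ ギ グ ゲ ゴ ザ ジ ズ ゼ ゾ " +
            "ダ ヂ ヅ デ ド バ ビ ブ ベ ボ パ ピ プ ペ ポ " +
            "キャ キュ キョ シャ シュ ショ チャ チュ チョ ニャ ニュ ニョ " +
            "ヒャ ヒュ ヒョ ミャ ミュ ミョ リャ リュ リョ " +
            "ギャ ギュ ギョ ジャ ジュ ジョ ビャ ビュ ビョ ピャ ピュ ピョ";
        string romaji =
            "a i u e o ka ki ku ke ko sa shi su se so ta chi tsu te to " +
            "na ni nu ne no ha hi fu he ho ma mi mu me mo ya yu yo " +
            "ra ri ru re ro wa o n ga gi gu ge go za ji zu ze zo " +
            "da ji zu de do ba bi bu be bo pa pi pu pe po " +
            "kya kyu kyo sha shu sho cha chu cho nya nyu nyo " +
            "hya hyu hyo mya myu myo rya ryu ryo " +
            "gya gyu gyo ja ju jo bya byu byo pya pyu pyo";

        string[] k = kana.Split(' ');
        string[] r = romaji.Split(' ');
        if (k.Length != r.Length)
        {
            throw new InvalidOperationException("Kana and romaji tables differ in length.");
        }
        var table = new Dictionary<string, string>();
        for (int i = 0; i < k.Length; i++)
        {
            table[k[i]] = r[i];
        }
        return table;
    }

    public static string ToRomaji(string katakana)
    {
        var sb = new StringBuilder();
        bool doubleNext = false; // set when a sokuon (ッ) was just read
        int i = 0;
        while (i < katakana.Length)
        {
            char c = katakana[i];
            if (c == 'ッ')
            {
                doubleNext = true;
                i++;
                continue;
            }
            if (c == 'ー')
            {
                // Simplification: repeat the previous vowel rather than emit a macron.
                if (sb.Length > 0 && "aiueo".IndexOf(sb[sb.Length - 1]) >= 0)
                {
                    sb.Append(sb[sb.Length - 1]);
                }
                i++;
                continue;
            }

            string chunk;
            bool known = true;
            // Prefer a two-character digraph such as キャ over a single kana.
            if (i + 1 < katakana.Length && Table.TryGetValue(katakana.Substring(i, 2), out chunk))
            {
                i += 2;
            }
            else if (Table.TryGetValue(katakana.Substring(i, 1), out chunk))
            {
                i += 1;
            }
            else
            {
                chunk = katakana.Substring(i, 1); // pass unknown characters through unchanged
                known = false;
                i += 1;
            }

            if (doubleNext && known && "aiueo".IndexOf(chunk[0]) < 0)
            {
                // Sokuon doubles the following consonant; Hepburn writes "tch" before "ch".
                sb.Append(chunk.StartsWith("ch", StringComparison.Ordinal) ? 't' : chunk[0]);
            }
            doubleNext = false;
            sb.Append(chunk);
        }
        return sb.ToString();
    }
}
```

With the reading from the question, `KatakanaRomanizer.ToRomaji("ドモ")` returns "domo". Supporting a different scheme such as Kunrei would mainly mean swapping the romaji column of the table (for example "si" for シ and "tu" for ツ).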

License: CC-BY-SA with attribution
Not affiliated with StackOverflow