Question

I'm using a .NET port of MeCab (called NMeCab) to try to convert Japanese hiragana, katakana, and kanji to romaji.

Here's my code:

using NMeCab;

MeCabTagger _tagger;

public string Parse(string input)
{
    // Create the tagger and request lattice-format output.
    _tagger = MeCabTagger.Create();
    _tagger.OutPutFormatType = "lattice";
    _tagger.LatticeLevel = MeCabLatticeLevel.Two;

    return _tagger.Parse(input);
}

When I call Parse(input) with the Japanese text "ども", I get this output:

"ども助詞,接続助詞,,,,,ども,ドモ,ドモ EOS"

What I'm looking for is the romaji of "ども", which would be "domo".

I've also tried using MeCab directly, as discussed in this SO answer, but I get the same output.


Solution

To my knowledge, none of the dictionaries used with MeCab (IPAdic, Jumandic, or UniDic) includes a romaji transcription of words. There is actually no need for one:

  1. Several different transcription schemes exist (e.g. Hepburn, Kunrei-shiki, 99-shiki), so no single romaji form would suit everyone;

  2. The pronunciation of each lexical unit is already available in the output, as katakana (e.g. ドモ).

You have to write your own transcription routine, or look for an existing katakana-to-romaji module that supports the transcription scheme you want.
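
As a rough illustration of what such a routine could look like, here is a minimal sketch of a katakana-to-romaji converter in C#. It is not part of MeCab or NMeCab: the class name KatakanaRomanizer and the method ToRomaji are made up for this example. It assumes a simplified Hepburn-style scheme, writes long vowels (ー) by repeating the previous vowel instead of using macrons, and passes through anything it does not recognize (including small-vowel combinations such as ファ or ティ).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of NMeCab: simplified Hepburn-style transcription.
public static class KatakanaRomanizer
{
    // Katakana unit followed by its romaji, separated by spaces.
    private const string Table =
        "ア a イ i ウ u エ e オ o " +
        "カ ka キ ki ク ku ケ ke コ ko " +
        "サ sa シ shi ス su セ se ソ so " +
        "タ ta チ chi ツ tsu テ te ト to " +
        "ナ na ニ ni ヌ nu ネ ne ノ no " +
        "ハ ha ヒ hi フ fu ヘ he ホ ho " +
        "マ ma ミ mi ム mu メ me モ mo " +
        "ヤ ya ユ yu ヨ yo " +
        "ラ ra リ ri ル ru レ re ロ ro " +
        "ワ wa ヲ o ン n " +
        "ガ ga ギ gi グ gu ゲ ge ゴ go " +
        "ザ za ジ ji ズ zu ゼ ze ゾ zo " +
        "ダ da ヂ ji ヅ zu デ de ド do " +
        "バ ba ビ bi ブ bu ベ be ボ bo " +
        "パ pa ピ pi プ pu ペ pe ポ po " +
        "キャ kya キュ kyu キョ kyo " +
        "シャ sha シュ shu ショ sho " +
        "チャ cha チュ chu チョ cho " +
        "ニャ nya ニュ nyu ニョ nyo " +
        "ヒャ hya ヒュ hyu ヒョ hyo " +
        "ミャ mya ミュ myu ミョ myo " +
        "リャ rya リュ ryu リョ ryo " +
        "ギャ gya ギュ gyu ギョ gyo " +
        "ジャ ja ジュ ju ジョ jo " +
        "ビャ bya ビュ byu ビョ byo " +
        "ピャ pya ピュ pyu ピョ pyo";

    private static readonly Dictionary<string, string> Map = BuildMap();

    private static Dictionary<string, string> BuildMap()
    {
        var parts = Table.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var map = new Dictionary<string, string>();
        for (int i = 0; i + 1 < parts.Length; i += 2)
            map[parts[i]] = parts[i + 1];
        return map;
    }

    public static string ToRomaji(string katakana)
    {
        var sb = new StringBuilder();
        bool doubleNext = false; // set when a sokuon (ッ) was just seen
        int i = 0;
        while (i < katakana.Length)
        {
            char c = katakana[i];
            if (c == 'ッ') { doubleNext = true; i++; continue; }
            if (c == 'ー')
            {
                // Long vowel mark: repeat the previous vowel, if there is one.
                if (sb.Length > 0 && "aiueo".IndexOf(sb[sb.Length - 1]) >= 0)
                    sb.Append(sb[sb.Length - 1]);
                i++;
                continue;
            }

            // Greedy match: try a two-character unit (e.g. キョ) before a single kana.
            string romaji = null;
            int len = 0;
            if (i + 1 < katakana.Length && Map.TryGetValue(katakana.Substring(i, 2), out romaji))
                len = 2;
            else if (Map.TryGetValue(katakana.Substring(i, 1), out romaji))
                len = 1;

            if (len == 0)
            {
                // Unknown character: copy it through unchanged.
                sb.Append(c);
                doubleNext = false;
                i++;
                continue;
            }

            if (doubleNext)
            {
                // Sokuon doubles the next consonant; Hepburn writes "tch" before "ch".
                sb.Append(romaji.StartsWith("ch", StringComparison.Ordinal) ? 't' : romaji[0]);
                doubleNext = false;
            }
            sb.Append(romaji);
            i += len;
        }
        return sb.ToString();
    }
}
```

With the reading from the question, KatakanaRomanizer.ToRomaji("ドモ") returns "domo". To feed it from the tagger output, you would take the katakana reading out of the comma-separated feature string (the last two fields in the output shown in the question) and pass that in. Switching to another scheme such as Kunrei-shiki would mostly be a matter of editing the table.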

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow