Try this:
using System.Collections.Generic;
using System.Text.RegularExpressions;

string[] ugaramStrings = { "கு", "சு", "டு", "து", "பு", "று" };
string[] tamilvowels =
{
    "அ", // "\u0b85"
    "ஆ", // "\u0b86"
    "இ", // "\u0b87"
    "ஈ", // "\u0b88"
    "உ", // "\u0b89"
    "ஊ", // "\u0b8A"
    "எ", // "\u0b8E"
    "ஏ", // "\u0b8F"
    "ஐ", // "\u0b90"
    "ஒ", // "\u0b92"
    "ஓ", // "\u0b93"
    "ஔ"  // "\u0b94"
};
// Builds (கு|சு|...)(அ|ஆ|...): an ugaram letter followed by a Tamil vowel
var rxTemp = "(" +
    string.Join("|", ugaramStrings) + ")(" +
    string.Join("|", tamilvowels) + ")";
var rx = new Regex(rxTemp);
string str = "அமர்ந்துஇனிது";
// This will contain all the matches
var matches = new List<Match>();
string str2 = rx.Replace(str, match => {
    matches.Add(match);
    // Groups[1] will contain the ugaram letter,
    // Groups[2] will contain the Tamil vowel
    return match.Groups[2].Value;
});
It seems to work correctly. str2 will contain the replaced string, while matches will contain all the matches.
Note that the ugaram characters are composed characters, so each ugaram "character" uses two C# chars. For example, கு is 'க' + 'ு'.
This is illegal, because a char literal can hold only one char:
char ch = 'கு';
This is legal:
string str = "கு"; // str.Length == 2
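To see the two chars behind an ugaram letter, here is a minimal sketch that prints the UTF-16 code units of "கு". The variable names are mine, and it assumes C# top-level statements (C# 9 or later); otherwise, put the body inside Main.

```csharp
using System;
using System.Globalization;

// "கு" is the consonant க (U+0B95) followed by the vowel sign ு (U+0BC1).
string ku = "கு";

// Length counts UTF-16 code units (C# chars), not user-perceived characters.
Console.WriteLine(ku.Length);              // 2
Console.WriteLine($"U+{(int)ku[0]:X4}");   // U+0B95
Console.WriteLine($"U+{(int)ku[1]:X4}");   // U+0BC1

// StringInfo groups a base character with its combining marks,
// so the pair is reported as a single text element.
int textElements = new StringInfo(ku).LengthInTextElements;
Console.WriteLine(textElements);           // 1

// An independent vowel such as இ (U+0B87) fits in a single char.
Console.WriteLine("இ".Length);             // 1
```

The regex engine works on chars, not on text elements, which is why the pair has to be spelled out as a two-char sequence in the pattern.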
For this reason you can't simply use a character class like [குசுடுதுபுறு]; you have to use an alternation, (கு|சு|டு|து|பு|று).
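The difference can be checked directly. The sketch below is only an illustration (the anchors ^ and $ and the variable names are my additions, and it again assumes top-level statements): the character class treats each consonant and the vowel sign ு as separate single-char members, while the alternation keeps each pair together.

```csharp
using System;
using System.Text.RegularExpressions;

// A character class is a set of single chars: this one contains the six
// consonants plus the vowel sign ு, each as an independent member.
var charClass = new Regex("^[குசுடுதுபுறு]$");

// An alternation keeps each consonant + vowel-sign pair together.
var alternation = new Regex("^(கு|சு|டு|து|பு|று)$");

Console.WriteLine(charClass.IsMatch("க"));     // True: the bare consonant is a member
Console.WriteLine(charClass.IsMatch("கு"));    // False: two chars, the class matches one
Console.WriteLine(alternation.IsMatch("க"));   // False
Console.WriteLine(alternation.IsMatch("கு"));  // True
```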