Question

My aim is to generate the phonetic transcription for any word according to a set of rules.

First, I want to split words into their syllables. For example, I want an algorithm that finds 'ch' in a word and keeps it together while separating all other letters, as shown below:

Input: 'aachbutcher'
Output: 'a' 'a' 'ch' 'b' 'u' 't' 'ch' 'e' 'r'

This is what I have so far:

check = regexp('aachbutcher','ch');   % start indices of every 'ch', here [3 8]

if ~isempty(check)                    % true when 'ch' was found

   % Request all four outputs explicitly so their order is unambiguous.
   [match, split, startIndex, endIndex] = regexp('aachbutcher','ch','match','split','start','end');

   % Now I split the 'aa', 'but' and 'er' into single characters:
   for i = 1:length(split)
       SingleLetters{i} = regexp(split{1,i},'.','match');
   end

end

My problem is: how do I put the cells back together so that they match the desired output? I only have the start indices of the matched parts ('ch'), not of the split parts ('aa', 'but', 'er').

Any ideas?

Solution

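You don't need to work with the indices or lengths. The pieces simply alternate: take the first element of split, then the first of match, then the second of split, and so on. split always has exactly one more element than match, so start with split{1} and append the remaining pieces in pairs.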

[match,split] = regexp('aachbutcher','ch','match','split');

% Start with the letters before the first 'ch' (here 'aa'):
SingleLetters = regexp(split{1,1},'.','match');

% Then append each 'ch', followed by the letters of the next split part.
% match(i-1) uses parentheses so the 'ch' stays wrapped in a 1x1 cell.
for i = 2:length(split)
   SingleLetters = [SingleLetters, match(i-1), regexp(split{1,i},'.','match')];
end
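
As an alternative to interleaving match and split, the same result can probably be obtained with a single regexp call. This is a minimal sketch and not the method used in the answer above: it relies on regexp trying the alternatives of a pattern from left to right, so 'ch' is preferred over a single character wherever both could match. The variable names word and units are only illustrative.

```matlab
word = 'aachbutcher';

% 'ch|.' tries the two-letter unit first and falls back to any single character.
units = regexp(word, 'ch|.', 'match');

% units is a 1x9 cell array: 'a' 'a' 'ch' 'b' 'u' 't' 'ch' 'e' 'r'
disp(units)
```

Further multi-letter units could presumably be added to the alternation in the same way, as long as longer units are listed before shorter ones and the final '.' stays last.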

OTHER TIPS

You know the length of 'ch': it is 2. You also know where each one was found, because regexp returns those positions in startIndex. I'm assuming (please correct me if I'm wrong) that you want every other letter of the word in its own single-letter cell, as in your output above. So you can walk through the word and use startIndex in a conditional to build the output, like this:

word = 'aachbutcher';
startIndex = regexp(word,'ch');       % start positions of every 'ch', here [3 8]

output = {};
i = 1;

% A while loop is used because changing the index variable inside a
% MATLAB for loop has no effect on the next iteration.
while i <= length(word)
    if ismember(i, startIndex)
        output{end+1} = 'ch';         % keep the two letters together
        i = i + 2;                    % skip both characters of 'ch'
    else
        output{end+1} = word(i);      % ordinary single letter
        i = i + 1;
    end
end

I don't have MATLAB right now, so I can't test it. I hope it works for you! If not, let me know and I'll take another shot at it.

Licensed under: CC-BY-SA with attribution