Matching a word and all possible abbreviations from the nth letter isn't quite an easy task in regex.
Here is how I would do it for the word insurance from the 4th letter:
insu(?>r(?>a(?>n(?>c(?>(?<last>e))?)?)?)?)?(?(last)|\.)
It works by using atomic groups to force the regex engine as far as possible forward past the last 'rance' letters using the nested pattern (?>a(?>b)?)?
. If the last letter letter is matched we're not dealing with an abbreviation thus no dot is required, otherwise the dot is required. This is coded by (?(last)|\.)
.
To trim, I would create a function to build the above regex for an abbreviation. Then you can write a while loop that replaces each of the abbreviation regexes with empty space until there are no more matches.
Non regex version
Here is my non regex version that removes multiple words and abbreviated words from a string:
function trimWords($str, $word, $min_abbrv, $case_insensitive = false) {
$len = 0;
$word_len = strlen($word);
$strlen = strlen($str);
$cmp = $case_insensitive ? strncasecmp : strncmp;
for ($i = 0; $i < $strlen; $i++) {
if ($cmp($str[$i], $word[$len], $i) == 0) {
$len++;
} else if ($len > 0) {
if ($len == $word_len || ($len >= $min_abbrv && ($dot = $str[$i] == '.'))) {
$i -= $len;
$len += $dot;
$str = substr($str, 0, $i) . substr($str, $i+$len);
$strlen = strlen($str);
$dot = 0;
}
$len = 0;
}
}
return $str;
}
Example:
$string = 'ins. <- "ins." / insu. insuranc. insurance / insurance. <- "."';
echo trimWords($string, 'insurance', 4);
Output is:
ins. <- "ins." / / . <- "."