Keep in mind that a regex engine will try everything it can to make the pattern succeed. Since you use several .*? in your pattern, you give the engine a lot of freedom to find a match, including matches that stretch across tag boundaries. The pattern must be more restrictive.
To do that, you can replace every .*? with [^>]* which cannot run past the end of the current tag.
Don't forget to allow optional whitespace between tags by adding \s* between them in the pattern.
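To see the difference, here is a small Python sketch. The sample markup and the two shortened patterns are made up for illustration (they are not your original pattern); they only contrast .*? with [^>]* on a single tag.

```python
import re

# Hypothetical sample: the first attachment has no <ri:page/>, the second one does.
text = (
    '<ri:attachment ri:filename="a.png"></ri:attachment>'
    '<ri:attachment ri:filename="b.png"><ri:page/>'
)

# Lazy dot: the engine keeps extending .*? until the rest of the pattern fits,
# so the match starts at the first attachment and crosses several tags.
loose = re.search(r'<ri:attachment.*?>\s*<ri:page', text)

# Negated class: [^>]* must stop at the first ">", so the match cannot leave the tag.
strict = re.search(r'<ri:attachment[^>]*>\s*<ri:page', text)

print(loose.group(0))
print(strict.group(0))
```

The loose match begins at the first attachment, while the strict one begins at the attachment that really contains the ri:page tag.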
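Example (written in free-spacing mode, so the x modifier is needed for the spaces and comments to be ignored):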
(<ac:image[^>]*> \s* <ri:attachment[^>]*> ) # group 1
\s* <ri:page[^>]*/> \s* # what you need to remove
(</ri:attachment> \s* </ac:image>) # group 2
replacement: $1$2
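If it helps, the same pattern could be applied in Python roughly like this. It is only a sketch: the function name strip_page_tag and the sample markup are assumptions for the demonstration, re.VERBOSE plays the role of the x modifier, and Python writes the replacement as \1\2 instead of $1$2.

```python
import re

# Same pattern as above; re.VERBOSE ignores the spaces and the # comments.
PATTERN = re.compile(
    r"""
    (<ac:image[^>]*> \s* <ri:attachment[^>]*>)   # group 1
    \s* <ri:page[^>]*/> \s*                      # what you need to remove
    (</ri:attachment> \s* </ac:image>)           # group 2
    """,
    re.VERBOSE,
)


def strip_page_tag(markup: str) -> str:
    """Remove the <ri:page .../> element, keeping groups 1 and 2."""
    return PATTERN.sub(r"\1\2", markup)


# Hypothetical input, only for the demonstration.
sample = (
    '<ac:image ac:width="300">\n'
    '  <ri:attachment ri:filename="a.png">\n'
    '    <ri:page ri:content-title="Home"/>\n'
    '  </ri:attachment>\n'
    '</ac:image>'
)

print(strip_page_tag(sample))
```

Markup without an ri:page tag between the two groups is left unchanged, since the pattern simply does not match there.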