The .*? quantifier is lazy: it matches as few characters as possible while still allowing the rest of the pattern to match. It does not mean that it will stop searching at the first > it finds. So in your example, the <x.*?> part of the pattern will match all of:
<x>ipsum <x>dolor sit amet</x>
with every character between the first x and the final > consumed by the .*? part. To fix this, you can simply change your pattern to:
<x[^>]*> +</x>
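This works because [^>]* can never step over a >, so the opening tag has to end at its own closing bracket. Here is a rough sketch of the difference in Python. I'm assuming your original pattern was <x.*?> +</x> (the capture group is added only to show what the opening-tag part swallows), and both input strings are made up for illustration:

```python
import re

# Assumed original pattern; the group around "<x.*?>" is added for illustration only.
LAZY = re.compile(r"(<x.*?>) +</x>")
# Pattern suggested in the answer.
FIXED = re.compile(r"<x[^>]*> +</x>")

# Hypothetical inputs, not taken from the question.
nested = "<x>ipsum <x>dolor sit amet</x> </x>"
mixed = "<x>ipsum</x> <x>   </x>"

# The lazy part keeps expanding past ">" until " +</x>" can follow it.
print(LAZY.search(nested).group(1))   # <x>ipsum <x>dolor sit amet</x>

# [^>]* cannot cross a ">", so no whitespace-only element is found here.
print(FIXED.search(nested))           # None

# On the second input the lazy pattern grabs the whole string...
print(LAZY.search(mixed).group(0))    # <x>ipsum</x> <x>   </x>

# ...while the fixed pattern finds only the whitespace-only element.
print(FIXED.search(mixed).group(0))   # <x>   </x>
```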
On a side note, it's been stated many times before, but you should not use regular expressions to parse XML/HTML/XHTML.