RegEx needed for Wikipedia infobox

https://stackoverflow.com/questions/21228760

PHP
regex
wikipedia
wikipedia-api

30-09-2022
Question

OK, so here's what I need:

We have the full XML of a Wikipedia article
We need just the Infobox section

I have tried various things, but my main issue seems to be that I can't match "internal" (nested) curly brackets. Any ideas, or a regex that has worked for you?

For those of you who do not know what I'm talking about, here's a (somewhat abridged) example of what I'm trying to parse: http://regexr.com?38299

(What is needed is the part from {{Infobox ******* up to its corresponding closing brackets, }}.)

Solution

Ok, I got it!

Try this recursive pattern, where (?1) re-enters group 1 so that nested braces are matched as balanced pairs:

(?=\{Infobox)(\{([^{}]|(?1))*\})

Here's the working example:

http://regex101.com/r/kT1jF4
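
As a rough sketch of how the pattern might be used from PHP (the sample wikitext and the variable names are made up for illustration, not taken from the question): note that the lookahead only succeeds at the second "{" of "{{Infobox", so the match comes back with single outer braces, and the sketch re-adds one on each side.

```php
<?php
// Hypothetical sample wikitext; in practice this would be the article text
// taken from the Wikipedia XML.
$wikitext = <<<'WIKI'
Intro text.
{{Infobox person
| name = Example
| birth_date = {{birth date|1970|1|1}}
}}
Body text with {{another template}}.
WIKI;

// The pattern from the answer, unchanged. (?1) recurses into group 1,
// so nested {...} pairs are consumed as balanced units.
$pattern = '/(?=\{Infobox)(\{([^{}]|(?1))*\})/';

$infobox = null;
if (preg_match($pattern, $wikitext, $m) === 1) {
    // The match starts at the inner "{" and stops at the first "}" of the
    // closing "}}", so restore the outer pair of braces.
    $infobox = '{' . $m[1] . '}';
}

echo $infobox, PHP_EOL;
```

On very large infoboxes preg_match may return false instead of 1 if PCRE hits its backtrack or recursion limits, so it is probably worth checking the return value (as above) rather than assuming a match.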

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow