Question

OK, so here's what I need:

  • We have the full XML of a Wikipedia article
  • We need just the Infobox section

I have tried various things, but my main issue seems to be matching the "internal" curly brackets. Any ideas (or a regex that has worked for you)?

For those of you who do not know what I'm talking about, here's a (somewhat abridged) example of what I'm trying to parse: http://regexr.com?38299

(What is needed is the part from {{Infobox ******* up to its corresponding closing brackets, }}.)


Solution

OK, I got it!

Try this (it relies on the recursive subpattern call (?1), so it needs a PCRE-style engine):

(?=\{Infobox)(\{([^{}]|(?1))*\})

Here's the working example:

http://regex101.com/r/kT1jF4
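
If your regex engine has no recursion (Python's built-in re module, for example, does not support (?1)), the same idea can be sketched as a small brace-depth scanner. This is only a minimal sketch, not part of the original answer: the function name extract_infobox is made up for illustration, and it assumes templates are delimited by plain {{ and }} pairs, so it may cut short on wikitext that uses triple braces such as {{{1}}}.

```python
from typing import Optional


def extract_infobox(wikitext: str) -> Optional[str]:
    """Return the first {{Infobox ...}} template, nested templates included.

    Hypothetical helper: assumes templates open with "{{" and close with "}}".
    """
    start = wikitext.find("{{Infobox")
    if start == -1:
        return None  # no infobox in this text

    depth = 0
    i = start
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1          # entering a (possibly nested) template
            i += 2
        elif pair == "}}":
            depth -= 1          # leaving a template
            i += 2
            if depth == 0:
                # Back at the outer level: this closes the infobox itself.
                return wikitext[start:i]
        else:
            i += 1
    return None  # braces never balanced


sample = "intro {{Infobox person|name=X|birth={{birth date|1900|1|1}}|x=1}} rest {{other}}"
print(extract_infobox(sample))
```

The counter plays the role of the recursion in the regex: every inner {{ must be closed before the outer }} is accepted as the end of the infobox.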

Licensed under: CC-BY-SA with attribution