Question

I have the following part of string:

{{Infobox musical artist
|honorific-prefix  = [[The Honourable]]
| name = Bob Marley
| image = Bob-Marley.jpg
| alt = Black and white image of Bob Marley on stage with a guitar
| caption = Bob Marley in concert, 1980.
| background = solo_singer
| birth_name = Robert Nesta Marley
| alias = Tuff Gong
| birth_date = {{birth date|df=yes|1945|2|6}}
| birth_place = [[Nine Mile, Jamaica|Nine Mile]], [[Jamaica]]
| death_date = {{death date and age|df=yes|1981|5|11|1945|2|6}}
| death_place = [[Miami]], [[Florida]]
| instrument = Vocals, guitar, percussion
| genre = [[Reggae]], [[ska]], [[rocksteady]]
| occupation = [[Singer-songwriter]], [[musician]], [[guitarist]] 
| years_active = 1962–1981
| label = [[Beverley's]], [[Studio One (record label)|Studio One]],
| associated_acts = [[Bob Marley and the Wailers]]
| website = {{URL|bobmarley.com}}
}}

And I'd like to remove all of it. If I try the regex \{\{(.*?)\}\}, it catches only {{birth date|df=yes|1945|2|6}}, which makes sense. So I tried \{\{([^\}]*?)\}\}, which then grabs from the start but ends on that same line, which also makes sense since it has encountered }}. I've also tried it without the ? (i.e. greedy), with the same results. My question is: how can I remove everything that's inside a {{ }} block, no matter how many nested {{ }} pairs it contains?

Edit: If you want my entire input, it's this: https://en.wikipedia.org/w/index.php?maxlag=5&title=Bob+Marley&action=raw


Solution

Here's a solution with a DOTALL Pattern and a greedy quantifier for an input that contains only one instance of the fragment you wish to remove (i.e. replace with an empty String):

String input = "Foo {{Infobox musical artist\n"
                + "|honorific-prefix  = [[The Honourable]]\n"
                + "| name = Bob Marley\n"
                + "| image = Bob-Marley.jpg\n"
                + "| alt = Black and white image of Bob Marley on stage with a guitar\n"
                + "| caption = Bob Marley in concert, 1980.\n"
                + "| background = solo_singer\n"
                + "| birth_name = Robert Nesta Marley\n"
                + "| alias = Tuff Gong\n"
                + "| birth_date = {{birth date|df=yes|1945|2|6}}\n"
                + "| birth_place = [[Nine Mile, Jamaica|Nine Mile]], [[Jamaica]]\n"
                + "| death_date = {{death date and age|df=yes|1981|5|11|1945|2|6}}\n"
                + "| death_place = [[Miami]], [[Florida]]\n"
                + "| instrument = Vocals, guitar, percussion\n"
                + "| genre = [[Reggae]], [[ska]], [[rocksteady]]\n"
                + "| occupation = [[Singer-songwriter]], [[musician]], [[guitarist]] \n"
                + "| years_active = 1962–1981\n"
                + "| label = [[Beverley's]], [[Studio One (record label)|Studio One]],\n"
                + "| associated_acts = [[Bob Marley and the Wailers]]\n"
                + "| website = {{URL|bobmarley.com}}\n" + "}} Bar";
//                                    |DOTALL flag
//                                    |  |first two curly brackets
//                                    |  |     |multi-line dot
//                                    |  |     | |last two curly brackets
//                                    |  |     | |        | replace with empty
System.out.println(input.replaceAll("(?s)\\{\\{.+\\}\\}", ""));

Output

Foo  Bar
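
The restriction to a single instance matters. As a rough illustration (the class name, method name, and sample string below are made up for this sketch), the same greedy pattern applied to an input with two separate blocks also removes the text between them, because the greedy quantifier runs from the first {{ to the last }}:

```java
import java.util.regex.Pattern;

class GreedyPitfall {
    // Same pattern as in the answer above: DOTALL flag plus a greedy ".+".
    static final Pattern GREEDY = Pattern.compile("(?s)\\{\\{.+\\}\\}");

    static String strip(String text) {
        return GREEDY.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Two independent blocks with ordinary text between them.
        String twoBlocks = "Intro {{first|a=1}} keep me {{second|b=2}} outro";
        // The match spans from the first "{{" to the last "}}",
        // so "keep me" is removed together with both blocks.
        System.out.println(strip(twoBlocks)); // prints "Intro  outro"
    }
}
```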

Notes after comments

This case implies using regular expressions to manipulate markup language.

Regular expressions are not designed to parse hierarchical markup, so they are a poor fit here; this answer is only a stub for what would be an ugly workaround at best.

See here for a famous SO thread on parsing markup with regex.
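
One regex-free alternative is to scan the text once and keep a counter of how many {{ are currently open, copying characters only while the counter is zero. The following is a minimal sketch of that idea, not a wiki-markup parser: the class and method names are hypothetical, and it assumes that only the two-character sequences {{ and }} delimit blocks (triple braces such as {{{param}}} and unbalanced input are not treated specially).

```java
class TemplateStripper {
    /** Removes every {{...}} block, nested or not, by tracking brace depth. */
    static String stripTemplates(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int depth = 0; // number of "{{" currently open
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("{{", i)) {
                depth++;          // entering a (possibly nested) block
                i += 2;
            } else if (depth > 0 && text.startsWith("}}", i)) {
                depth--;          // leaving the innermost open block
                i += 2;
            } else {
                if (depth == 0) { // outside all blocks: keep the character
                    out.append(text.charAt(i));
                }
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "Foo {{Infobox\n"
                + "| birth_date = {{birth date|1945|2|6}}\n"
                + "| website = {{URL|bobmarley.com}}\n"
                + "}} Bar {{cite}} Baz";
        System.out.println(stripTemplates(input)); // prints "Foo  Bar  Baz"
    }
}
```

Unlike the greedy pattern, this keeps the text between separate blocks. If a {{ is never closed, everything after it is dropped, which may or may not be what you want.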

Other tips

Use a greedy quantifier instead of the reluctant one you're using.

http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Edit: spoonfeeding: "\{\{.*\}\}"

Try this pattern, it should take care of everything:

"\\D\\{\\{I.+[\\P{M}\\p{M}*+].+\\}\\}\\D"

specify: DOTALL

code:

String result = searchText.replaceAll("\\D\\{\\{I.+[\\P{M}\\p{M}*+].+\\}\\}\\D", "");

example: http://fiddle.re/5n4zg

This regex matches a single such block (only):

\{\{([^{}]*?\{\{.*?\}\})*.*?\}\}

See a live demo.

In Java, to remove all such blocks:

str = str.replaceAll("(?s)\\{\\{([^{}]*?\\{\\{.*?\\}\\})*.*?\\}\\}", "");
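
The pattern above appears to cope with one level of nesting only. If a regex is still preferred, one possible workaround is to remove the innermost blocks first and repeat until nothing changes. This is only a sketch under that assumption: the class and method names are hypothetical, and the pattern treats any {{ ... }} pair that contains no further {{ or }} as innermost.

```java
import java.util.regex.Pattern;

class InnermostFirst {
    // "{{", then any characters that do not start another "{{" or "}}", then "}}".
    static final Pattern INNERMOST =
            Pattern.compile("(?s)\\{\\{(?:(?!\\{\\{|\\}\\}).)*\\}\\}");

    static String stripTemplates(String text) {
        String previous;
        do {
            previous = text;
            // Each pass removes the blocks that contain no other block.
            text = INNERMOST.matcher(text).replaceAll("");
        } while (!text.equals(previous)); // stop once a pass changes nothing
        return text;
    }

    public static void main(String[] args) {
        String input = "x {{a {{b {{c}} d}} e}} y {{z}} w";
        System.out.println(stripTemplates(input)); // prints "x  y  w"
    }
}
```

Each pass peels off one nesting level, so the number of passes grows with the nesting depth; for large pages a single-pass scan that counts open braces is likely the simpler choice.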
Licensed under: CC-BY-SA with attribution