
I want to strip all tags and remove the [show]/[hide] links from Wikipedia pages. Alternatively, is there a website that renders pages in a more readable format?

I am aware of Wikipedia's printable version, but I don't want any tags in the output because I need the text for another purpose. So please answer only the original question: is there a website, web service, or code snippet in PHP/C# that removes the tags from a web page?

Also, when I copy a list from Firefox, it replaces each <li> with *. Is there a Firefox setting that makes it use some other symbol instead, such as a • bullet dot?



    You could use an HTML parser such as BeautifulSoup (Python) or Simple HTML DOM (PHP). Alternatively, you could try an XML parser, although that only works if the page is well-formed XHTML.
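A minimal sketch of the parser approach, using Python's standard-library `html.parser` (swapped in for BeautifulSoup so it needs no third-party package). It drops `<script>`/`<style>` and any element carrying one of the given CSS classes, and writes each `<li>` with a bullet of your choice, which also covers the `*` versus `•` question. The class name `mw-collapsible-toggle` used for the [show]/[hide] links is an assumption; inspect the page's HTML to confirm which class the toggles actually use.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Elements that start a new line in the plain-text output.
BLOCK = {"p", "div", "ul", "ol", "table", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects text, drops unwanted elements, and renders <li> with a bullet."""

    def __init__(self, bullet="\u2022 ", skip_classes=()):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.bullet = bullet
        self.skip_classes = set(skip_classes)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside an element being dropped

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            if tag == "br" and not self.skip_depth:
                self.parts.append("\n")
            return
        if self.skip_depth:  # nested element inside a dropped one
            self.skip_depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if tag in ("script", "style") or classes & self.skip_classes:
            self.skip_depth = 1
        elif tag == "li":
            self.parts.append("\n" + self.bullet)
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag not in VOID and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace inside each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html, bullet="\u2022 ", skip_classes=()):
    parser = TextExtractor(bullet, skip_classes)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()
```

For example, `html_to_text(page_html, skip_classes={"mw-collapsible-toggle"})` returns the page text with the toggles removed. Pass `bullet="* "` to get Firefox-style list markers instead. The depth counter assumes reasonably well-nested markup; badly broken HTML can end a dropped element early.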


    You can start by taking a look at PHP's strip_tags function.


    You should take a look at DBpedia: Wikipedia, but just the data.

    What about the HTML Agility Pack, an HTML parser for .NET/C#?


    A similar thread is available on Stack Overflow:

    Is there a Wikipedia API?
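One way to use the Wikipedia API for this is the MediaWiki `prop=extracts` query (from the TextExtracts extension) with `explaintext`, which returns a page's text without markup. The sketch below assumes the English Wikipedia endpoint and a minimal parameter set; check the API documentation for limits and other options. The User-Agent string is a hypothetical placeholder: Wikimedia asks clients to send a descriptive one.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def extract_url(title):
    """Build a MediaWiki API query for a page's plain-text extract."""
    params = {
        "action": "query",
        "prop": "extracts",   # provided by the TextExtracts extension
        "explaintext": 1,     # plain text instead of limited HTML
        "titles": title,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def fetch_plain_text(title, user_agent="PlainTextSketch/0.1 (you@example.com)"):
    """Download the extract; the User-Agent value is a placeholder to replace."""
    req = Request(extract_url(title), headers={"User-Agent": user_agent})
    with urlopen(req) as resp:
        data = json.load(resp)
    # With the default JSON format, "pages" maps page IDs to page objects.
    pages = data["query"]["pages"]
    return "\n".join(page.get("extract", "") for page in pages.values())
```

Calling `fetch_plain_text("Python (programming language)")` needs network access; `extract_url` alone is useful for checking the query in a browser first.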

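    Try this function (VB.NET):

        Function StripTags(ByVal strHtmlString As String) As String
            Dim pattern As String = "<(.|\n)*?>" ' "<", then as few characters (newlines included) as possible, then ">"
            Return System.Text.RegularExpressions.Regex.Replace(strHtmlString, pattern, String.Empty).Trim()
        End Function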
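For comparison, here is the same regex idea ported to Python (a sketch, not the answer's own code). `re.DOTALL` lets `.` match newlines, which is what the `(.|\n)` alternation achieves in the VB.NET pattern. The second call shows the usual weakness of regex tag stripping: a `>` inside an attribute value ends the match early, which is why the parser-based answers are more robust.

```python
import re

# "<", then as few characters as possible (newlines included), then ">".
TAG_RE = re.compile(r"<.*?>", re.DOTALL)


def strip_tags_regex(html):
    """Remove anything that looks like a tag and trim surrounding whitespace."""
    return TAG_RE.sub("", html).strip()


print(strip_tags_regex("<p>Hello\n<b>world</b></p>"))  # tags removed, line break kept
print(strip_tags_regex('<a title="x > y">link</a>'))    # leftover attribute text leaks through
```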
Licensed under CC BY-SA with attribution.