Question

I want to strip all tags and remove the [show]/[hide] links from Wikipedia pages. Alternatively, is there a website that renders pages in a more readable format?

I am aware of the Wikipedia printable version, but I don't want any tags in the output, as I have another use for it. So please answer the original question only: is there a website, web service, or code snippet in PHP/C# that removes the tags from a web page?

Also, when I copy a list from Firefox, it replaces each <li> with a *. Is it possible to configure Firefox to emit some other character instead, such as a

  • dot


    Solution

    You could use an HTML parser, for example BeautifulSoup (Python) or Simple HTML DOM (PHP). If the page is well-formed XHTML, you could also try an XML parser.
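To illustrate the parser-based approach, here is a minimal sketch that uses Python's standard-library `html.parser` module rather than BeautifulSoup, so it needs no extra install. The names `TextExtractor` and `html_to_text` are made up for this example, and skipping `<script>` and `<style>` content is an assumption about what "readable" output should contain.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and drops the tags (hypothetical helper)."""

    # Assumption: the contents of these elements are never wanted as text.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Collapse the whitespace runs left behind by removed markup.
    return " ".join("".join(parser.parts).split())
```

Calling `html_to_text(page_source)` on a downloaded page returns its visible text with entities such as `&amp;` decoded. Removing specific widgets like the [show]/[hide] links would need extra filtering on whatever element wraps them, which this sketch does not attempt.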

    OTHER TIPS

    You can start by taking a look at PHP's strip_tags function.

    I want to strip all tags, remove the [show][Hide] stuffs from wikipedia, or is there some website that makes pages in more readable format.

    You should take a look at DBpedia: it is Wikipedia, but just the data.

    http://dbpedia.org/About

    What about HtmlAgilityPack?

    A similar thread is available on Stack Overflow:

    Is there a Wikipedia API?
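As a rough illustration of the API route, the sketch below only builds a request URL for the MediaWiki API and performs no network call. It assumes the target wiki has the TextExtracts extension enabled, which provides `prop=extracts` and `explaintext`; the function name `plain_text_extract_url` is invented for this example.

```python
from urllib.parse import urlencode

# English Wikipedia's API endpoint; other language editions use their own host.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def plain_text_extract_url(title):
    """Build a query URL asking for a page's text without HTML markup."""
    params = {
        "action": "query",
        "prop": "extracts",   # assumption: TextExtracts is available on the wiki
        "explaintext": 1,     # request plain text instead of HTML
        "titles": title,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)
```

Fetching the resulting URL with any HTTP client should return JSON in which the extract is nested under the page entry, so no tag stripping is needed on the client side.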

    Try this function.

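    Function StripTags(ByVal strHtmlString As String) As String ' StripTags is a placeholder name
        Dim pattern As String = "<(.|\n)*?>" ' lazily matches any tag, including ones spanning lines
        Return System.Text.RegularExpressions.Regex.Replace(strHtmlString, pattern, String.Empty).Trim()
    End Function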
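For comparison, the same regular expression can be tried out in Python. This is only a sketch to show what the pattern does and where it falls short; `strip_tags_regex` is a made-up name.

```python
import re

# Same pattern as above: "<", then lazily any characters (newlines included), then ">".
TAG_PATTERN = re.compile(r"<(.|\n)*?>")


def strip_tags_regex(html):
    """Remove everything that looks like a tag and trim the result."""
    return TAG_PATTERN.sub("", html).strip()


# Works on simple markup:
simple = strip_tags_regex("<ul>\n<li>one</li>\n</ul>")

# Breaks when an attribute value contains ">", because the lazy match
# stops at the first ">" it sees and leaves the rest of the tag behind:
broken = strip_tags_regex('<a title="a>b">x</a>')
```

Here `simple` is `one`, while `broken` is `b">x` rather than `x`. The pattern also leaves script and style contents in the output, which is one reason the parser-based answers tend to be more robust on real pages.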
    
    Licensed under: CC-BY-SA with attribution
    Not affiliated with StackOverflow