In this case you need to parse the HTML and extract the text yourself. As you probably already know, parsing HTML with regular expressions is A Bad Idea (tm), so the only right solution is to parse the document's DOM. For this step you are free to use any tool, including the standard DOMDocument class.
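As a rough illustration of the DOM approach, here is a minimal sketch using DOMDocument and DOMXPath. The function name extractText() and the choice to skip script, style and head content are my own assumptions, not part of any library; adjust them to your needs.

```php
<?php
// Hypothetical helper (name and behavior are assumptions for illustration).
function extractText(string $html): string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML prefix is a common trick to make loadHTML() treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select every text node that is not inside <script>, <style> or <head>.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//text()[not(ancestor::script or ancestor::style or ancestor::head)]'
    );

    $parts = [];
    foreach ($nodes as $node) {
        $parts[] = $node->nodeValue;
    }

    // Join the pieces with spaces, then collapse runs of whitespace.
    return trim(preg_replace('/\s+/u', ' ', implode(' ', $parts)));
}

echo extractText('<h1>Title</h1><p>Hello <b>world</b></p><script>var x = 1;</script>');
// prints "Title Hello world"
```

Joining text nodes with a space keeps block elements from running together, at the cost of occasionally inserting a space inside a word that is split by inline markup. For anything beyond a quick extraction you would probably want to treat block and inline elements differently.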
If you are looking for a library or script to help, I would suggest looking at html2text, which can be used commercially. As far as I can see, it doesn't support attributes for <img> tags, but that is very easy to fix (use the handling of the <a> tag as an example).
If you are looking for automated text extraction, then you should definitely look at something like Boilerpipe.