Is there a good way to do this text extraction, in OpenOffice or with any other tool?
Since you're parsing HTML, it would be easier to use an HTML parsing engine. For example, in PHP the Simple HTML DOM Parser library lets you pull all the links or all the images from a page in a few lines:
// Load the Simple HTML DOM Parser library (adjust the path to your copy)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('path and file name');

// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}
This could be further refined if you had some additional information about the values being pulled and how they are stored in the file.
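If adding a third-party library is not an option, roughly the same extraction can be sketched with the DOM extension that ships with PHP. This is only a minimal sketch: the helper name `extract_attribute` and the sample markup are made up for illustration, and your real file may need different tag and attribute names.

```php
<?php
// Hypothetical helper: collect the values of one attribute from every
// element with the given tag name, using PHP's bundled DOM extension.
function extract_attribute(string $html, string $tag, string $attribute): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so buffer parser warnings
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $values = [];
    foreach ($doc->getElementsByTagName($tag) as $element) {
        // Skip elements that lack the attribute (e.g. <a name="...">).
        if ($element->hasAttribute($attribute)) {
            $values[] = $element->getAttribute($attribute);
        }
    }
    return $values;
}

// Sample markup standing in for the real file (an assumption for illustration).
$sample = '<p><a href="https://example.com/a">A</a> <img src="pic.png"> <a name="x">no href</a></p>';

$links  = extract_attribute($sample, 'a', 'href');
$images = extract_attribute($sample, 'img', 'src');

print_r($links);
print_r($images);
```

To run it against an actual file, you could pass the result of `file_get_contents()` in place of `$sample`.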