Question

Is there any way to extract hyperlinks from a .doc file? I have a bunch of hyperlinks in a .doc that I need to import into my database.

I have tried converting the .doc to HTML, but the hyperlinks are not transferred.

Regards, Mladen

Solution 3

Here is what I did: I opened the .doc file in Office XP, published it as a blog post, and then saved that post as a filtered web page. That gives you clean HTML that you can parse with ease.
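For the parsing step, here is a minimal sketch in Python using only the standard library's `html.parser`. It assumes the filtered web page has already been read into a string; the names `LinkCollector` and `extract_links` are made up for this illustration and are not part of Word or any library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip named anchors that have no href
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_text):
    """Return a list of (href, link text) tuples found in html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links
```

Each resulting tuple can then be inserted into the database as a row. Word may save the page in a legacy encoding, so check the encoding when reading the file.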

OTHER TIPS

We had a similar issue and ended up using a third-party component called Aspose.Words. You can find it here: http://www.aspose.com

It's available for .NET and Java.

You could try importing the file into OpenOffice and see whether the hyperlinks are transferred. An OpenDocument file is just a ZIP archive with XML inside, very easy to parse once you've got the hang of it.
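If the hyperlinks do survive the import and you save the result as .odt, something like the following sketch could pull them out. It assumes the links are stored as `text:a` elements with an `xlink:href` attribute in `content.xml`, which is how OpenDocument text files normally represent them; the function name `extract_odt_links` is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard OpenDocument / XLink namespace URIs.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def extract_odt_links(path):
    """Return (href, link text) tuples from an .odt file or file-like object."""
    with zipfile.ZipFile(path) as archive:
        # The document body lives in content.xml inside the ZIP archive.
        root = ET.fromstring(archive.read("content.xml"))

    links = []
    for anchor in root.iter(f"{{{TEXT_NS}}}a"):
        href = anchor.get(f"{{{XLINK_NS}}}href")
        if href:
            links.append((href, "".join(anchor.itertext()).strip()))
    return links
```

Calling `extract_odt_links("links.odt")` (a hypothetical file name) would return the list of pairs, ready to be written to the database.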

I realise this is some months after your initial question; however, you can also extract hyperlinks from a .doc file through Word Automation. The API exposes hyperlink objects that you can easily extract.
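As a rough sketch of the automation route, driven from Python: this assumes Windows, an installed copy of Word, and the third-party pywin32 package. `Hyperlinks`, `Address` and `TextToDisplay` are members of the Word object model; the function names `collect_hyperlinks` and `extract_from_doc` are made up for this example.

```python
def collect_hyperlinks(document):
    """Return (address, display text) pairs from a Word Document object."""
    links = []
    for link in document.Hyperlinks:
        # Address can be empty for links that point inside the document.
        links.append((link.Address, link.TextToDisplay))
    return links


def extract_from_doc(path):
    """Open a .doc file in Word via COM and return its hyperlinks.

    Assumption: `path` is an absolute path, since Word resolves
    relative paths against its own working directory.
    """
    import win32com.client  # pywin32; only available on Windows

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        document = word.Documents.Open(path, ReadOnly=True)
        try:
            return collect_hyperlinks(document)
        finally:
            document.Close(False)  # close without saving changes
    finally:
        word.Quit()
```

The same loop over `Hyperlinks` works from VBA or .NET; only the surrounding setup differs.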

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow