Question

I am struggling with the PHP COM object while reading data from a Word (.doc) file. When I retrieve the content from the .doc file, text is returned, but it does not match the actual content:

example.doc content

Čo je to zubný povlak? Zubný povlak je lepkavá a bezfarebná vrstva baktérií a cukrov, ktorá sa neprestajne tvorí na povrchu zubov. Býva hlavnou príčinou zubných kazov a parodontitídy a ak sa denne neodstraňuje, môže stvrdnúť a zmeniť sa na zubný kameň.

php

$filename = 'example.doc';
$word = new COM("word.application") or die ("Could not initialise MS Word object.");
$word->Documents->Open(realpath($filename));

// Extract content.
$content = (string) $word->ActiveDocument->Content;

echo nl2br($content);

$word->ActiveDocument->Close(false);

$word->Quit();
$word = null;
unset($word);

The result shows:

Co je to zubný povlak? Zubný povlak je lepkavá a bezfarebná vrstva baktérií a cukrov, ktorá sa neprestajne tvorí na povrchu zubov. Býva hlavnou prícinou zubných kazov a parodontitídy a ak sa denne neodstranuje, môže stvrdnút a zmenit sa na zubný kamen. Ako zistím, že mám zubný povlak?

For example, Čo is shown as Co.

Any help is highly appreciated. Alternatively, is there another way to read a .doc file (not .docx) that preserves these characters?


Solution

This may be an encoding problem (is your text in UTF-8?).

According to the PHP documentation for the COM class (see the codepage parameter), you can pass a code page as the third constructor argument. For example, with UTF-8:

$word = new COM("word.application", NULL, CP_UTF8);
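For context, here is a minimal sketch of how the script from the question might look with that change applied. It assumes Windows, the com_dotnet extension, and an installed copy of Word. Reading Content->Text, the explicit UTF-8 header, and the htmlspecialchars() call are additions for illustration and are not part of the original answer.

```php
<?php
// Sketch only: requires Windows, the com_dotnet extension, and MS Word.
$filename = 'example.doc';

// CP_UTF8 asks the COM layer to convert strings to and from UTF-8,
// so characters such as "Č" are not flattened to the ANSI code page.
$word = new COM('word.application', null, CP_UTF8);

try {
    $word->Documents->Open(realpath($filename));

    // Range.Text returns the document body as plain text.
    $content = (string) $word->ActiveDocument->Content->Text;

    // Close without saving changes.
    $word->ActiveDocument->Close(false);
} finally {
    // Always shut Word down, even if opening the file failed.
    $word->Quit();
    $word = null;
}

// Assumption: the page is served as UTF-8 so the browser decodes it correctly.
header('Content-Type: text/html; charset=utf-8');
echo nl2br(htmlspecialchars($content, ENT_QUOTES, 'UTF-8'));
```

If the characters still look wrong after this, it may be worth checking that the page itself is being served and interpreted as UTF-8, since the conversion in COM only covers the string returned to PHP.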
Licensed under: CC-BY-SA with attribution