Question

How can I customize the Plone search engine to activate full-text indexing of Excel files? I have already installed pdftotext and wv for full-text indexing of PDF and Word files.

Solution

If you add Products.OpenXml to your instance eggs and install it in Plone, you can index the modern Office formats, at least .docx and .xlsx. This does not work for old-style Excel (.xls) files.

I tried it in a Plone 4.3.2 buildout config a few weeks ago:

[instance]
eggs =
    ...
    Products.OpenXml

[versions]
# Products.OpenXml needs a more recent lxml than the one Plone pins by default; some 3.x version
lxml = 3.3.3
Products.OpenXml = 1.1.1
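To see why .xlsx files can be indexed without any external binary: an .xlsx file is a ZIP archive of XML parts, and most cell text is stored once in xl/sharedStrings.xml. The sketch below pulls that text out using only the Python standard library. It is not Products.OpenXml's code, just an illustration of the kind of extraction such a transform performs; the function name `xlsx_text` is hypothetical. Legacy .xls files are a binary format, so this approach does not apply to them.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used inside .xlsx parts
NS = {"s": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def xlsx_text(data: bytes) -> str:
    """Return the shared strings of an .xlsx file as newline-separated text.

    Simplified sketch: only xl/sharedStrings.xml is read, so numbers,
    inline strings and formula results stored in the sheet parts are ignored.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if "xl/sharedStrings.xml" not in zf.namelist():
            return ""
        root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    strings = []
    for si in root.findall("s:si", NS):
        # A string item is either a single <t> or several rich-text runs <r><t>
        parts = [t.text or "" for t in si.iter("{%s}t" % NS["s"])]
        strings.append("".join(parts))
    return "\n".join(strings)
```

Feeding it the bytes of an uploaded .xlsx file returns one line per distinct string in the workbook, which is roughly what ends up in the SearchableText index.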

Alternatively or additionally, use Products.AROfficeTransforms. I have only tried it in combination with Products.OpenXml, but Products.AROfficeTransforms on its own is sufficient if you are only interested in old-style Excel sheets (.xls). In a buildout config:

[instance]
eggs =
    ...
    Products.AROfficeTransforms

[versions]
Products.AROfficeTransforms = 0.11.0

It requires the xlhtml binary to be installed on your system. This is an ancient binary, last changed in 2002. I did not try to install it myself.
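Because these transforms shell out to external programs, it can help to confirm that the binaries are actually reachable on the PATH of the user that runs the Plone instance. A minimal standard-library check follows; the command names pdftotext and xlhtml come from this thread, while the helper name `find_binaries` is hypothetical.

```python
import shutil


def find_binaries(names):
    """Map each command name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Run this as the same OS user that runs the Plone instance
    for name, path in find_binaries(["pdftotext", "xlhtml"]).items():
        print(f"{name}: {path or 'NOT FOUND'}")
```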

Other suggestions

Try ftw.tika.

Supported formats:

  • Microsoft Office formats (Office Open XML):
      • *.docx Word Documents
      • *.dotx Word Templates
      • *.xlsx Excel Sheets
      • *.xltx Excel Templates
      • *.pptx PowerPoint Presentations
      • *.potx PowerPoint Templates
      • *.ppsx PowerPoint Slideshows
  • Legacy Microsoft Office (97) formats
  • Rich Text Format
  • OpenOffice ODF formats
  • OpenOffice 1.x formats
  • Common Adobe formats (InDesign, Illustrator, Photoshop)
  • PDF documents
  • WordPerfect documents
  • E-Mail messages

It's based on Apache Tika and runs as a service managed by supervisor (you have to extend your buildout for this).

It integrates with portal_transforms and is well tested and documented.
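ftw.tika wires Tika into portal_transforms for you, so you normally never call Tika yourself. To illustrate what a Tika-backed conversion boils down to, here is a sketch that sends a file to a standalone Apache Tika server over its REST interface (PUT /tika with an `Accept: text/plain` header). The URL, the port 9998 (Tika server's default), the UTF-8 response decoding and the helper names are all assumptions; this is not necessarily how ftw.tika itself talks to Tika.

```python
import urllib.request

# Assumption: a standalone Tika server listening locally on its default port
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking Tika to return the plain text of `data`."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(data: bytes, url: str = TIKA_URL, timeout: float = 30.0) -> str:
    """Send a document (e.g. the bytes of an .xls file) to Tika and return its text."""
    with urllib.request.urlopen(build_tika_request(data, url), timeout=timeout) as resp:
        # Assumption: the server answers with UTF-8 encoded plain text
        return resp.read().decode("utf-8")
```

Because Tika detects the file type itself, the same call would cover .xls, .xlsx and the other formats listed above, which is the main appeal of this approach over one transform per format.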

More information:

Licensed under: CC-BY-SA with attribution