Question

I need a Java library, or some code, to extract field tags from the content of an ODT document. I know an ODT file is a ZIP archive that keeps its content in a content.xml file. Of course I could just unzip the archive, open content.xml and parse it, but I believe some higher-level code exists. As an example, the content looks like this:

<text:p text:style-name="Standard">Hi ${name}!</text:p>    
<text:p text:style-name="Standard">
<text:text-input text:description="JOOScript">$nome</text:text-input></text:p>

I would like to extract the fields as ${name} and $nome.

I know Apache Tika could be used for that, but I haven't spotted an example that actually shows field extraction. I believe this is because the fields I am using are unstructured text instead of input field tags.

Thanks in advance, Daniel


Solution

In case anyone is interested, we ended up using Apache Tika to obtain the content of the ODT file, and then parsed that content with the following regular expression:

\$\{[\w\-\.]*\}
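
For illustration, here is a minimal sketch of the matching step in plain Java. It is not the code we actually ran: instead of Apache Tika it reads content.xml directly from the ZIP archive with java.util.zip (the manual route mentioned in the question), and the class and method names (OdtFieldExtractor, extractFields, readContentXml) are made up for this example. Note that the expression above only matches the ${name} form, so the sketch adds a second alternative, \$\w+, as an assumption to also catch bare fields such as $nome.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class OdtFieldExtractor {

    // First alternative: the ${...} expression from the answer.
    // Second alternative (an assumption added here): a bare $identifier such as $nome.
    private static final Pattern FIELD =
            Pattern.compile("\\$\\{[\\w\\-.]*\\}|\\$\\w+");

    // Returns every field occurrence in document order (duplicates are kept).
    static List<String> extractFields(String text) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = FIELD.matcher(text);
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields;
    }

    // Reads the raw content.xml entry from an ODT file (requires Java 9+ for readAllBytes).
    static String readContentXml(String odtPath) throws IOException {
        try (ZipFile zip = new ZipFile(odtPath)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("content.xml not found in " + odtPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, scan that ODT file; otherwise use the sample from the question.
        String xml = args.length > 0
                ? readContentXml(args[0])
                : "<text:p text:style-name=\"Standard\">Hi ${name}!</text:p>\n"
                + "<text:p text:style-name=\"Standard\">\n"
                + "<text:text-input text:description=\"JOOScript\">$nome</text:text-input></text:p>";
        System.out.println(extractFields(xml)); // sample input prints [${name}, $nome]
    }
}
```

One caveat with matching against the raw XML: if the editor splits a placeholder across several text:span elements, the pattern will miss it. Running the same expression over plain text extracted by Tika, as we did, avoids that problem.
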
Licensed under: CC-BY-SA with attribution