I need a Java library, or some code, to extract field tags from the content of an ODT document. I know an ODT is a zipped archive whose contents are stored in a content.xml file. Of course I could just extract the files, open content.xml and parse it, but I believe some higher-level code exists. As an example, the content looks like this:

<text:p text:style-name="Standard">Hi ${name}!</text:p>    
<text:p text:style-name="Standard">
<text:text-input text:description="JOOScript">$nome</text:text-input></text:p>

I would like to extract the fields as ${name} and $nome.

I know Apache Tika could be used for that, but I haven't spotted an example that actually shows field extraction. I believe this is because the fields I am using are unstructured text instead of input field tags.

Thanks in advance, Daniel


Solution

Well, just in case anyone is interested: we ended up using Apache Tika to obtain the content from the ODT, and we parsed it using the following regular expression:

\$\{[\w\-\.]*\}
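
For anyone who wants to try the expression, here is a minimal sketch of applying it with java.util.regex. It assumes the document text has already been extracted into a String (for example by Tika); the class name FieldExtractor and the method extractFields are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika or any ODF library.
final class FieldExtractor {

    // The expression from the answer: "${", then any run of word
    // characters, hyphens or dots, then "}".
    private static final Pattern FIELD = Pattern.compile("\\$\\{[\\w\\-\\.]*\\}");

    // Returns every match in the order it appears in the text.
    static List<String> extractFields(String text) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = FIELD.matcher(text);
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Assumed plain text of the two paragraphs from the question.
        String text = "Hi ${name}!\n$nome";
        System.out.println(extractFields(text)); // prints [${name}]
    }
}
```

Note that this expression only matches the ${...} form, so a bare field such as $nome from the question is not picked up by it.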
Licensed under: CC-BY-SA with attribution