Question

This follows on from here:

Java Apache POI read Word (.doc) file and get named styles used

At the time (10/2012) there was a solution for finding paragraph styles, but not character styles.

And yet... if you use LibreOffice Writer to open a Word doc, for example, it does translate styles and highlighting from .doc to .odt ... so someone somewhere appears to have cracked this...

I don't know whether the Apache POI team and the LibreOffice/OpenOffice teams are in any way related, but I'd have thought the Apache POI team would've been able to get this functionality from the LO source code. Am I being naive?


Solution

Promoting some comments to an answer:

If you look at the answer given in Java Apache POI read Word (.doc) file and get named styles used, you'll see how Apache Tika extracts paragraph style names. Taken from the Paragraph javadoc:

public short getStyleIndex()

Returns the index of the style which applies to this Paragraph. Details of the style can be looked up from the StyleSheet, via StyleSheet.getStyleDescription(int)

In your case, what you're after is the equivalent, but for a Character Run. That is also (now) possible, as described in the CharacterRun.getStyleIndex() javadocs:

public short getStyleIndex()

Returns the index of the base style which applies to this Run. Details of the style can be looked up from the StyleSheet, via StyleSheet.getStyleDescription(int).

Note that runs typically override some of the style properties from the base, so normally style information should be fetched directly from the CharacterRun itself.

To see this in action, a good example is given in the TestRangeProperties unit test. From there, we see code like this:

Range r = u.getRange();
StyleSheet ss = r._doc.getStyleSheet();

Paragraph p1 = r.getParagraph(0);
CharacterRun c1a = p1.getCharacterRun(0);

assertEquals("Normal", ss.getStyleDescription(c1a.getStyleIndex()).getName());

That shows you how to get the name of the base style applied to a Character Run.
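The javadoc note quoted earlier (runs typically override some properties of their base style) can be pictured with a small stand-alone model. This is only a sketch: NamedStyle, Run, baseStyleName and effectiveBold are hypothetical names made up for illustration and are not POI classes or methods. The sketch assumes a style sheet is simply a list indexed by style index, which is a simplification.

```java
import java.util.Arrays;
import java.util.List;

public class StyleLookupSketch {

    /** Hypothetical stand-in for one style sheet entry; NOT a POI class. */
    static final class NamedStyle {
        final String name;
        final boolean bold;

        NamedStyle(String name, boolean bold) {
            this.name = name;
            this.bold = bold;
        }
    }

    /** Hypothetical stand-in for a character run; NOT a POI class. */
    static final class Run {
        final int styleIndex;       // index of the base style in the style sheet
        final Boolean boldOverride; // null means "inherit from the base style"

        Run(int styleIndex, Boolean boldOverride) {
            this.styleIndex = styleIndex;
            this.boldOverride = boldOverride;
        }
    }

    /** Looks up the base style by index and returns its name. */
    static String baseStyleName(List<NamedStyle> styleSheet, Run run) {
        return styleSheet.get(run.styleIndex).name;
    }

    /** The run's own setting wins; the base style is only the fallback. */
    static boolean effectiveBold(List<NamedStyle> styleSheet, Run run) {
        NamedStyle base = styleSheet.get(run.styleIndex);
        return run.boldOverride != null ? run.boldOverride : base.bold;
    }

    public static void main(String[] args) {
        List<NamedStyle> styleSheet = Arrays.asList(
                new NamedStyle("Normal", false),
                new NamedStyle("Strong", true));

        Run inherited = new Run(0, null);          // plain "Normal" text
        Run overridden = new Run(0, Boolean.TRUE); // "Normal" with direct bold

        // Both runs report the same base style name, but differ in effective bold.
        System.out.println(baseStyleName(styleSheet, inherited)
                + " bold=" + effectiveBold(styleSheet, inherited));
        System.out.println(baseStyleName(styleSheet, overridden)
                + " bold=" + effectiveBold(styleSheet, overridden));
    }
}
```

In this model both runs resolve to the base style "Normal", yet only the second is bold. That is why the style name alone does not tell you how a run is actually formatted, and why the javadoc suggests reading formatting from the CharacterRun itself.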

One final thing: you'll need to either use a nightly build or wait a bit for 3.11 beta 1, as some of the code mentioned isn't in 3.10 final.

Other tips

For a .docx file read through XWPF, where paragraph is an XWPFParagraph, use:

paragraph.getCTP().getPPr().getRPr().isSetB()
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow