Question

I am trying to use tess-two, a fork of Tesseract Tools for Android, and I want to turn on hOCR output in Tesseract. Following this link, I set the variable tessedit_create_hocr to true, but I can't see any hOCR in the output. Here is my attempt:

  baseApi.init(FileUtil.getAppFolder(), "eng", TessBaseAPI.OEM_TESSERACT_CUBE_COMBINED);
  baseApi.setVariable("tessedit_create_hocr", "1");
  baseApi.setImage(bitmap);
  String recognizedText = baseApi.getUTF8Text();

Somebody told me the hOCR output should be in the config folder or in the folder containing the image, but I don't see anything there. I also don't know how to configure the file name and location of the hOCR output.

Another thing: is there any way to apply a config file in Tesseract Tools for Android? I put the config files into the tessdata/config folder, but nothing happens. How do I tell Tesseract to read these config files? There doesn't seem to be much documentation for Android.

Update: Thanks to @nguyenq, I can now get the hOCR data. Here is what I added to the native wrapper:

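  jstring Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetHOCRText(JNIEnv *env,
                                                                              jobject thiz,
                                                                              jint page) {
    native_data_t *nat = get_native_data(env, thiz);

    // Ask Tesseract for the hOCR markup of the given page.
    char *text = nat->api.GetHOCRText(page);

    // Copy the markup into a Java string.
    jstring result = env->NewStringUTF(text);

    // Tesseract documents this buffer as allocated with new[], so release it with delete[] rather than free().
    delete[] text;

    return result;
  }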

Solution

Apparently, tess-two does not expose all of TessBaseAPI: it includes no wrapper for the native GetHOCRText method. You may have to extend the wrapper yourself to access the functions you need.
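
Extending the wrapper also means adding a Java-side declaration that the JNI function from the question can bind to. The following is only a sketch of what that addition might look like. It assumes the members go into tess-two's existing com.googlecode.tesseract.android.TessBaseAPI class, because the JNI symbol name encodes that package and class. The public method name getHOCRText is a choice made for this sketch, and the package declaration and the rest of the class are left out so the fragment compiles on its own.

```java
// Fragment only: the members one might add to tess-two's TessBaseAPI class.
// Assumption: in the real project this class is declared in the package
// com.googlecode.tesseract.android, which is what makes the JVM look for the symbol
// Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetHOCRText.
// Without the package declaration (as here) the lookup name would differ.
class TessBaseAPI {

    // Implemented in the C++ wrapper; the parameter maps to the `jint page` argument.
    private native String nativeGetHOCRText(int page);

    /**
     * Returns the hOCR markup for the given page.
     * The page index is passed through unchanged to the native GetHOCRText call.
     * Intended to be called after init() and setImage(), like getUTF8Text().
     */
    public String getHOCRText(int page) {
        return nativeGetHOCRText(page);
    }
}
```

With the native library rebuilt to include the new function, the call site from the question would become baseApi.getHOCRText(0) in place of baseApi.getUTF8Text(). Calling it without the rebuilt library loaded fails with an UnsatisfiedLinkError.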

The config files are meant for command-line execution. On Android, you can instead set the same variables through the exposed API method setVariable.
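
If you already have a config file, one way to reuse it is to read its lines yourself and forward each entry to setVariable. The sketch below assumes the usual config layout of one "name value" pair per line, with blank lines and lines starting with # skipped. VariableSink and ConfigApplier are hypothetical names introduced here; VariableSink only stands in for TessBaseAPI's setVariable(String, String) so the example compiles without the Android library. Note that Tesseract treats some parameters as init-only, so setting them after init may have no effect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the single TessBaseAPI method used here.
interface VariableSink {
    boolean setVariable(String name, String value);
}

final class ConfigApplier {
    private ConfigApplier() {}

    /**
     * Forwards every "name value" line to the sink.
     * Returns the names of variables the sink reported as not set.
     */
    static List<String> apply(List<String> lines, VariableSink sink) {
        List<String> rejected = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and comment lines.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Split into the variable name and the remainder of the line.
            String[] parts = line.split("\\s+", 2);
            String name = parts[0];
            String value = parts.length > 1 ? parts[1].trim() : "";
            if (!sink.setVariable(name, value)) {
                rejected.add(name);
            }
        }
        return rejected;
    }
}
```

On Android, the lines could be read from the config file in the app folder and applied after init() with a small adapter, for example a lambda or anonymous class that calls baseApi.setVariable(name, value).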

Licensed under: CC-BY-SA with attribution