I'm trying to use tess-two, a fork of Tesseract Tools for Android, and I want to turn on hOCR
output in Tesseract. Following this link, I tried setting the variable tessedit_create_hocr
to true, but I can't see any hOCR in the output. Here is my attempt:
baseApi.init(FileUtil.getAppFolder(), "eng", TessBaseAPI.OEM_TESSERACT_CUBE_COMBINED);
// Ask Tesseract to create hOCR output
baseApi.setVariable("tessedit_create_hocr", "1");
baseApi.setImage(bitmap);
// getUTF8Text() returns the recognized text as plain UTF-8
String recognizedText = baseApi.getUTF8Text();
Someone told me the hOCR output should be written to the config folder or to the folder containing the image, but I don't see anything there. I also don't know how to configure the file name and location of the hOCR output.
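When Tesseract is driven through the API, hOCR markup generally comes back as a string rather than as a file, so one way to control the file name and location is to write that string out from the app yourself. The sketch below assumes you already have the hOCR string (for example from a `getHOCRText(int page)` method like the one in the update). The `HocrFiles` class, its `writeHocr` method and the `.hocr` extension are hypothetical choices for illustration, not part of tess-two. It sticks to `java.io` so it also works on older Android API levels.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical helper: the app, not Tesseract, decides where the hOCR file goes.
public final class HocrFiles {
    private HocrFiles() {}

    /** Writes hOCR markup to dir/baseName.hocr (UTF-8) and returns the file. */
    public static File writeHocr(File dir, String baseName, String hocr) throws IOException {
        // Create the target directory if it does not exist yet.
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Cannot create directory " + dir);
        }
        // ".hocr" is an arbitrary extension chosen for this sketch.
        File out = new File(dir, baseName + ".hocr");
        try (Writer w = new OutputStreamWriter(new FileOutputStream(out), "UTF-8")) {
            w.write(hocr);
        }
        return out;
    }
}
```

For example, `HocrFiles.writeHocr(context.getExternalFilesDir(null), "scan", hocr)` would create `scan.hocr` in the app's external files directory, where `context` is an Android `Context`, `hocr` is the string you got back, and external storage is assumed to be available (`getExternalFilesDir` can return null otherwise).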
Another thing: is there any way to apply a config file in Tesseract Tools for Android? I put the config files into the tessdata/config folder, but nothing happens. How do I tell Tesseract
to read these config files? There doesn't seem to be much documentation for Android.
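Tesseract config files are plain text with one `variable value` pair per line, plus `#` comment lines. If the Java `init` you are calling gives you no way to pass config names, a possible workaround is to read the file yourself and forward each pair to `setVariable`. This is a sketch under that assumption: `TessConfigLoader` and its `VariableSetter` callback are hypothetical names, and the parser only approximates Tesseract's own. Also note that Tesseract documents `SetVariable` as working only for non-init variables; init-only ones must be supplied when the engine is initialized, so they cannot be applied this way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical loader that mimics a Tesseract config file via setVariable calls.
public final class TessConfigLoader {
    /** Callback shape; tess-two's baseApi::setVariable (String, String -> boolean) fits it. */
    public interface VariableSetter {
        boolean setVariable(String name, String value);
    }

    private TessConfigLoader() {}

    /** Parses "name value" lines, skipping blank lines and lines starting with '#'. */
    public static Map<String, String> parse(Reader in) throws IOException {
        Map<String, String> vars = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // The name ends at the first space or tab; the rest of the line is the value.
            int split = 0;
            while (split < trimmed.length()
                    && trimmed.charAt(split) != ' ' && trimmed.charAt(split) != '\t') {
                split++;
            }
            vars.put(trimmed.substring(0, split), trimmed.substring(split).trim());
        }
        return vars;
    }

    /** Applies every variable in file order and returns the names the setter rejected. */
    public static List<String> apply(Map<String, String> vars, VariableSetter setter) {
        List<String> rejected = new ArrayList<>();
        for (Map.Entry<String, String> e : vars.entrySet()) {
            if (!setter.setVariable(e.getKey(), e.getValue())) {
                rejected.add(e.getKey());
            }
        }
        return rejected;
    }
}
```

Called after `baseApi.init(...)`, something like `TessConfigLoader.apply(TessConfigLoader.parse(new FileReader(configFile)), baseApi::setVariable)` would apply the file and report which variables Tesseract refused (method references need Java 8 language support; an anonymous `VariableSetter` works otherwise). The caller is responsible for closing the reader.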
Update: Thanks to @nguyenq, I can now get the hOCR data. Here is the JNI method I wrote:
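// JNI implementation of the Java method TessBaseAPI.nativeGetHOCRText(int page).
jstring Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetHOCRText(JNIEnv *env,
    jobject thiz, jint page) {
  native_data_t *nat = get_native_data(env, thiz);
  // page is 0-based; GetHOCRText() returns NULL if recognition fails.
  char *text = nat->api.GetHOCRText(page);
  if (text == NULL) return NULL;
  jstring result = env->NewStringUTF(text);
  // Tesseract allocates this string with new[], so release it with delete[], not free().
  delete[] text;
  return result;
}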