Question

I have been reading about automatic text line recognition in MATLAB. Although there are many advanced methods, every paper mentions that the simplest way to detect text lines is via horizontal projections, so I decided to try this method myself.

I am very new to this and have hit a brick wall: I do not know how to proceed beyond this point. This is what I have achieved so far:

I'm aiming for a system that is language-independent and only concerned with text lines, so I chose Arabic text as a test case:

[Image: sample page of Arabic text]

I used the function radon to get the projections.

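img = rgb2gray(imread('arabic.jpg'));
% bw_closed was not defined in the original post; assumed here to be a
% binarized, morphologically closed copy of img with text as foreground.
bw_closed = imclose(~imbinarize(img), strel('line', 15, 0));
[R, xp] = radon(bw_closed, [0 90]);              % projections at 0 and 90 degrees
figure; plot(xp, R(:,2)); title('at angle 90');  % column 2 is the 90-degree projection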

This is the projection plot:

[Image: plot of the projection at 90 degrees]

Clearly the 5 peaks correspond to the 5 text lines, but how do I go from here to segmenting the original document?

Can anyone help me beyond this point? None of the papers I read mention how to proceed from this step; they just say that the projections give us the detected lines.

What I'm asking is: how, from the plot data, can I tell MATLAB which parts are lines of text and which are the gaps between lines?


Solution

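r = R(:,2);                             % projection at 90 degrees
r = r(92:391);                          % your image region (indices specific to this image)
blank = r < 3;                          % region without text (threshold tuned by eye)
[labeled, n] = bwlabel(blank);          % label each run of blank rows
C = regionprops(labeled, 'Centroid');   % find the centers of blank regions
for i = 1:n-1
    subplot(n-1, 1, i)
    % Centroids can be fractional, so round them before using them as row indices
    imshow(img(round(C(i).Centroid(2)):round(C(i+1).Centroid(2)), :));
end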

Results:

[Image: the text lines cropped from the page, one per subplot]
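
For comparison, here is a minimal sketch of the same idea without radon, assuming the page has already been binarized so that text pixels are true. Summing each row with sum(bw, 2) gives a horizontal projection that is indexed directly by image row, so no cropping such as r(92:391) is needed. The synthetic bw image, the minPixels threshold, and the variable names are all made up for illustration; they are not from the post above.

```matlab
% Synthetic stand-in for the binarized page: true = text pixel (assumption).
bw = false(60, 100);
bw(5:14, 10:90)  = true;   % fake text line 1
bw(25:34, 10:90) = true;   % fake text line 2
bw(45:54, 10:90) = true;   % fake text line 3

rowSum = sum(bw, 2);            % horizontal projection: text pixels per row
minPixels = 3;                  % hypothetical threshold, same role as "r < 3"
isText = rowSum >= minPixels;   % true for rows that belong to a text line

% Pad with zeros so lines touching the top or bottom edge are still found.
d = diff([0; isText; 0]);
startRows = find(d == 1);        % first row of each text line
endRows   = find(d == -1) - 1;   % last row of each text line

for k = 1:numel(startRows)
    lineImg = bw(startRows(k):endRows(k), :);   % crop one text line
    fprintf('line %d: rows %d to %d\n', k, startRows(k), endRows(k));
end
```

On a real page you would replace the synthetic bw with your own binarized image and crop img with the same startRows and endRows. This version cuts at the edges of each text line rather than at the centers of the gaps, so ascenders, descenders, or diacritics that fall below the threshold may be clipped; lowering minPixels or padding each range by a few rows is one way to compensate.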

Licensed under: CC-BY-SA with attribution