How to extract text from PDF according to its location?

Question 1

pdftotext (use a recent, Poppler-based version) lets you define a page region to extract text from.

Try this:

pdftotext    \
  -f 5       \
  -l 7       \
  -x 200     \
  -y 700     \
  -W 144     \
  -H 80      \
   input.pdf \
   output.txt

This selects pages 5 to 7 and, on each of them, a rectangle 144 points wide and 80 points high (72 points = 1 inch) whose top-left corner is at x = 200, y = 700. The values are in points because pdftotext works at 72 DPI by default.
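If you know the region in inches rather than points, the arithmetic is just a multiplication by 72. Here is a minimal Java sketch that builds the same kind of command line; the class name PdfToTextArgs and both method names are made up for this example, and it assumes pdftotext is left at its default 72 DPI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Poppler: turns a region given in inches
// into the integer arguments expected by -x, -y, -W and -H.
final class PdfToTextArgs {
    static final double POINTS_PER_INCH = 72.0;

    // Converts inches to whole points (1 inch = 72 points).
    public static int inchesToPoints(double inches) {
        return (int) Math.round(inches * POINTS_PER_INCH);
    }

    // Builds the argument list for one crop region on a page range.
    public static List<String> buildCommand(int firstPage, int lastPage,
                                            double xIn, double yIn,
                                            double widthIn, double heightIn,
                                            String input, String output) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pdftotext");
        cmd.add("-f"); cmd.add(String.valueOf(firstPage));
        cmd.add("-l"); cmd.add(String.valueOf(lastPage));
        cmd.add("-x"); cmd.add(String.valueOf(inchesToPoints(xIn)));
        cmd.add("-y"); cmd.add(String.valueOf(inchesToPoints(yIn)));
        cmd.add("-W"); cmd.add(String.valueOf(inchesToPoints(widthIn)));
        cmd.add("-H"); cmd.add(String.valueOf(inchesToPoints(heightIn)));
        cmd.add(input);
        cmd.add(output);
        return cmd;
    }

    public static void main(String[] args) {
        // A 2 x 1 inch box whose top-left corner is 2 inches from the left
        // edge and 9 inches from the top edge, on pages 5 to 7.
        List<String> cmd = buildCommand(5, 7, 2.0, 9.0, 2.0, 1.0,
                                        "input.pdf", "output.txt");
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be handed to java.lang.ProcessBuilder if you want to run pdftotext from Java instead of a shell.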

Question 2

You could use PDFBox. https://pdfbox.apache.org/apidocs/org/apache/pdfbox/util/PDFTextStripperByArea.html

// Assumes `document` is an already loaded PDDocument and `rectangle` is a java.awt.Rectangle.
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition( true );
// Register the region first; extractRegions() only fills regions that already exist.
stripper.addRegion( "class1", rectangle );
List allPages = document.getDocumentCatalog().getAllPages();
PDPage firstPage = (PDPage)allPages.get( 0 );
stripper.extractRegions( firstPage );
System.out.println( "Text in the area:" + rectangle );
System.out.println( "Text: " + stripper.getTextForRegion( "class1" ) );

Here rectangle is an object of the Rectangle class from the java.awt package. http://docs.oracle.com/javase/7/docs/api/java/awt/Rectangle.html

// x, y, width and height are ints
Rectangle rectangle = new Rectangle( x, y, width, height );
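One thing to watch when choosing x and y: PDF tools often report boxes with the origin at the bottom-left of the page, while, as far as I know, the stripper compares regions against text positions measured from the top-left. If that holds for your PDFBox version (check with a known region first), the y value has to be flipped. A minimal sketch, where RegionHelper and fromBottomLeft are hypothetical names and the page height is something you supply yourself:

```java
import java.awt.Rectangle;

// Hypothetical helper, not part of PDFBox: converts a box given with a
// bottom-left origin into a java.awt.Rectangle with a top-left origin.
final class RegionHelper {
    public static Rectangle fromBottomLeft(double pageHeight, double x,
                                           double yBottom, double width,
                                           double height) {
        // The top edge of the box, measured downward from the top of the page.
        int top = (int) Math.round(pageHeight - yBottom - height);
        return new Rectangle((int) Math.round(x), top,
                             (int) Math.round(width), (int) Math.round(height));
    }

    public static void main(String[] args) {
        // US Letter is 792 points high. A 144 x 80 box whose lower edge sits
        // 12 points above the bottom of the page starts 700 points from the top.
        Rectangle rectangle = fromBottomLeft(792, 200, 12, 144, 80);
        System.out.println(rectangle);
    }
}
```

The rectangle returned here is what you would pass to addRegion.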