Question

Whenever I copy tabularly-organized data from a PDF or website, it usually shows up as space-delimited values in Excel. When pasting space-delimited values into Excel, they all go into the same column. This isn't helpful when I want to make a table out of the data, sort the data, or basically perform any operation on the data. Here's an example of a line of data:

100 04/01 10:00 PM 555-555-5555 SOME PLACE,CA XA/VE/AE 1:00

This data is tabularly-organized in the PDF though, meaning no matter what the length of the Location field, the values in the next column are all aligned. For example, this is how it would look in the PDF:

100 04/01 10:00 PM 555-555-5555 SOME PLACE,CA         XA/AE    1:00
100 04/01 10:00 PM 555-555-5555 SOME OTHER PLACE,CA   XA/VE/AE 1:00

Is there a quick way to turn these space-delimited values into tab-delimited values for easy pasting into Excel? Is there perhaps some formatting that I'm missing when copying that's turning all tabs into spaces? Is there perhaps a program that can recover the tabs that seem to be lost when copying?


Solution

Patrick,

There are no tabs in 99% of PDF files. There are so many ways to draw tabbed text into a PDF that it is impossible to reliably convert the text to columns.

Most text in a PDF file is drawn using a sequence of positioning and drawing operations along these lines:

MoveTo(x, y); DrawText("Hello"); MoveTo(x1, y1); DrawText("Hello World");

Some PDFs will draw each character individually this way. Others will draw "Hello W" and "orld" as two separate pieces. Characters can have character spacing and word spacing values applied, as well as scaling, rotation, etc. I have seen all sorts of ways of drawing text, which is why you need a solid PDF-to-text library to get the best results. Proportional fonts will also cause you lots of issues.
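To make the problem concrete, here is a minimal Python sketch of the kind of regrouping a PDF-to-text library has to do. It assumes the fragments have already been extracted as (x0, x1, y, text) tuples, where x0 and x1 are the left and right edges of each drawn piece. That tuple shape, the function name fragments_to_rows, and the three thresholds are all hypothetical choices for illustration, not the API of any particular library.

```python
from collections import defaultdict

def fragments_to_rows(fragments, y_tol=2.0, word_gap=1.0, col_gap=8.0):
    """Group (x0, x1, y, text) fragments into rows of cells.

    All thresholds are assumed values in PDF units and need tuning
    per document; this is a heuristic, not a reliable converter.
    """
    # Bucket fragments into lines by rounding y to a multiple of y_tol.
    lines = defaultdict(list)
    for x0, x1, y, text in fragments:
        lines[round(y / y_tol)].append((x0, x1, text))

    rows = []
    # PDF y coordinates usually grow upward, so go from largest y down.
    for key in sorted(lines, reverse=True):
        cells = []
        prev_x1 = None
        for x0, x1, text in sorted(lines[key]):
            gap = None if prev_x1 is None else x0 - prev_x1
            if gap is None or gap >= col_gap:
                cells.append(text)        # wide gap: start a new column
            elif gap >= word_gap:
                cells[-1] += " " + text   # small gap: new word, same cell
            else:
                cells[-1] += text         # touching: same word, e.g. "Hello W" + "orld"
            prev_x1 = x1
        rows.append(cells)
    return rows
```

Each row can then be written out with "\t".join(row) for pasting into Excel. The sketch ignores rotation, scaling, and empty cells, and a column separated from its neighbor by less than col_gap will be merged into it, which is exactly the kind of ambiguity described above.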

In short, there is no quick way to get the columns from your text without writing some fancy code.
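As one example of what such code might look like, the Python sketch below works on the copied text itself rather than on the PDF. It assumes the pasted lines still carry their alignment as runs of spaces, as in the aligned sample in the question, and treats any character position that is blank in every line as a column boundary. The function name split_on_gutters is made up for this illustration.

```python
def split_on_gutters(lines):
    """Split aligned text lines at positions that are blank in every line."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]

    # A position is a gutter if every line has a space there.
    is_gutter = [all(line[i] == " " for line in padded) for i in range(width)]

    # Collect [start, end) spans of consecutive non-gutter positions.
    # The extra True at the end closes a span that runs to the last column.
    spans, start = [], None
    for i, gutter in enumerate(is_gutter + [True]):
        if not gutter and start is None:
            start = i
        elif gutter and start is not None:
            spans.append((start, i))
            start = None

    # Cut every line at the same spans and trim the padding.
    return [[line[a:b].strip() for a, b in spans] for line in padded]
```

Joining each returned row with "\t".join(row) gives tab-delimited text. The heuristic over-splits whenever a space happens to line up in every row: with only the two sample lines from the question, "SOME" would be cut off from the rest of the location, and "10:00" would be separated from "PM". More rows make accidental gutters less likely but never rule them out.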

Andrew.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow