Question

I sincerely apologize if this isn't the proper forum to discuss this, but I wasn't sure where to go or what would be the best option.

Basically, I'm trying to find a database-friendly list of Veterans Affairs hospitals. The closest thing I've been able to find is www.va.gov/ofcadmin/docs/CATB.pdf, as it has all the information I'm looking for:

  • Region
  • Address
  • City in a separate column
  • Zip Code in a separate column
  • State
  • Facility # (also known as StationID)
  • VISN
  • Symbol

I've tried exporting that PDF to CSV, but getting usable output has been a complete nightmare. Does anyone have any ideas or insights into how I could accomplish this?

Solution

First, here's a CSV file containing the data found in CATB.pdf. The very first line contains the column headers, and the rest of the file contains the contents.

http://tmp.alexloney.com/CATB.csv

Now for the more detailed explanation. I took the PDF you linked to, converted it to an HTML document using Adobe Acrobat, then used a series of regular expressions to parse the file and clean it up. Once the file was clean enough, I wrote a program to parse the remainder of the file, pick up the state and region for each entry, and write everything out as a nicely formatted CSV.
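For anyone who wants to try the same approach, here is a rough sketch of what that final parsing step could look like in Python. This is not the program that produced the CSV above. The input layout (`REGION ...` and `STATE ...` header lines followed by facility lines), the regular expressions, the function names, and the sample rows are all made up for illustration; the real cleaned-up text from CATB.pdf will almost certainly need different patterns.

```python
import csv
import io
import re

# Hypothetical layout of the cleaned-up text (NOT the real CATB format):
#   "REGION <name>" and "STATE <name>" header lines, followed by facility
#   lines of the form "<station id>  <VISN>  <address>, <city>, <zip>".
REGION_RE = re.compile(r"^REGION\s+(.+)$")
STATE_RE = re.compile(r"^STATE\s+(.+)$")
FACILITY_RE = re.compile(
    r"^(?P<station>\d{3}\w*)\s+(?P<visn>\d{1,2})\s+"
    r"(?P<address>[^,]+),\s*(?P<city>[^,]+),\s*(?P<zip>\d{5}(?:-\d{4})?)$"
)

FIELDS = ["Region", "State", "StationID", "VISN", "Address", "City", "Zip"]


def parse_facilities(lines):
    """Yield one dict per facility line, carrying the latest region/state forward."""
    region = state = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = REGION_RE.match(line)
        if m:
            region = m.group(1)
            continue
        m = STATE_RE.match(line)
        if m:
            state = m.group(1)
            continue
        m = FACILITY_RE.match(line)
        if m:
            yield {
                "Region": region,
                "State": state,
                "StationID": m.group("station"),
                "VISN": m.group("visn"),
                "Address": m.group("address").strip(),
                "City": m.group("city").strip(),
                "Zip": m.group("zip"),
            }
        # Lines that match nothing are skipped; in practice you would log
        # them so you can refine the patterns.


def to_csv(lines):
    """Return CSV text with a header row followed by one row per facility."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(parse_facilities(lines))
    return buf.getvalue()


# Invented sample input, only to show the shape the sketch expects.
SAMPLE = """\
REGION Example Region
STATE Example State
999  1  1 Sample Street, Sampletown, 00000
998A4  1  2 Sample Avenue, Exampleville, 00001-1234
"""

if __name__ == "__main__":
    print(to_csv(SAMPLE.splitlines()), end="")
```

The idea is simply to remember the most recent region and state headings and attach them to every facility row that follows, which is what lets each CSV row stand on its own in a database.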

Hope that helps you!

OTHER TIPS

I believe PDFill has an option that will convert a PDF file to Excel. Once it's in Excel, you should have no problem saving it as a CSV file.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow