Question

I'm attempting to build a WinForms application that can do the following:

  1. Take in a PDF file
  2. Extract data (based on some sort of template or configuration file)
  3. Build data tables
  4. Serialize and upload the data tables to a web service

Right now I have the PDF file converted into a text string, but I am having trouble coming up with a format for the template. At first I tried writing my own custom XML configuration files. While this would satisfy the requirements of the project, I am finding it extremely difficult to express the necessary instructions in a way that is general enough. My first approach was to process the text line by line, using a series of flags for the various instructions. This seemed like it would work until I realized that the data tables often span multiple pages, with extraneous text in between. My initial processing attempt went like this:

  1. Load the first instruction (start flag, end flag, action (e.g. create table), and table structure).
  2. When the end flag is reached, load the next instruction.
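For reference, the two-step flag approach above can be sketched roughly as follows. The sketch is in Java, which maps almost line for line onto C#. The `Instruction` record, the sample flags, and the rule that columns are separated by two or more spaces are made-up assumptions, and the "action" and "table structure" parts of an instruction are reduced to a table name. The sketch also makes the limitation concrete: every line between the start and end flag becomes a row, so extraneous text that appears where a table breaks across pages would be captured as data, and each instruction can fire only once.

```java
import java.util.*;

public class FlagExtractor {
    // Hypothetical instruction: capture the lines strictly between startFlag
    // and endFlag into a table called tableName.
    record Instruction(String tableName, String startFlag, String endFlag) {}

    static Map<String, List<String[]>> extract(List<String> lines, List<Instruction> instructions) {
        Map<String, List<String[]>> tables = new LinkedHashMap<>();
        int next = 0;            // index of the currently loaded instruction
        boolean inTable = false; // true while between the start and end flag
        for (String line : lines) {
            if (next >= instructions.size()) break; // no instructions left
            Instruction ins = instructions.get(next);
            if (!inTable) {
                if (line.contains(ins.startFlag())) {
                    inTable = true;
                    tables.put(ins.tableName(), new ArrayList<>());
                }
            } else if (line.contains(ins.endFlag())) {
                inTable = false;
                next++; // end flag reached: load the next instruction
            } else {
                // Assumption: columns are separated by two or more spaces.
                tables.get(ins.tableName()).add(line.trim().split("\\s{2,}"));
            }
        }
        return tables;
    }

    public static void main(String[] args) {
        List<String> text = List.of(
            "INVOICE 42", "ITEMS", "Widget  2  9.99", "Gadget  1  4.50", "TOTAL",
            "NOTES", "Paid in full", "END NOTES");
        List<Instruction> plan = List.of(
            new Instruction("Items", "ITEMS", "TOTAL"),
            new Instruction("Notes", "NOTES", "END NOTES"));
        extract(text, plan).forEach((name, rows) -> {
            System.out.println(name);
            rows.forEach(r -> System.out.println("  " + String.join(" | ", r)));
        });
    }
}
```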

Unfortunately, this doesn't account for looping or offer enough control over how the processing works. In some cases I need to pull in information that is appended to every row of data. I worked out how to do this by queuing instructions and then going back to process them once the rest of the table is built. The looping issue still remains, though, since each table is named based on its instruction.
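The queued-instruction idea can be pictured as a second pass that runs only after a table's rows are complete. This is an illustrative Java sketch (close to what the C# would look like), not the poster's actual code: the `AppendColumn` instruction, the convention that its regex has exactly one capture group, and the sample "Account:" line are all hypothetical.

```java
import java.util.*;
import java.util.regex.*;

public class DeferredColumns {
    // Hypothetical deferred instruction: find a value anywhere in the document
    // (regex group 1) and append it as an extra column to every row.
    record AppendColumn(Pattern pattern) {}

    static List<List<String>> applyDeferred(List<List<String>> rows, String fullText,
                                            List<AppendColumn> queue) {
        // Copy the rows so the already-built table is not modified in place.
        List<List<String>> out = new ArrayList<>();
        for (List<String> r : rows) out.add(new ArrayList<>(r));
        // The queue is processed only once the table is complete.
        for (AppendColumn ac : queue) {
            Matcher m = ac.pattern().matcher(fullText);
            String value = m.find() ? m.group(1) : ""; // empty if the value never appears
            for (List<String> r : out) r.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Widget  2\nGadget  1\nAccount: 12345\n";
        List<List<String>> rows = List.of(List.of("Widget", "2"), List.of("Gadget", "1"));
        List<AppendColumn> queue = List.of(new AppendColumn(Pattern.compile("Account:\\s*(\\S+)")));
        System.out.println(applyDeferred(rows, text, queue));
    }
}
```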

Now I am looking into VTL and trying to see whether a project like Vici would help me. It is getting to the point where I'm creating a pseudo-scripting language just to accomplish what I need, and it is getting far too difficult.

TL;DR version: Are there any libraries or projects that will help me build data tables from plain text using some sort of template or configuration file?


Solution

Have you considered NOT using a template or configuration file at all? What advantages does such a file actually give you? Couldn't you, for example, create a small ad hoc library and write the actual processing code directly in C#? I once did the same thing you're doing now, and in retrospect, that is what I should have done.
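To illustrate what "just write the processing code" might look like, here is a hedged sketch in Java (it translates closely to C#). A couple of small reusable helpers play the role of the library, and the document-specific part is an ordinary loop, so repeating sections and tables that continue across page breaks need no special instructions. The helper names, the "Section" marker, and the page-header patterns are invented for the example.

```java
import java.util.*;
import java.util.regex.*;

public class HandWrittenParser {
    // --- tiny reusable helper "library" (hypothetical names) ---

    // Drop page headers/footers so a table that spans pages reads as one block.
    static List<String> stripNoise(List<String> lines, Pattern noise) {
        List<String> kept = new ArrayList<>();
        for (String l : lines) if (!noise.matcher(l).matches()) kept.add(l);
        return kept;
    }

    // Split a row on runs of two or more spaces (an assumption about the layout).
    static String[] columns(String line) {
        return line.trim().split("\\s{2,}");
    }

    // --- document-specific processing written as plain code ---
    // Every "Section <name>" line starts a new table; an ordinary loop handles
    // any number of sections.
    static Map<String, List<String[]>> parseReport(List<String> rawLines) {
        List<String> lines = stripNoise(rawLines,
            Pattern.compile("Page \\d+ of \\d+|ACME Corp Report"));
        Map<String, List<String[]>> tables = new LinkedHashMap<>();
        List<String[]> current = null;
        for (String line : lines) {
            if (line.startsWith("Section ")) {
                current = new ArrayList<>();
                tables.put(line.substring("Section ".length()).trim(), current);
            } else if (current != null && !line.isBlank()) {
                current.add(columns(line));
            }
        }
        return tables;
    }

    public static void main(String[] args) {
        List<String> text = List.of(
            "Section North", "Widget  2", "Page 1 of 2", "ACME Corp Report",
            "Gadget  1", "Section South", "Gizmo  5");
        parseReport(text).forEach((name, rows) ->
            System.out.println(name + ": " + rows.size() + " row(s)"));
    }
}
```

When a new layout shows up, the change is a new method like `parseReport`, written with the full language available, instead of a new feature in a home-grown template format.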

You said it yourself: you're developing a kind of scripting language, and that already means code changes. Whatever you use, if the scenario is complex enough, it's bound to end up as code changes or a similar effort. You could package the processing code separately from the library code, and update the assembly that contains it on its own.
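One way to keep the processing code separate from the host application is to have the host depend only on a small parser interface. The sketch below is in Java and every name in it is hypothetical. In a .NET setting the implementations would sit in the separately updated assembly; here they are nested classes so the example stays self-contained.

```java
import java.util.*;

public class ParserHost {
    // The only contract the host application depends on.
    interface DocumentParser {
        String documentType();                                 // hypothetical lookup key
        Map<String, List<String[]>> parse(List<String> lines); // table name -> rows
    }

    // Example implementation; in practice this would live in the separately shipped module.
    static class SimpleListParser implements DocumentParser {
        public String documentType() { return "simple-list"; }

        public Map<String, List<String[]>> parse(List<String> lines) {
            List<String[]> rows = new ArrayList<>();
            for (String l : lines) {
                if (!l.isBlank()) rows.add(l.trim().split("\\s{2,}"));
            }
            return Map.of("Items", rows);
        }
    }

    private final Map<String, DocumentParser> parsers = new HashMap<>();

    void register(DocumentParser p) { parsers.put(p.documentType(), p); }

    Map<String, List<String[]>> process(String documentType, List<String> lines) {
        DocumentParser p = parsers.get(documentType);
        if (p == null) throw new IllegalArgumentException("No parser for " + documentType);
        return p.parse(lines);
    }

    public static void main(String[] args) {
        ParserHost host = new ParserHost();
        host.register(new SimpleListParser());
        var tables = host.process("simple-list", List.of("Widget  2", "", "Gadget  1"));
        System.out.println(tables.get("Items").size());
    }
}
```

Supporting a new document layout then means writing and shipping one more `DocumentParser` implementation, without touching the host.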

Licensed under: CC-BY-SA with attribution