Question

This is the first question I've asked, and I'm not sure how to ask it clearly, or whether there will be an answer that I want to hear ;)

tl;dr: "I want to import a file into my application at work but I don't know the input format. How can I discover it?"

Forgive any pending wordiness and/or redaction.

In my work I depend on an unsupported (and proprietary) application written in Pascal. I have no experience with Pascal (yet...) and naturally have no source code access. It is an excellent (and very secret/NDA sort of deal, I think) application that allows us to deal with inventory and financial issues in my employer's organization. It is quite feature-comprehensive, reasonably stable and robust, and kind of foisted on us by a higher power.

One excellent feature that it has is the ability to load up "schedules" into our corporate system. This feature should be saving us hundreds of hours in data entry. But it isn't. The problem is, the schedules we receive are written in a legacy format intended for human eyes. The "new" system can't interpret them.

Our current information (which I have to read and then re-enter into the database by hand) is sent in a sort of rich-text flat-file format, which would be easy to parse with the string library of probably any mainstream language.

So I want to write a converter to convert our data into a format that the new software can interpret.

By feeding certain assorted files into the system, I have learned a little bit about what kind of file it expects:

  1. I "import" a zero-byte file. Nothing happens (same as printing a report with no data)
  2. I "import" an XML file that I guess might look like the system expects. It responds with an exception dialog and a stacktrace. Apparently the string <?xml contains illegal characters or something
  3. I "import" a jpeg image -- similar result to #2.

So I think that my target wants a flat-file itself. The file would need to contain a "document number" along with {entries with "incident IDs" and descriptions and numeric values}. But I don't know this for certain.

Nobody is able to tell me exactly what these files should look like. Someone in the know said that they have seen the feature demonstrated -- somewhere out there is a utility that creates my importable schedules. But for now, the utility is lost and I am on my own.

What methods can I use to figure out the input file format? I know nothing about debugging Pascal, but I assume that is probably my best bet. Or do I have to keep on with brute force until I can afford a million monkey-operated typewriters? Do I have to decompile the target application? I don't know if I can get away with that, let alone read the decompiled source.

My google-fu has failed me. Has anyone done something like this before or could they point me in the right direction? Are there any guides on this subject?

Thanks in advance.


PS: I am certain that I am not breaking any laws at this point, although I will have to check to find out if decompilation would get me into trouble or not, and that might be outside of my technical competence anyway.


Solution

If you have an example file, you can open it with a hexdump utility and see whether there are things you can identify. Any additional information you have (what should be in the file) helps with that. Better still, if you know a program that can edit the file, you can use that editor to make minimal changes and then compare the file before and after.

In other words, the standard tricks of binary file format reverse engineering.
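A minimal Python sketch of the dump-and-compare trick, assuming you can get hold of two versions of a file that differ by one small, known change. The names hexdump and diff_bytes are made up for this illustration and come from no particular tool. Nothing here knows anything about the actual schedule format.

```python
import sys


def hexdump(data: bytes, width: int = 16) -> str:
    """Return an offset / hex / ASCII dump of data, one row per `width` bytes."""
    pad = width * 3 - 1  # width of a full row of "xx " groups
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is, everything else as a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{pad}}  {ascii_part}")
    return "\n".join(rows)


def diff_bytes(before: bytes, after: bytes):
    """List (offset, old_byte, new_byte) for every position that differs.

    A byte that only exists in the longer input is paired with None.
    """
    changes = []
    for i in range(max(len(before), len(after))):
        old = before[i] if i < len(before) else None
        new = after[i] if i < len(after) else None
        if old != new:
            changes.append((i, old, new))
    return changes


if __name__ == "__main__":
    # Usage (hypothetical script name): python compare.py before.bin after.bin
    if len(sys.argv) == 3:
        with open(sys.argv[1], "rb") as f:
            before = f.read()
        with open(sys.argv[2], "rb") as f:
            after = f.read()
        print(hexdump(before))
        for offset, old, new in diff_bytes(before, after):
            print(f"offset {offset:#x}: {old!r} -> {new!r}")
```

A plain position-by-position comparison like this only stays readable when the change does not shift the rest of the file. If an edit inserts or removes bytes, everything after that point will show up as different, which is itself a hint that the field is variable-length.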

...If you have no existing files whatsoever, then reverse engineering the binary is your only option, and that is not pretty. Decompilation of native binaries is a black art that requires considerable time and skill. Read the various decompilation FAQs on the net.
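Before going as far as a decompiler, one cheap first look at a binary is to list the printable text embedded in it (error messages, field names, format markers), much like the Unix strings utility does. The sketch below is only an illustration under assumptions: printable_strings is a made-up name, and it scans for plain ASCII runs only, so text stored in another encoding such as UTF-16 would be missed. Whether the application contains anything useful in this form is not known.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for each run of at least min_len printable ASCII bytes."""
    # 0x20..0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


if __name__ == "__main__":
    # Made-up sample bytes, standing in for the contents of a real binary.
    sample = b"\x00\x01DOCNUM\x00ab\x00Invalid header\xff"
    for offset, text in printable_strings(sample):
        print(f"{offset:#06x}  {text}")
```

Messages that appear next to the import feature's error text are sometimes enough to guess at keywords or delimiters the parser looks for, which narrows down what to try feeding it.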

First and foremost, I would try to contact the authors of the program. Getting the source code is options 1, 2 and 3, and you only go with other options if there is really, really, really no hope whatsoever of obtaining the source or getting normal support.

Licensed under: CC-BY-SA with attribution