Design pattern for parsing binary file data and storing in a database
Does anybody recommend a design pattern for taking a binary data file, parsing parts of it into objects and storing the resultant data into a database?
I think a similar pattern could be used for taking an XML or tab-delimited file and parsing it into its representative objects.
A common data structure would include:
(Header) (DataElement1) (DataElement1SubData1) (DataElement1SubData2) (DataElement2) (DataElement2SubData1) (DataElement2SubData2) (EOF)
I think a good design would include a way to swap the parsing definition based on the file type or on some defined metadata included in the header. So a Factory pattern would be part of the overall design for the parser.
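To make the idea concrete, here is a minimal sketch of a factory that picks a parser from header metadata. The header layout (a 4-byte magic `DEMO`, a 1-byte version, a 2-byte element count) and the names `parse_v1`, `parse_v2`, `parser_for` and `parse_file` are all assumptions made up for illustration, not part of any real format.

```python
import struct
from typing import Callable, Dict, List

# Hypothetical header: 4-byte magic, 1-byte format version,
# 2-byte element count, all big-endian with no padding.
HEADER = struct.Struct(">4sBH")

Parser = Callable[[bytes, int], List[int]]


def parse_v1(body: bytes, count: int) -> List[int]:
    # Assumed version 1 layout: each element is an unsigned 16-bit integer.
    return list(struct.unpack(f">{count}H", body[: 2 * count]))


def parse_v2(body: bytes, count: int) -> List[int]:
    # Assumed version 2 layout: each element is an unsigned 32-bit integer.
    return list(struct.unpack(f">{count}I", body[: 4 * count]))


# The "factory": a lookup table from header metadata to a parsing function.
PARSERS: Dict[int, Parser] = {1: parse_v1, 2: parse_v2}


def parser_for(version: int) -> Parser:
    try:
        return PARSERS[version]
    except KeyError:
        raise ValueError(f"unsupported format version: {version}") from None


def parse_file(data: bytes) -> List[int]:
    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != b"DEMO":
        raise ValueError("bad magic number")
    return parser_for(version)(data[HEADER.size:], count)
```

Supporting a new file version then means adding one function and one entry in `PARSERS`; the code that reads the header does not change.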
Step 1: Just write your file parser, using whatever techniques come to mind.
Step 2: Write lots of unit tests for it to make sure all your edge cases are covered.
Once you've done this, you will actually have a reasonable idea of the problem/solution.
Right now you just have theories floating around in your head, most of which will turn out to be misguided.
Step 3: Refactor mercilessly. Your aim should be to delete about half of your code.
You'll find that your code at the end will either resemble an existing design pattern, or you'll have created a new one. You'll then be qualified to answer this question :-)
I fully agree with Orion Edwards, and it is usually the way I approach the problem; but lately I've been starting to see some patterns(!) to the madness.
For streaming data, the entire parser would look something like an adapter, adapting from a stream object to an object stream (which usually is just a queue).
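As a rough sketch of that adapter idea, a Python generator can turn a byte stream into a stream of objects. The record layout (2-byte id, 4-byte value) and the names `Record` and `records` are assumptions for illustration only.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple


class Record(NamedTuple):
    record_id: int
    value: int


# Hypothetical fixed-size record: 2-byte id, 4-byte value, big-endian.
RECORD = struct.Struct(">HI")


def records(stream: BinaryIO) -> Iterator[Record]:
    """Adapt a binary stream into a stream of Record objects."""
    while True:
        chunk = stream.read(RECORD.size)
        if not chunk:
            return  # clean end of stream
        if len(chunk) < RECORD.size:
            # Assumes a buffered, file-like stream where a short read
            # means the data ended mid-record.
            raise ValueError("truncated record")
        yield Record(*RECORD.unpack(chunk))
```

A consumer can iterate over `records(open(path, "rb"))` directly, or push each item onto a `queue.Queue` if the parsing and the consuming run on different threads.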
For your example there would probably be one builder for the complete data structure (from header to EOF), which internally uses builders for the individual data elements (fed by the interpreter). Once the EOF is encountered, an object would be emitted.
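One way this could look, as a sketch only: the interpreter walks a sequence of already-tokenized input and drives a top-level builder, which delegates to a per-element builder. The token kinds (`HEADER`, `ELEMENT`, `SUBDATA`, `EOF`) and all class names here are hypothetical, chosen to mirror the layout in the question.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Element:
    name: str
    sub_data: Tuple[str, ...]


@dataclass(frozen=True)
class Document:
    header: str
    elements: Tuple[Element, ...]


class ElementBuilder:
    def __init__(self, name: str) -> None:
        self.name = name
        self.sub_data: List[str] = []

    def build(self) -> Element:
        return Element(self.name, tuple(self.sub_data))


class DocumentBuilder:
    def __init__(self) -> None:
        self.header: Optional[str] = None
        self.finished: List[Element] = []
        self.current: Optional[ElementBuilder] = None

    def start_element(self, name: str) -> None:
        self._close_current()
        self.current = ElementBuilder(name)

    def add_sub_data(self, value: str) -> None:
        if self.current is None:
            raise ValueError("sub-data before any element")
        self.current.sub_data.append(value)

    def _close_current(self) -> None:
        if self.current is not None:
            self.finished.append(self.current.build())
            self.current = None

    def build(self) -> Document:
        if self.header is None:
            raise ValueError("missing header")
        self._close_current()
        return Document(self.header, tuple(self.finished))


def interpret(tokens: Iterable[Tuple[str, str]]) -> Document:
    """Feed tokens to the builder; emit the finished object at EOF."""
    builder = DocumentBuilder()
    for kind, value in tokens:
        if kind == "HEADER":
            builder.header = value
        elif kind == "ELEMENT":
            builder.start_element(value)
        elif kind == "SUBDATA":
            builder.add_sub_data(value)
        elif kind == "EOF":
            return builder.build()
        else:
            raise ValueError(f"unknown token kind: {kind}")
    raise ValueError("input ended before EOF")
```

The builders are mutable while parsing is in progress, but the objects they emit are frozen dataclasses holding tuples, so nothing downstream can modify them.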
However, creating objects in a switch statement in some factory function is probably the simplest approach for many lesser tasks. Also, I like keeping my data objects immutable, as you never know when someone will shove concurrency down your throat :)
The Strategy pattern may be one you want to look at, with the file-parsing algorithm as the strategy.
You would then want a separate strategy for database insertion.
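A minimal sketch of the two strategies, assuming a simple row shape of `(id, value)`. The parsing strategies are plain functions and the storage strategy is any object with a `save` method; `parse_binary`, `parse_tab_delimited`, `SqliteStore`, `import_data` and the `records` table are all hypothetical names for illustration.

```python
import sqlite3
import struct
from typing import Callable, Iterable, List, Protocol, Tuple

Row = Tuple[int, int]


def parse_binary(data: bytes) -> List[Row]:
    # Assumed layout: repeated (2-byte id, 4-byte value), big-endian.
    return list(struct.iter_unpack(">HI", data))


def parse_tab_delimited(data: bytes) -> List[Row]:
    # Assumed layout: one "id<TAB>value" pair per line.
    rows: List[Row] = []
    for line in data.decode("ascii").splitlines():
        if line:
            record_id, value = line.split("\t")
            rows.append((int(record_id), int(value)))
    return rows


class Store(Protocol):
    def save(self, rows: Iterable[Row]) -> None: ...


class SqliteStore:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS records (id INTEGER, value INTEGER)"
        )

    def save(self, rows: Iterable[Row]) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT INTO records (id, value) VALUES (?, ?)", rows
            )


def import_data(
    data: bytes, parse: Callable[[bytes], List[Row]], store: Store
) -> int:
    """Combine one parsing strategy with one storage strategy."""
    rows = parse(data)
    store.save(rows)
    return len(rows)
```

Because `import_data` only depends on the two strategy interfaces, the same call works for any pairing, for example `import_data(data, parse_tab_delimited, SqliteStore(conn))`, and a fake store can stand in for the database in unit tests.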
Use Lex and YACC. Unless you devote the next ten years exclusively to this subject, they will produce better and faster code every time.