Question

Is it sensible/reasonable to consider using an SQLite database file as a data interchange format? Are there any obvious gotchas or reasons not to do this?

Example: say I have a master SQL Server database and periodically need to SELECT a set of rows from a table, export them into a file, and transfer the file to a remote system, where the data will be imported into a remote/slave database (it could be SQL Server or some other database technology entirely, e.g. SQLite, MySQL, etc.). That is, the file format should be general purpose and not be tied to a specific database technology.

Some pros are:

  • No encoding issues to worry about. This is all covered by the SQLite db file format, including things like endianness of binary data.
  • Compact encoding of binary data. E.g. avoids text encoding of numeric data, or base64 encoding of binary objects.
  • Well defined data schema. The data will be in a table with well defined column types.

Cons:

  • Not human readable/editable, e.g. if a tester wanted to generate or edit a data file, they would need additional knowledge and tools. If this were e.g. a CSV, JSON, or XML file, anyone could immediately see and understand the structure of the data in a text editor, and could edit it directly.
  • Somewhat non-standard. Requires additional dependency on a SQLite library/dll.
  • One more table schema to maintain, i.e. in addition to the master and remote database tables, there is now a third table.

Is this idea at all sensible?

Solution

It happens to be possible to use SQLite databases as a data interchange format, but it's not a particularly good solution. Why?

  • You are looking for a compact data exchange format. In contrast, SQLite's goal is to store data in a manner that is easily queried and modified on disk. This is not necessarily a compact format, e.g. if the database contains free pages (see the sketch after this list).

  • By using a format that is not human-readable and whose data can only be accessed through the SQLite interface, you are sacrificing quite a lot of debuggability. On the plus side, SQLite files, XML, and JSON are all self-describing, with SQLite (and optionally XML) also containing their data schema within the document.

  • SQLite happens to have a good compatibility story, but different versions have different capabilities. You will have to make sure that your generated database files are understood by all supported SQLite versions, which may require extra configuration and testing.

  • By using SQLite, you are constrained to SQLite's data model. This means everything fits into rows and columns, and is dynamically typed. Other data formats have more flexible data models that are better suited to arbitrary object graphs. XML in particular has excellent support for defining and checking data schemas.

    As you are taking data out of a database, the data schema should roughly fit, but note that database-specific extensions might be problematic. It's also certainly not the case that any data in a SQLite database would be fit to import into another DB. Since you will necessarily have to write extra tooling for export/import, choosing SQLite over some other file format does not represent a shortcut that will save you a noticeable amount of work.
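
To make the free-pages point concrete, here is a minimal sketch of the kind of export tooling you would have to write either way. It uses Python's standard sqlite3 module; the table name, file name, and sample rows are all hypothetical. Note the explicit VACUUM, without which free pages left by deletes and updates can bloat the transfer file:

```python
import sqlite3

# Hypothetical export: write the selected rows into a fresh SQLite
# file that will serve as the interchange artifact.
rows = [(1, "alice", b"\x00\x01"), (2, "bob", b"\x02\x03")]  # e.g. SELECTed from the master DB

con = sqlite3.connect("export.sqlite")
con.execute("""
    CREATE TABLE IF NOT EXISTS export_rows (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        payload BLOB
    )
""")
con.executemany("INSERT INTO export_rows VALUES (?, ?, ?)", rows)
con.commit()

# VACUUM rewrites the file without free pages, so the artifact is as
# compact as SQLite's storage format allows.
con.execute("VACUUM")
con.close()
```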

In most scenarios, using some established data interchange format is vastly preferable, which often means XML or JSON, or possibly a binary format such as Protobuf. Unless you transfer mostly binary data or data that can be defined with simple structs, the choice doesn't matter very much – just base64-encode any binary data before putting it into a textual format, which only costs you 33% overhead.
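
As a quick illustration of where that 33% comes from, here is a sketch in Python (the field names are made up): base64 maps every 3 input bytes to 4 output characters.

```python
import base64
import json

payload = bytes(range(256))  # hypothetical binary blob

# Wrap the blob in a textual format by base64-encoding it first.
doc = json.dumps({"id": 42, "payload": base64.b64encode(payload).decode("ascii")})

# The round trip is lossless...
assert base64.b64decode(json.loads(doc)["payload"]) == payload

# ...and 3 input bytes become 4 output characters: ~33% overhead.
print(len(payload), len(base64.b64encode(payload)))  # 256 344
```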

CSV might be an option for tabular data, but most CSV implementations are underspecified and disagree on details like how delimiters and newlines are escaped – so take extra care to configure this explicitly on both sides of the transfer.
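
As an illustration, Python's csv module lets you pin those details down rather than relying on defaults (the dialect parameters shown are one reasonable choice, not a standard):

```python
import csv

rows = [["id", "note"], ["1", 'contains "quotes", commas,\nand a newline']]

# Pin down delimiter, quoting, and line endings explicitly so both
# sides of the transfer agree on how tricky values are escaped.
with open("export.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter=",", quotechar='"',
                        quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    writer.writerows(rows)

with open("export.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f, delimiter=",", quotechar='"')
    assert list(reader) == rows  # embedded newline and quotes survive
```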

If you need to combine multiple documents into one file, a common strategy is to use a ZIP archive as an envelope. Those are easy to debug using common tools, and easy to manipulate programmatically. E.g. the “ZIP of XML” strategy is used by the .docx and .odt file formats.
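
A minimal sketch with Python's zipfile module (the document names inside the archive are hypothetical):

```python
import zipfile

# Bundle several interchange documents into one envelope file, the
# same "ZIP of XML" strategy used by .docx and .odt.
with zipfile.ZipFile("transfer.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("manifest.json", '{"version": 1, "tables": ["users"]}')
    zf.writestr("users.csv", "id,name\r\n1,alice\r\n")

with zipfile.ZipFile("transfer.zip") as zf:
    print(zf.namelist())  # ['manifest.json', 'users.csv']
```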

Please also consider whether manually moving data between two databases is the correct solution for your needs. Most DBs have built-in support for a master + read-only slave setup, which might be a lot easier. If you move data manually, you will also have to think about data consistency issues. Are you transferring a selection of records, or are you bringing the complete remote DB up to date? Are you transferring records, or a log of the changes that were applied to those records since the last sync? How are you making sure the transferred data results in a consistent view, e.g. with respect to constraints and foreign key relationships? A toy sketch of the change-log alternative follows.
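
To illustrate the records-versus-change-log distinction, here is a toy sketch (all table and column names are invented, and a real system would also need to handle deletes, conflicts, and transactional ordering):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE changes (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                          op TEXT, user_id INTEGER, name TEXT);
""")

# Every write to `users` also appends to the change log (in practice
# this could be done with triggers or at the application layer).
con.execute("INSERT INTO users VALUES (1, 'alice')")
con.execute("INSERT INTO changes (op, user_id, name) VALUES ('insert', 1, 'alice')")

# A sync then ships only the changes after the last acknowledged
# watermark, instead of re-transferring whole records.
last_synced_seq = 0  # remembered from the previous transfer
pending = con.execute(
    "SELECT seq, op, user_id, name FROM changes WHERE seq > ? ORDER BY seq",
    (last_synced_seq,),
).fetchall()
print(pending)  # [(1, 'insert', 1, 'alice')]
```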

So to summarize: You have a difficult problem. Using SQLite may be possible, but it won't make this problem any easier. Prefer established solutions.

OTHER TIPS

In these days of GDPR and its potentially punitive fines for any and all data breaches, delivering anything via "sneakernet" seems like the wrong direction to be going in.

Why can't the "remote system" access the master SQL Server database (or a REST API that sits in front of it) and pull the data directly?

Licensed under: CC-BY-SA with attribution