Question

Short version: assuming I don't want to keep the data for long, how do I create a database programmatically in HSQLDB and load some CSV data into it? My schema will match the files exactly, and the files do have adequate column names.

This is an unattended process.

Details:

I need to apply some simple SQL techniques to three CSV files downloaded via the web, then create some DTOs which I can then use with some existing code to process them some more, and save them via REST. I don't really want to mess around with databases but the CSV files are linked by foreign keys, so I was thinking of using an in-memory embedded database to do the work, then throw the whole lot away.

I had in mind a command line app working like this:

  1. Create a fresh database in HSQLDB.
  2. Start three HTTP GETs in three threads using Apache HttpClient.
  3. Import the CSVs into three HSQLDB MEMORY tables.
  4. Run some SQL.
  5. Parse the results into my existing DTOs.
  6. Etc...

I could use pointers to code and utilities helpful for items 1 and 3. Also, is there an alternative to HSQLDB I should consider?
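Step 2 of the plan above can be sketched with the JDK's built-in `java.net.http.HttpClient` (Java 11+) as an alternative to Apache HttpClient. The URLs below are placeholders, and the sending code is shown only as a comment so the sketch stays self-contained:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

public class CsvFetchSketch {
    // Build one GET request per CSV URL; the caller decides how to send them.
    public static List<HttpRequest> buildRequests(List<String> urls) {
        return urls.stream()
                   .map(u -> HttpRequest.newBuilder(URI.create(u)).GET().build())
                   .toList();
    }

    public static void main(String[] args) {
        // Placeholder URLs for the three CSV files.
        List<HttpRequest> requests = buildRequests(List.of(
                "https://example.com/a.csv",
                "https://example.com/b.csv",
                "https://example.com/c.csv"));
        requests.forEach(r -> System.out.println(r.method() + " " + r.uri()));

        // To actually download the three files concurrently:
        // HttpClient client = HttpClient.newHttpClient();
        // var futures = requests.stream()
        //         .map(r -> client.sendAsync(r, HttpResponse.BodyHandlers.ofString()))
        //         .toList();
        // futures.forEach(f -> { /* write f.join().body() to a temp file */ });
    }
}
```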

Solution

Check out opencsv. It helps you parse CSV files.
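To illustrate what the parsing step produces, here is a dependency-free sketch using only the JDK. It naively splits on commas; opencsv's reader is preferable for real input because it also handles quoted fields, embedded separators, and escapes:

```java
import java.util.Arrays;
import java.util.List;

public class CsvSketch {
    // Naive CSV line parser: splits on commas and trims whitespace.
    // Unlike opencsv, it does NOT handle quoted fields that contain commas.
    public static List<String> parseLine(String line) {
        return Arrays.stream(line.split(",", -1))
                     .map(String::trim)
                     .toList();
    }

    public static void main(String[] args) {
        System.out.println(parseLine("order_id, customer_id ,total"));
    }
}
```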

OTHER TIPS

The command line app to use is the SqlTool utility jar that is supplied with HSQLDB. Your procedure can be completed as follows:

  1. Create a fresh HSQLDB in-memory database (just connect to the in-memory database).
  2. Start three HTTP GETs using Apache HttpClient to get the CSV files.
  3. Create three HSQLDB TEXT tables and set the SOURCE of these tables to the downloaded CSV files.
  4. Run some SQL. Parse the results into your existing DTOs.
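Steps 1 and 3 can be sketched in HSQLDB SQL. Connect with the JDBC URL `jdbc:hsqldb:mem:somename` (no server needed; the database is created on first connection). The table, column, and file names below are assumptions; in the SOURCE string, `fs` sets the field separator and `ignore_first=true` skips the header row:

```sql
-- Create a TEXT table whose columns match the CSV header.
CREATE TEXT TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER,
    total       DECIMAL(10,2)
);

-- Bind the table to the downloaded file; the path is resolved by HSQLDB.
SET TABLE orders SOURCE 'orders.csv;fs=,;ignore_first=true';

-- Ordinary SQL now reads straight from the file.
SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id;
```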

Creating TEXT tables in purely in-memory databases was not possible when the question was asked. It is now fully supported in HSQLDB 2.x versions.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow