Question

My application needs to use data in a text file which is up to 5 GB in size. I cannot load all of this data into RAM as it is far too large.

The data is stored like a table: 5 million records (rows) and 40 columns, each containing text that will be converted in memory to strings, ints, or doubles.

I've tried caching only 10-100 MB of data in memory and reloading from the file when I need data outside the cache, but it is far too slow. My calculations can jump to any row in the table at random, so the application would constantly have to open the file, read from it, and close it.

I need something fast, and I was thinking of using some sort of DB. I know calculations on data this large may take a while, which is fine. If I do use a DB, it needs to be set up when the desktop application launches and must not require a server component to be installed beforehand.

Any tips? Thanks


Solution

I think you need to clarify some things:

  • Is this a desktop application (I assume yes)? What is its memory limit?
  • Do you use the file in read-only mode?
  • What kind of calculations are you trying to do? (How often are random rows accessed, how often are consecutive rows read, and do you need to modify the data?)

Currently I see two ways for further investigation:

  • Use SQLite. This is a small single-file DB, aimed mainly at desktop applications and single-user use. It doesn't require any server; all you need is an appropriate JDBC library.
  • Create some kind of index, for example a binary tree. The first time you read the file, record the start position of each row within the file. Combined with a random access file that stays open, this lets you seek to the desired row and read it quickly. For a binary tree, the index may be approximately 120 MB (RowsCount * 2 * IndexValueSize).
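The index idea can be sketched roughly as follows. This is only an illustration under several assumptions: rows are terminated by '\n' (optionally preceded by '\r'), the text is UTF-8 or another ASCII-compatible encoding, and no single row exceeds 2 GB. The class name RowOffsetIndex and its methods are hypothetical. It also swaps the binary tree for a plain long[] array, since rows are numbered consecutively and the row number can serve directly as the array position.

```java
import java.io.BufferedInputStream;
import java.io.Closeable;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: maps a row number to the byte offset of that row in the file. */
class RowOffsetIndex implements Closeable {
    private final long[] offsets;        // offsets[i] = byte position where row i starts
    private final long fileLength;       // total bytes scanned, marks the end of the last row
    private final RandomAccessFile file; // stays open for the lifetime of the index

    /** Scans the file once and records where every row begins. */
    RowOffsetIndex(String path) throws IOException {
        long[] buf = new long[1 << 20];
        int count = 0;
        long pos = 0;
        boolean atLineStart = true;
        try (InputStream in = new BufferedInputStream(new FileInputStream(path), 1 << 16)) {
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    // Grow the array when it is full, then record this row's start.
                    if (count == buf.length) buf = Arrays.copyOf(buf, count * 2);
                    buf[count++] = pos;
                    atLineStart = false;
                }
                if (b == '\n') atLineStart = true;
                pos++;
            }
        }
        offsets = Arrays.copyOf(buf, count);
        fileLength = pos;
        file = new RandomAccessFile(path, "r");
    }

    int rowCount() {
        return offsets.length;
    }

    /** Seeks to the row and returns its text without the line terminator. */
    String readRow(int row) throws IOException {
        long start = offsets[row];
        long end = (row + 1 < offsets.length) ? offsets[row + 1] : fileLength;
        byte[] bytes = new byte[(int) (end - start)];
        file.seek(start);
        file.readFully(bytes);
        int len = bytes.length;
        // Drop the trailing "\n" or "\r\n".
        while (len > 0 && (bytes[len - 1] == '\n' || bytes[len - 1] == '\r')) len--;
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

With this layout, 5 million rows need about 40 MB for the offsets (5,000,000 × 8 bytes). A call such as readRow(1234567) then costs one seek and one read; splitting the returned line into its 40 columns and parsing them is left to the caller.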

OTHER TIPS

You can use an embedded database; a comparison is available here: Java Embedded Databases Comparison.

Or, depending on your use case, you may even try Lucene, which is a full-text search engine.

Licensed under: CC-BY-SA with attribution