Question

Under VoltDB Don'ts

http://community.voltdb.com/DosAndDonts

They state

Don't create queries that return large volumes of data (such as SELECT * FROM FOO with no constraints) especially for multi-partition transactions. Be conservative in the data returned by stored procedures.

This is one of the use cases that I have. Is there some aspect of stored procedures that make them unsuitable for this type of query or is it something specific to VoltDB? Under this scenario, would performance degrade to a level that would be worse than a traditional RDBMS such as Postgres?

Edit: My query is not quite a select * from foo but I will need to select all financial transactions between certain date ranges and this could exceed 100m rows

Was it helpful?

Solution

All databases need to pay the materialization and i/o costs to transfer a large result set back to a user.

However, I can speak specifically to VoltDB.

In VoltDB, stored procedures are all transactions. Even a result set that selects a large portion of the database is a fully isolated from other concurrent procedures. The tuples in that result set need to be momentarily buffered internally (for example for cross-partition ordering or limiting) and then returned to the user.

The combination of needing to maintain full isolation over a result that can take many milliseconds (or seconds) of I/O to return to the user and the aggregation that happens at the coordinating node of a multi-partition procedure limits the maximum result set size.

I suspect a future release will address this limitation - many people have data-access requirements similar to what you describe.

OTHER TIPS

The performance issue is that of large data transfers. This is generally true for all databases.

When a database needs to return large amounts of data, it needs to use lots of resources (memory, CPU, network IO for example), degrading performance.

The network IO alone can be an issue, as if it is a significant amount of data, nothing else could go through the network till the data transfer has completed.

If you are querying SELECT * FROM FOO in order to display the results to a user interface, or a similar use case, then paging is a good approach to limit the size of the data returned and the execution time of the transaction. There is a paging example in a.

If you are trying to extract data from VoltDB to export to another database or system, the better approach is use VoltDB's Export.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top