Problem

Let me first describe the scenario. The database is Sybase. There are some 2-3k stored procedures. A stored procedure might return huge amounts of data (around a million records). There will be a service (servlet / Spring controller) which will call the required procedure and flush the data back to the client in XML format.

I need to apply filtering (on multiple columns with multiple conditions) and sorting (based on some dynamic criteria); this I have done.

The issue is that, because the data is huge, doing all the filtering / sorting in memory is not good. I have thought of the options below.

Option 1: Once I get the ResultSet object, read some X number of records, filter them, store them in a file, and repeat this process until all the data is read. Then just read the file and flush the data to the client.

I need to figure out how to sort the data in the file and how to store objects in the file so that the filtering/sorting is fast.
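For the "how to store objects in a file" part of Option 1, one simple approach is plain Java serialization: sort each chunk in memory, spill it to a temp file, and deserialize it later. This is only a sketch under that assumption; the class and method names (`ChunkSpiller`, `spillChunk`, `readChunk`) are illustrative, not from the question.

```java
import java.io.*;
import java.util.*;

// Sketch for Option 1: spill a chunk of records (Maps keyed by column name)
// to a temp file with Java serialization, sorted by one column, and read it
// back later. Assumes the column values implement Comparable.
public class ChunkSpiller {

    // Sort the chunk by the given column, then serialize it to a temp file.
    @SuppressWarnings("unchecked")
    public static File spillChunk(List<HashMap<String, Object>> chunk, String sortColumn)
            throws IOException {
        chunk.sort((a, b) ->
                ((Comparable<Object>) a.get(sortColumn)).compareTo(b.get(sortColumn)));
        File file = File.createTempFile("chunk", ".bin");
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(new ArrayList<>(chunk));
        }
        return file;
    }

    // Read a previously spilled chunk back into memory.
    @SuppressWarnings("unchecked")
    public static List<HashMap<String, Object>> readChunk(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (List<HashMap<String, Object>>) in.readObject();
        }
    }
}
```

Java serialization is convenient but not the fastest on-disk format; a delimited text or binary row format would usually read and write faster for million-record volumes.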

Option 2: Look for some Java API which takes the data, filters and sorts it based on the given criteria, and returns it as a stream.

Option 3: Use an in-memory database like HSQLDB or H2. But I think this will add overhead instead of helping: I will need to insert the data first and then query it, and this will in turn also use the file system.

Note that I don't want to modify the stored procedures, so doing the filtering/sorting in the database is not an option, or might be the last option if nothing else works.

Also, if it helps: every record that I read from the ResultSet I store in a Map, with the keys being the column names, and these Maps are stored in a List, on which I apply the filtering & sorting.
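The List-of-Maps filtering/sorting described above can be sketched with the Stream API. This is a minimal illustration, not the asker's actual code; the filter condition (equality on one column) and the class name `RowPipeline` are assumptions for the example.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of in-memory filtering/sorting where each row is a Map keyed by
// column name and the rows live in a List, as described in the question.
public class RowPipeline {

    // Keep rows whose filterColumn equals the wanted value, then sort the
    // survivors by sortColumn. Assumes sortColumn values implement Comparable.
    @SuppressWarnings("unchecked")
    public static List<Map<String, Object>> filterAndSort(
            List<Map<String, Object>> rows,
            String filterColumn, Object wanted,
            String sortColumn) {
        return rows.stream()
                .filter(r -> Objects.equals(r.get(filterColumn), wanted))
                .sorted((a, b) -> ((Comparable<Object>) a.get(sortColumn))
                        .compareTo(b.get(sortColumn)))
                .collect(Collectors.toList());
    }
}
```

The weakness the question points at is exactly here: `rows` must be fully materialized in memory before the stream pipeline runs, which is what breaks down at millions of records.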

Which option do you think will be best in terms of memory footprint, scalability, and performance? Or is there another option that would suit this scenario?

Thanks


Solution

I would recommend your Option 3 but it doesn't need to be an in-memory database; you could use a proper database instead. Any other option would be just a more specific solution to the general problem of sorting huge amounts of data. That is, after all, exactly what a database is for and it does it very well.

If you really believe your Option 3 is not a good solution then you could implement a sort/merge solution. Gather your Maps as you already do, but whenever you reach a limit of records (say 10,000) sort them, write them to disk, and clear them from memory.

Once all the data has been read, open all the files you wrote and perform a merge on them.
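The merge phase above can be sketched with a priority queue over one cursor per spilled file: the heap always yields the file whose current record is smallest, producing one globally sorted stream. For brevity this sketch merges files of sorted text lines (one key per line) rather than serialized Maps; the class name `FileMerger` is illustrative.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Sketch of the k-way merge step of external sorting: each input file is
// already sorted; a priority queue of per-file cursors emits the globally
// smallest remaining line on every poll.
public class FileMerger {

    private static class Cursor {
        final BufferedReader reader;
        String current;
        Cursor(BufferedReader r) throws IOException {
            reader = r;
            current = r.readLine();
        }
    }

    public static List<String> mergeSortedFiles(List<Path> files) throws IOException {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparing(c -> c.current));
        for (Path p : files) {
            Cursor c = new Cursor(Files.newBufferedReader(p));
            if (c.current != null) heap.add(c);   // skip empty files
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            merged.add(c.current);
            c.current = c.reader.readLine();      // advance this file's cursor
            if (c.current != null) heap.add(c); else c.reader.close();
        }
        return merged;
    }
}
```

In a real servlet you would write each merged record straight to the response stream instead of collecting into a List, so memory use stays bounded by the chunk size rather than the total result size.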

Other tips

Is Hadoop applicable to your problem?

You should filter the data in the database itself. You could write an aggregation procedure which executes all the other procedures and combines or filters their data. However, the best option is to modify the 2-3 thousand stored procedures so they return only the needed data.

License: CC-BY-SA with attribution
Not affiliated with Stack Overflow