Managing 2000+ resulting rows from a SELECT query included in a Data Service created in WSO2 DSS

StackOverflow https://stackoverflow.com/questions/16976122

  •  31-05-2022

Question

I created a Data Service in WSO2 DSS 3.0.1; this data service invokes a SELECT query on the database.

The problem is that the query returns more than 2000 records, and the data service is simply not able to handle that many resulting records.

How can I manage that large a result set in WSO2 DSS?

How can I retrieve the resulting rows, for example, 200 at a time?


Solution

Well, first up, I'm not quite sure what test scenario caused DSS to fail when retrieving 2000+ records. Generally, if only one service is used in the test and the DSS instance has enough memory configured (2 GB minimum is recommended for production use), retrieving 2000 records per data service call (at a record size of, say, 10 KB) shouldn't be a heavy load unless you're running at very high concurrency. That said, I'm aware this depends heavily on record size, concurrency, the presence of multiple services serving requests, and so on.

Anyway, generally, the ideal way to tackle this problem is to paginate the result set based on a criterion of your choice. For example, if you need to extract 200 records per data service invocation, you would configure the underlying data service query to paginate the data for you. In other words, if you're using an RDBMS data source, you can use keywords such as LIMIT (or the equivalent if that keyword is not supported by your RDBMS) in your query configuration to limit the data returned per data service invocation.
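As a sketch, a paginated query in the `.dbs` configuration could look like the following. The table, column, operation, and datasource names here are hypothetical, and the `LIMIT ? OFFSET ?` syntax assumes MySQL — adjust for your RDBMS:

```xml
<!-- Query that returns one page of rows; page size and offset are input params -->
<query id="selectCustomersPaged" useConfig="default">
   <sql>SELECT id, name FROM customers ORDER BY id LIMIT ? OFFSET ?</sql>
   <param name="pageSize" sqlType="INTEGER" ordinal="1"/>
   <param name="offset"   sqlType="INTEGER" ordinal="2"/>
   <result element="Customers" rowName="Customer">
      <element name="id"   column="id"   xsdType="integer"/>
      <element name="name" column="name" xsdType="string"/>
   </result>
</query>

<!-- Operation exposed to clients; each call fetches a single page -->
<operation name="getCustomersPaged">
   <call-query href="selectCustomersPaged">
      <with-param name="pageSize" query-param="pageSize"/>
      <with-param name="offset"   query-param="offset"/>
   </call-query>
</operation>
```

A client would then call `getCustomersPaged` with `pageSize=200` and `offset=0, 200, 400, ...` to walk through the result set page by page.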

Another concept in DSS that helps you retrieve large chunks of data efficiently is "streaming". Obviously, we do XML processing to convert a result set before sending it as a web service response, so in the presence of a large result set, a fair amount of memory is consumed while the XML response is built up in memory. Depending on the query configuration, the final response can even be roughly twice the size of the original data set, because a lot of XML is added to wrap the result set before it is sent as the web service response. With streaming, however, the complete result set is not built up on the server side before the response is sent; instead, records are streamed to the client as they become available. This prevents data from piling up on the server side, which could otherwise cause out-of-memory (OOM) issues, especially with a large data set. The only glitch with this mechanism is that error handling is hard, since we have no control over the data-streaming process once it has started.
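For completeness, streaming is controlled by an attribute on the top-level `<data>` element of the `.dbs` file. To the best of my recollection, streaming is enabled by default in DSS 3.x, so you would only set the attribute when you want to switch it off (the service name here is hypothetical):

```xml
<!-- disableStreaming="true" turns streaming OFF, forcing the full response
     to be built in memory before it is sent. Omit the attribute (or set it
     to "false") to keep the default streaming behaviour. -->
<data name="CustomerDataService" disableStreaming="true">
   <!-- datasource configs, queries, and operations go here as usual -->
</data>
```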

Cheers, Prabath

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow