Question

I'm working on a web service at the moment and there is the potential that the returned results could be quite large (> 5 MB).

It's perfectly valid for this set of data to be this large and the web service can be called either sync or async, but I'm wondering what people's thoughts are on the following:

  1. If the connection is lost, the entire resultset will have to be regenerated and sent again. Is there any way I can do any sort of "resume" if the connection is lost or reset?

  2. Is sending a result set this large even appropriate? Would it be better to implement some sort of "paging" where the resultset is generated and stored on the server and the client can then download chunks of the resultset in smaller amounts and re-assemble the set at their end?


Solution

I have seen all three approaches: paged, store and retrieve, and massive push.

I think the solution to your problem depends to some extent on why your result set is so large and how it is generated. Do your results grow over time, are they calculated all at once and then pushed, do you want to stream them back as soon as you have them?

Paging Approach

In my experience, using a paging approach is appropriate when the client needs quick access to reasonably sized chunks of the result set similar to pages in search results. Considerations here are overall chattiness of your protocol, caching of the entire result set between client page requests, and/or the processing time it takes to generate a page of results.
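To make the paging idea concrete, here is a minimal sketch of what such an endpoint might return; the function name `get_results_page` and the response fields are illustrative, not a real API:

```python
# Hypothetical paging endpoint: the name and response shape are made up
# for illustration of the approach, not taken from any real framework.
def get_results_page(all_results, page, page_size=50):
    """Return one page of a result set plus enough metadata
    for the client to know when to stop asking for more."""
    total = len(all_results)
    start = page * page_size
    items = all_results[start:start + page_size]
    return {
        "page": page,
        "page_size": page_size,
        "total": total,
        "items": items,
        "has_more": start + page_size < total,
    }

# A client re-assembles the full set by looping until has_more is False.
results = list(range(130))
third_page = get_results_page(results, 2, 50)  # items 100-129, has_more False
```

The chattiness trade-off mentioned above shows up directly in `page_size`: smaller pages mean faster first results but more round trips.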

Store and retrieve

Store and retrieve is useful when the results are not random access and the result set grows in size as the query is processed. Issues to consider here are complexity for clients, and whether you can provide the user with partial results or need to calculate all results before returning anything to the client (think sorting of results from distributed search engines).
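A rough in-memory sketch of the store-and-retrieve pattern, where the server accumulates results under a job token and the client polls for the delta; `JobStore` and its method names are invented for illustration:

```python
import uuid

# Illustrative store-and-retrieve sketch: the class and method names
# are assumptions, not a real library.
class JobStore:
    def __init__(self):
        self._jobs = {}

    def start_job(self):
        """Create a server-side result buffer and hand the client a token."""
        token = str(uuid.uuid4())
        self._jobs[token] = {"results": [], "done": False}
        return token

    def append(self, token, items, done=False):
        """Called by the query processor as partial results arrive."""
        job = self._jobs[token]
        job["results"].extend(items)
        job["done"] = job["done"] or done

    def poll(self, token, offset):
        """Client asks: anything new past what I have already seen?"""
        job = self._jobs[token]
        return job["results"][offset:], job["done"]

store = JobStore()
token = store.start_job()
store.append(token, ["a", "b"])
new, done = store.poll(token, 0)          # client sees partial results early
store.append(token, ["c"], done=True)
rest, done = store.poll(token, len(new))  # only the delta crosses the wire
```

Note this is exactly where the "partial results vs. compute everything first" question bites: if results must be sorted globally, `done` cannot be reported until the whole query finishes.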

Massive Push

The massive push approach is almost certainly flawed. Even if the client needs all of the information and it needs to be pushed in a monolithic result set, I would recommend taking the approach of WS-ReliableMessaging (either directly or through your own simplified version) and chunking your results. By doing this you

  1. ensure that the pieces reach the client
  2. can discard the chunk as soon as you get a receipt from the client
  3. can reduce the possible issues with memory consumption from having to retain 5MB of XML, DOM, or whatever in memory (assuming that you aren't processing the results in a streaming manner) on the server and client sides.
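The three points above can be sketched as a receipt-based chunk sender in the spirit of WS-ReliableMessaging; `ChunkSender` and its methods are hypothetical names, and a real implementation would live behind the web service boundary:

```python
# Sketch of receipt-based chunking (simplified WS-ReliableMessaging idea).
# ChunkSender and its method names are assumptions for illustration.
class ChunkSender:
    def __init__(self, payload, chunk_size):
        # Split the monolithic payload; hold each chunk only until acked.
        self._pending = {
            i: payload[off:off + chunk_size]
            for i, off in enumerate(range(0, len(payload), chunk_size))
        }
        self.total_chunks = len(self._pending)

    def send(self, index):
        """(Re)send one chunk; resending is safe after a dropped connection."""
        return self._pending[index]

    def acknowledge(self, index):
        """On receipt from the client, discard the chunk -- this is what
        caps server memory instead of retaining the whole 5 MB result."""
        self._pending.pop(index, None)

    @property
    def outstanding(self):
        """Chunks not yet acknowledged; a resumed client requests only these."""
        return sorted(self._pending)

sender = ChunkSender(b"x" * 100, chunk_size=40)   # 3 chunks: 40 + 40 + 20
first = sender.send(0)
sender.acknowledge(0)
# After a connection reset, the client resumes from sender.outstanding
# rather than regenerating and re-sending the entire result set.
```

This also answers the "resume" question from the original post: only unacknowledged chunks need to be re-sent.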

Like others have said, though, don't do anything until you know that your result set size, how it is generated, and overall performance are actual issues.

OTHER TIPS

There's no hard law against 5 MB as a result set size. Over 400 MB can be hard to send.

You'll automatically get async handlers (since you're using .NET).

implement some sort of "paging" where the resultset is generated and stored on the server and the client can then download chunks of the resultset in smaller amounts and re-assemble the set at their end

That's already happening for you -- it's called TCP/IP ;-) Re-implementing that could be overkill.

Similarly --

entire resultset will have to be regenerated and sent again

If it's MS SQL Server, for example, that is generating most of the result set, then regenerating it will take advantage of some implicit caching in SQL Server, and the subsequent generations will be quicker.

To some extent you can get away with not worrying about these problems, until they surface as 'real' problems -- because the platform(s) you're using take care of a lot of the performance bottlenecks for you.

I somewhat disagree with secretGeek's comment:

That's already happening for you -- it's called TCP/IP ;-) Re-implementing that could be overkill.

There are times when you may want to do just this, but really only from a UI perspective. If you implement some way to either stream the data to the client (via something like a pushlets mechanism), or chunk it into pages as you suggest, you can then load some really small subset on the client and then slowly build up the UI with the full amount of data.

This makes for a slicker, speedier UI (from the user's perspective), but you have to evaluate if the extra effort will be worthwhile... because I don't think it will be an insignificant amount of work.

So it sounds like you'd be interested in a solution that adds 'starting record number' and 'final record number' parameters to your web method (or 'page number' and 'results per page').

This shouldn't be too hard if the backing store is SQL Server (or even MySQL), as they have built-in support for row numbering.
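As a sketch of letting the database do the windowing, here is the idea using `sqlite3` from the Python standard library (chosen so the example is self-contained); the table and query are illustrative. On SQL Server the equivalent is `ORDER BY ... OFFSET n ROWS FETCH NEXT m ROWS ONLY` (2012+) or a `ROW_NUMBER()` window on older versions:

```python
import sqlite3

# Illustrative paging straight from the backing store; the table name and
# data are made up. sqlite3 stands in for SQL Server / MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO results (value) VALUES (?)",
    [(f"row {i}",) for i in range(1, 26)],   # 25 rows of sample data
)

def fetch_page(conn, page, page_size):
    """Let the database do the windowing: no server-side session or
    explicit result-set cache is needed between requests."""
    return conn.execute(
        "SELECT id, value FROM results ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()

second_page = fetch_page(conn, 1, 10)   # rows 11-20
```

Because each request is a fresh, fully parameterized query, the server stays stateless, which is exactly what makes the "no session management" point below work.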

Even so, you should be able to avoid doing any session management on the server, avoid any explicit caching of the result set, and just rely on the backing store's caching to keep your life simple.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow