Question

I have used Utf8Json a lot (it is very good), but I have since adapted some lower-level code and started using Utf8JsonReader directly.

Looking into the code of the Utf8Json library, I see that JsonSerializer.DeserializeAsync ultimately uses the System.IO Stream class, which itself has Stream.ReadAsync functions.

Looking at Utf8JsonReader, it uses ReadOnlySequence<byte> and does not have any async functions.

Looking here https://github.com/dotnet/runtime/issues/29906 it mentions:

Utf8JsonReader is re-entrant and so an asynchronous wrapper around Utf8JsonReader that contains additional state can shell out to a fully synchronous helper function that is able to create the needed types and perform the streaming read.

My questions are (and thanks for taking the time to read):

  1. Why would one library access memory buffers asynchronously (Utf8Json), and another synchronously (Utf8JsonReader)?

  2. I understand that async makes sense for long-running I/O, such as network ports or files on disk, but would that overhead be detrimental for in-memory data? (Is this why there are no async functions on ReadOnlySequence<byte>?)

  3. What does the author of the GitHub issue mean by creating an async wrapper, and what would this look like?

Solution

I'm the "op" that's quoted from the GitHub issue; when the System.Text.Json API was first previewed, it did not have any functions to asynchronously consume a stream, and so I needed to write my own wrapper around it to add that functionality.

An async method cannot use a number of stack-dependent features such as out parameters and stack-allocated variables (Span and co) because an async method may yield at some point between the allocation and the end of the scope. But an async method is free to call out to a regular non-async method that can use these features freely, since the stack is guaranteed to remain in-place for the duration of the execution of the helper function.
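As a minimal sketch of that split (not the code from the original implementation): the hypothetical TokenCounter class below does its I/O in an async method and hands the bytes to a private synchronous helper, which is the only place Utf8JsonReader is created. For brevity it buffers the whole stream first, so it shows the async/sync boundary only, not chunked reading.

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical example type; the name is an assumption, not part of any library.
public static class TokenCounter
{
    // Async entry point: does the awaitable I/O but never declares a
    // Utf8JsonReader or Span<T> itself.
    public static async Task<int> CountTokensAsync(Stream stream)
    {
        using var buffer = new MemoryStream();
        await stream.CopyToAsync(buffer);

        // ReadOnlyMemory<byte> is an ordinary struct, so it can live in an async method.
        ReadOnlyMemory<byte> json = buffer.GetBuffer().AsMemory(0, (int)buffer.Length);
        return CountTokens(json);
    }

    // Synchronous helper: runs to completion without yielding, so it can use
    // the stack-only Utf8JsonReader and the span behind the memory.
    private static int CountTokens(ReadOnlyMemory<byte> json)
    {
        var reader = new Utf8JsonReader(json.Span);
        int count = 0;
        while (reader.Read())
        {
            count++;
        }
        return count;
    }
}
```

The helper returns a plain int, so nothing stack-bound ever crosses back into the async method.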

For my needs, I first created a StreamSequence, which allowed for consuming a Stream asynchronously while feeding its contents to the synchronous-only Utf8JsonReader in a high-performance fashion: it loads a chunk at a time as a ReadOnlySequenceSegment, and the segments are pieced together into a single ReadOnlySequence. Theoretically, as the JSON is consumed, you could reclaim memory at the start of the ReadOnlySequence by dropping earlier ReadOnlySequenceSegment instances, but this is not implemented.
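To make the idea concrete, here is a rough re-creation of such a type. It is a sketch under assumptions, not the original StreamSequence: the names ChunkSegment and StreamSequenceSketch, the chunk size, and the bool return value are all made up for illustration. Like the original, it never drops earlier segments.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

// One chunk of stream data, linked to the chunk that follows it.
public sealed class ChunkSegment : ReadOnlySequenceSegment<byte>
{
    public ChunkSegment(ReadOnlyMemory<byte> memory, long runningIndex)
    {
        Memory = memory;
        RunningIndex = runningIndex;
    }

    // Links a new segment after this one and returns it.
    public ChunkSegment Append(ReadOnlyMemory<byte> memory)
    {
        var next = new ChunkSegment(memory, RunningIndex + Memory.Length);
        Next = next;
        return next;
    }
}

// Hypothetical stand-in for the StreamSequence described above.
public sealed class StreamSequenceSketch
{
    private readonly Stream _stream;
    private readonly int _chunkSize;
    private ChunkSegment _first;
    private ChunkSegment _last;

    public StreamSequenceSketch(Stream stream, int chunkSize = 4096)
    {
        _stream = stream;
        _chunkSize = chunkSize;
    }

    // Everything read so far, exposed as one logical sequence.
    public ReadOnlySequence<byte> Sequence =>
        _first == null
            ? ReadOnlySequence<byte>.Empty
            : new ReadOnlySequence<byte>(_first, 0, _last, _last.Memory.Length);

    // Reads one more chunk; returns false once the stream is exhausted.
    public async Task<bool> ReadMoreAsync()
    {
        var buffer = new byte[_chunkSize];
        int read = await _stream.ReadAsync(buffer, 0, buffer.Length);
        if (read == 0)
        {
            return false;
        }

        var memory = new ReadOnlyMemory<byte>(buffer, 0, read);
        if (_first == null)
        {
            _first = _last = new ChunkSegment(memory, 0);
        }
        else
        {
            _last = _last.Append(memory);
        }
        return true;
    }
}
```

Each call allocates a fresh array for simplicity; pooling the buffers would be the obvious next step in a real implementation.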

Then I was able to use this in my state machine that parsed a stream, with the core async API function using the StreamSequence to load a chunk of the JSON content from the stream at a time and passing it to a dedicated (private) non-async function that was responsible for parsing the contents of the ReadOnlySequence<byte> as JSON and deserializing it to the type I needed. This is the code ripped out of the implementation; I never got around to making it more general or cleaning it up because life got in the way. (I thought about doing it for this reply, but then it might be a month or two before I got around to posting, so this is probably better.)

This is all based on STJ preview 6. The API was changed slightly in preview 7 to its current state, perhaps partially due to the feedback I filed at the time regarding how confusing I found the exposed internal state (#29906 and #29911), but the concepts are still the same. You can even install the preview 6 version of STJ to play around with this code, and once you have it working the way you want, upgrade to the latest release and fix the breakage. In any case, I think it should be apparent what creating a synchronous helper method to shell out work from an asynchronous entry point looks like, and how you can then freely use Span and co in that synchronous context.

Note that the current version of the STJ API provides a DeserializeAsync<T> method, and I've since replaced all my code with it because my need for custom deserializing didn't outweigh the benefits (there were no asynchronous STJ methods at the time I wrote this). But if you need to manually "dive in" to the JSON stream and interact with Utf8JsonReader directly, you're going to need something along these lines.
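For the common case, the built-in call looks roughly like this. The Person type and the ReadPersonAsync wrapper are hypothetical and exist only for the example; JsonSerializer.DeserializeAsync<T>(Stream) is the real System.Text.Json method.

```csharp
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical DTO used only for this example.
public sealed class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class DeserializeExample
{
    // Reads the stream asynchronously and deserializes it in one call.
    // With default options, property name matching is case-sensitive.
    public static async Task<Person> ReadPersonAsync(Stream stream)
    {
        return await JsonSerializer.DeserializeAsync<Person>(stream);
    }
}
```

No reader state or buffering is visible here, which is why this is the simpler choice when custom token-level handling is not needed.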

(Note that there may be an in-box alternative to my StreamSequence by now; I don't know. Perhaps someone more in tune with the rapidly changing ASP.NET Core APIs can comment on that.)


Update on usage: you create a StreamSequence around a Stream instance, which does nothing but tie them together. StreamSequence.ReadMoreAsync() asynchronously consumes bytes from the underlying stream without blocking a thread-pool thread. The property StreamSequence.Sequence exposes a ReadOnlySequence<byte> that represents the contents of the stream read so far. Each subsequent call to ReadMoreAsync() extends the ReadOnlySequence<byte> with another ReadOnlySequenceSegment, making more data available. You would instantiate the StreamSequence, read once, then pass StreamSequence.Sequence to your parser helper. If the parser needs more data to continue, it should return a flag saying so; that flag bubbles up to your async entry point, which is then free to call await ReadMoreAsync() and call the parser again to continue its job (passing in any state information as needed).
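The read, parse, read-more loop can be sketched against the current Utf8JsonReader API as follows. This is an illustration under assumptions rather than the original code: ResumableParser, CountStringsAsync and ParseChunk are hypothetical names, and a single growable array stands in for the segmented sequence to keep the example short. The state handed back and forth is the reader's JsonReaderState plus the number of bytes it consumed.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical example type: counts JSON string values in a stream.
public static class ResumableParser
{
    // Async entry point: owns the stream and the buffer, never the reader.
    public static async Task<int> CountStringsAsync(Stream stream, int chunkSize = 16)
    {
        var buffer = new byte[chunkSize];
        int filled = 0;                 // bytes buffered but not yet consumed
        int total = 0;
        JsonReaderState state = default; // default state means "start of document"
        bool isFinalBlock = false;

        while (!isFinalBlock)
        {
            // Grow the buffer if a single token does not fit in it.
            if (filled == buffer.Length)
            {
                Array.Resize(ref buffer, buffer.Length * 2);
            }

            int read = await stream.ReadAsync(buffer, filled, buffer.Length - filled);
            filled += read;
            isFinalBlock = read == 0;

            var data = new ReadOnlySequence<byte>(buffer, 0, filled);
            var result = ParseChunk(data, isFinalBlock, state);
            state = result.State;
            total += result.Strings;

            // Keep the unconsumed tail (a partial token) for the next round.
            filled -= result.Consumed;
            if (filled > 0)
            {
                Array.Copy(buffer, result.Consumed, buffer, 0, filled);
            }
        }

        return total;
    }

    // Synchronous helper: the only place the Utf8JsonReader exists.
    // Read() returning false here means "out of data in this block".
    private static (JsonReaderState State, int Consumed, int Strings) ParseChunk(
        ReadOnlySequence<byte> data, bool isFinalBlock, JsonReaderState state)
    {
        var reader = new Utf8JsonReader(data, isFinalBlock, state);
        int strings = 0;
        while (reader.Read())
        {
            if (reader.TokenType == JsonTokenType.String)
            {
                strings++;
            }
        }
        return (reader.CurrentState, (int)reader.BytesConsumed, strings);
    }
}
```

The helper communicates through its return value instead of out parameters, which keeps the async caller free of anything stack-bound.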

Licensed under: CC-BY-SA with attribution