My approach is to start with simple queries and then collect increasingly complex "samples". Yes, I use the actual query result and copy it into the expected part of my unit tests after eyeballing it.
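A minimal sketch of what such a snapshot-style test can look like, assuming a hypothetical `build_user_query` builder function (the names and API here are illustrative, not from any specific library):

```python
# Hypothetical builder function; your real builder's API will differ.
def build_user_query(min_age: int) -> str:
    return f"SELECT id, name FROM users WHERE age >= {min_age} ORDER BY name"

def test_user_query_snapshot():
    # This expected string was originally copied from the builder's
    # actual output, then eyeballed before being pasted in.
    expected = "SELECT id, name FROM users WHERE age >= 18 ORDER BY name"
    assert build_user_query(18) == expected

test_user_query_snapshot()
```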
This isn't a great test; its purpose is slightly different: it documents what you expect.
When something changes, your tests will start to fail and you can create a diff between "what I expected yesterday" and "what I get today."
This also documents what your code is capable of. When a bug is discovered in production, you can try to write a new unit test and see how many others also fail. This gives you a good understanding of the impact of a change (one change and 90% of the tests fail? You've found a hot spot!).
When I fix bugs, I start to see patterns of what usually goes wrong, and I can protect those places with specific unit tests (for example, in the sub-builders used by the big query builder).
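For instance, a sketch of targeted tests for a hypothetical WHERE-clause sub-builder (the function, the filter format, and the "empty filters" bug are all made up for illustration):

```python
# Hypothetical sub-builder for the WHERE clause; names are illustrative.
def build_where(filters: dict) -> str:
    if not filters:
        return ""
    parts = [f"{col} = :{col}" for col in sorted(filters)]
    return "WHERE " + " AND ".join(parts)

def test_empty_filters():
    # Regression test for an (invented) bug where an empty filter
    # dict once produced a dangling "WHERE " clause.
    assert build_where({}) == ""

def test_deterministic_order():
    # Columns are sorted so snapshot tests don't flake on dict order.
    assert build_where({"b": 2, "a": 1}) == "WHERE a = :a AND b = :b"

test_empty_filters()
test_deterministic_order()
```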
For a few complex queries, I also create unit tests that actually execute them, to see whether they really work. That gives me the confidence that the quick "string compare" unit tests are checking something meaningful.
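One way to sketch such an "actually run it" test is against an in-memory SQLite database; `build_user_query` is again a hypothetical builder standing in for your own:

```python
import sqlite3

# Hypothetical builder; replace with your real query builder.
def build_user_query(min_age: int) -> str:
    return f"SELECT id, name FROM users WHERE age >= {min_age} ORDER BY name"

# Execute the generated SQL for real, so a syntax error or a wrong
# column name fails the test even if the string snapshot matched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "alice", 30), (2, "bob", 15)])

rows = conn.execute(build_user_query(18)).fetchall()
assert rows == [(1, "alice")]
```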