From my experience testing services that rely heavily on data, here are some suggestions:
Absolutely use real data, not hand-crafted test data. As you've already mentioned, hand-writing mock data is very painful because the data changes often and can be complex and large.
Be able to regenerate the real data quickly. Writing a tool that takes a one-time dump of data is not too hard, but ideally, when the data models change, you can easily take a fresh snapshot and use that instead.
Avoid complex code. You've covered this already. No one will maintain the quality of the tests if they are too complex.
The approach I've settled on for the time being is the following:
1. Make sure all database calls and external dependencies sit behind interfaces. This lets you substitute test implementations. Seems like you have this down.
2. Create a special implementation of each interface that does two things: (a) makes the live database call, and (b) saves the result of that call to a JSON file.
3. When you need a snapshot of real data, swap the production implementation of a given interface for the one from (2), then run through a few use cases; the JSON files get written to disk.
4. Your test start-up code simply reads the JSON from disk and hydrates objects.
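The steps above can be sketched roughly as follows. This is a minimal illustration in TypeScript, not the original poster's code: `UserRepository`, `findUser`, and the snapshot file layout are all hypothetical names invented for the example.

```typescript
import * as fs from "fs";

// Illustrative domain type (hypothetical).
interface User {
  id: number;
  name: string;
}

// Step 1: the external dependency sits behind an interface.
interface UserRepository {
  findUser(id: number): Promise<User>;
}

// Production implementation (sketched only): makes the real database call.
class DbUserRepository implements UserRepository {
  async findUser(id: number): Promise<User> {
    // ...real database query would go here...
    throw new Error("not implemented in this sketch");
  }
}

// Step 2: a recording decorator — delegates to the live implementation,
// then saves the result to a JSON file on disk.
class RecordingUserRepository implements UserRepository {
  constructor(private inner: UserRepository, private dir: string) {}

  async findUser(id: number): Promise<User> {
    const result = await this.inner.findUser(id);
    fs.writeFileSync(
      `${this.dir}/findUser-${id}.json`,
      JSON.stringify(result, null, 2),
    );
    return result;
  }
}

// Step 4: test start-up reads the JSON back and hydrates objects,
// with no database involved.
class SnapshotUserRepository implements UserRepository {
  async findUser(id: number): Promise<User> {
    return JSON.parse(
      fs.readFileSync(`${this.dir}/findUser-${id}.json`, "utf8"),
    );
  }

  constructor(private dir: string) {}
}
```

To refresh the snapshots (step 3), you would wire `RecordingUserRepository` around the production repository, exercise a few use cases, and commit the resulting JSON files; the test suite then constructs `SnapshotUserRepository` pointed at that directory.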
This has worked really well for me: it allows quick copies of real data and is very easy to update when the data changes.
There's obviously some work involved, but the code required is not terribly complex.