The bottom line here is that for anything other than an embedded document you must make more than one query to the database. So in the referenced scenario you are finding the "child" in one request and then accessing the parent in another request. The code may hide this a bit, but it is actually two round trips to the database.
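To make the round-trip count concrete, here is a sketch using plain in-memory objects as stand-ins for the two collections (all names here are illustrative, not taken from your schema):

```javascript
// Referenced model: two look-ups, which against a real database
// means two separate round trips.
const devices = [{ _id: "d1", device_token: "tok-1", profile_id: "p1" }];
const profiles = [{ _id: "p1", name: "Alice" }];

const device = devices.find(d => d.device_token === "tok-1"); // request 1
const parent = profiles.find(p => p._id === device.profile_id); // request 2

// Embedded model: one look-up returns the parent and its devices together.
const embedded = [{
  _id: "p1",
  name: "Alice",
  authenticated_devices: [{ device_token: "tok-1" }]
}];

const hit = embedded.find(p =>
  p.authenticated_devices.some(d => d.device_token === "tok-1")); // one request
```

However an ODM dresses this up, the referenced version cannot avoid the second trip, while the embedded version hands you everything in one document.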
So the embedded model will be faster. What might be confusing you at the moment is the lack of an index on the `authenticated_devices.device_token` field within your `Profile` model and collection. With that index in place, these look-ups are optimal.
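For example, the index could be created like this in `mongosh` (the `profiles` collection name is an assumption based on your `Profile` model; this requires a running MongoDB instance, so it is shown for shape only):

```javascript
// A multikey index on the embedded array field makes the device-token
// look-up use the index instead of scanning every profile document.
db.profiles.createIndex({ "authenticated_devices.device_token": 1 })

// The look-up itself; the positional $ projection returns only the
// matched device rather than the whole embedded array.
db.profiles.find(
  { "authenticated_devices.device_token": "tok-1" },
  { name: 1, "authenticated_devices.$": 1 }
)
```

MongoDB builds a multikey index automatically when the indexed field lives inside an array, so no special syntax is needed beyond the dotted path.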
It is true that another consideration here could be the cost of pulling the document that contains all of the "devices" in the embedded array, but as long as that information is reasonably light it should still incur less overhead than an additional trip to the database.
As a final point, if the information you are accessing from `Profile` is actually very light, then even though it might go against your sensibilities, the fastest possible approach may well be to simply replicate that information "per device" and serve it in a single request, rather than referencing another document with another request.
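A sketch of what such a denormalized "per device" document might look like (again, the field names are hypothetical, and the duplicated fields must be kept in sync whenever the profile changes):

```javascript
// Each device document carries a copy of the light Profile fields,
// so a single query by device_token answers everything.
const deviceDocs = [
  {
    _id: "d1",
    device_token: "tok-1",
    profile_id: "p1",        // still kept for reference if ever needed
    profile_name: "Alice"    // duplicated from the Profile document
  }
];

const doc = deviceDocs.find(d => d.device_token === "tok-1");
// doc.profile_name is available without a second query
```

The trade-off is write-side complexity: an update to the profile must fan out to every device document that copies those fields. That is only worth it when the duplicated data is small and changes rarely.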
So look at your usage patterns and consider the size of the data, but generally, as long as you have indexes in place to support those patterns, there is nothing wrong with the embedded model.