Assuming you're talking about something *NIX-ish, there's probably a page cache, whose job is precisely to cache file data and give you this sort of speedup. Unless something came along between the two calls and evicted those pages from the cache, they'll still be there.
So, the first call potentially has to:
- allocate pages
- map the pages into your process address space
- copy the data from those pages into your vector (possibly faulting the data in from disk as it goes)
The second call probably finds the pages still in the cache, so it only has to:
- map the pages into your process address space
- copy the data from those pages into your vector (they're pre-faulted this time, so it's a simple memory operation)
In fact, I've skipped a step: the `open`/`fstat` step in your comment is probably also accelerated, via the inode cache.
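If you want to see the effect, here's a rough sketch in Python (the mechanism is the same regardless of language). It writes a scratch file, asks the kernel to drop its cached pages with `posix_fadvise(POSIX_FADV_DONTNEED)` (a Linux-specific hint; without it the freshly written file is already warm in the cache), then times a cold read against a warm one. Exact numbers vary wildly by system, so don't read too much into them beyond the general shape:

```python
import os
import tempfile
import time

def timed_read(path):
    """Read the whole file, returning (data, elapsed seconds)."""
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return data, time.perf_counter() - t0

def demo(size=16 * 1024 * 1024):
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size))

        # Hint the kernel to evict this file's pages so the first
        # read is (probably) cold. Linux-specific; elsewhere the
        # first read will simply also be warm.
        with open(path, "rb") as f:
            if hasattr(os, "posix_fadvise"):
                os.posix_fadvise(f.fileno(), 0, 0,
                                 os.POSIX_FADV_DONTNEED)

        first, t1 = timed_read(path)   # may fault pages in from disk
        second, t2 = timed_read(path)  # likely served from page cache

        assert first == second         # same bytes either way
        print(f"cold read: {t1:.4f}s")
        print(f"warm read: {t2:.4f}s")
    finally:
        os.unlink(path)

if __name__ == "__main__":
    demo()
```

Note that `POSIX_FADV_DONTNEED` is only a hint, and other activity on the machine can repopulate or evict pages between your reads, so the timings are illustrative rather than deterministic.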