These instructions are hints used to suggest that the CPU try to prefetch a cache line into the cache. Because they're hints, a CPU can ignore them completely.
If the CPU does support them, then the CPU will try to prefetch but will give up (and won't prefetch) if a TLB miss would be involved. This is where most people get it wrong (e.g. fail to do "preloading", where you insert a dummy read to force a TLB load so that prefetching isn't prevented from working).
The amount of data prefetched is 32 bytes or more, depending on CPU, etc. You can use CPUID to determine the actual size (CPUID function 0x00000004, the "System Coherency Line Size" returned in EBX bits 0 to 31).
If you prefetch too late it doesn't help, and if you prefetch too early the data can be evicted from the cache before it's used (which also doesn't help). There's an appendix in Intel's "IA-32 Intel Architecture Optimisation Reference Manual" that describes how to calculate when to prefetch, called "Mathematics of Prefetch Scheduling Distance" that you should probably read.
Also don't forget that prefetching can decrease performance (e.g. cause data that is needed to be evicted to make room) and that if you don't prefetch anything the CPU has a hardware prefetcher that will probably do it for you anyway. You should probably also read about how this hardware prefetcher works (and when it doesn't). For example, for sequential reads (e.g. memcmp()
) the hardware prefetcher does it for you and using explicit prefetches is mostly a waste of time. It's probably only worth bothering with explicit prefetches for "random" (non-sequential) accesses that the CPU's hardware prefetcher can't/won't predict.