Does the distance between a program's stack and data segment have an effect on CPU caching?

Question 1

it would seem better to flip things around and put the upward growing part (text, global data, and heap) on top?

No. Cache lines are typically only 32 to 256 bytes. It's rare for a program to use less than a few megabytes of data, so whether the stack and the data segment end up sharing a cache line is basically irrelevant. (Even if you're not using that much yourself, the standard library does a lot on your behalf.)

What is relevant is making sure that data that is used together stays close together in memory (and, in some cases, is aligned with a cache line).
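To make the alignment point concrete, here is a minimal sketch of the arithmetic. The 64-byte line size and the helper names (lines_spanned, align_up) are assumptions for illustration, not anything from the question.

```python
LINE_SIZE = 64  # assumed cache line size in bytes


def lines_spanned(addr, size, line_size=LINE_SIZE):
    """Number of cache lines touched by `size` bytes starting at `addr`."""
    first = addr // line_size
    last = (addr + size - 1) // line_size
    return last - first + 1


def align_up(addr, line_size=LINE_SIZE):
    """Round `addr` up to the next cache line boundary."""
    return (addr + line_size - 1) // line_size * line_size


# A 16-byte object at address 56 straddles two lines...
print(lines_spanned(56, 16))            # 2
# ...but fits in one line once it starts on a line boundary.
print(lines_spanned(align_up(56), 16))  # 1
```

An object that straddles two lines costs two line fills instead of one, which is why alignment sometimes matters for small, hot data.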

In a scripting language, each element of an array is likely to be on its own cache line. But in C, you can put things close together (using arrays or structs). When it comes to number manipulation, rewriting in C (or using an efficient library like NumPy) can easily make things 100x faster.
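You can see the difference in layout from within Python itself. This is only a rough sketch: it assumes CPython and a 64-byte cache line, and it uses the standard array module as a stand-in for a C array or a NumPy buffer.

```python
import array
import sys

LINE_SIZE = 64  # assumed cache line size in bytes

# array.array stores raw doubles back to back, like a C array.
values = array.array("d", range(1024))
doubles_per_line = LINE_SIZE // values.itemsize
print("doubles per cache line:", doubles_per_line)

# A plain list only stores pointers; each float is a separate heap object
# with its own header, and the objects need not be adjacent in memory.
boxed_size = sys.getsizeof(1.0)
print("bytes per boxed float object:", boxed_size)
```

With the packed layout, one line fill brings in several neighbouring elements; with boxed objects, each element may cost its own line fill.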

Most processors also have separate instruction and data caches, at least at the first level, so code and data do not compete for space there.

Does it matter at all if the data segment is in a completely different part of memory from the stack?

Again, if your stack is likely to be more than a few hundred bytes deep (it will be!), then it's not even relevant. In fact, your stack will probably use many cache lines in a non-trivial program.

If you want to know more, I recommend trying to read "What Every Programmer Should Know About Memory" by Ulrich Drepper. It's quite a tome, but even if you only skim it, you can get some neat info. (Like making a program run 20x faster just by switching your loop indices, or the fact that RAM is no more "random access" than your hard drive is.)
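The loop index trick is easy to see without timing anything. The sketch below does not measure speed; it just counts how often consecutive accesses land on a different cache line when walking a row-major 2D array of doubles in each order. The 64-byte line, the 8-byte element size, and the function name are assumptions for illustration.

```python
LINE_SIZE = 64  # assumed cache line size in bytes
ELEM_SIZE = 8   # assumed element size (a C double)


def line_switches(rows, cols, row_order):
    """Count consecutive accesses that land on a different cache line.

    The array is laid out row-major, so element (i, j) lives at byte
    offset (i * cols + j) * ELEM_SIZE.
    """
    switches = 0
    prev_line = None
    outer, inner = (rows, cols) if row_order else (cols, rows)
    for a in range(outer):
        for b in range(inner):
            i, j = (a, b) if row_order else (b, a)
            line = (i * cols + j) * ELEM_SIZE // LINE_SIZE
            if prev_line is not None and line != prev_line:
                switches += 1
            prev_line = line
    return switches


print(line_switches(64, 64, row_order=True))   # 511
print(line_switches(64, 64, row_order=False))  # 4095
```

Same data, same number of accesses, but the column-first walk moves to a new line on every single access, while the row-first walk uses all 8 doubles in a line before moving on.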

Question 2

The distance between the program and its data is not necessarily the issue. The cache uses some of the address bits to decide where a given piece of memory goes in the cache.
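As a rough sketch of how an address gets carved up, assume a set-associative cache with 64-byte lines and 64 sets (for example 32 KiB, 8-way). Those numbers and the function name are made up for illustration; real parts differ.

```python
LINE_SIZE = 64  # assumed bytes per cache line
NUM_SETS = 64   # assumed number of sets (e.g. 32 KiB / (8 ways * 64 bytes))


def split_address(addr, line_size=LINE_SIZE, num_sets=NUM_SETS):
    """Split an address into (tag, set_index, offset)."""
    offset = addr % line_size          # byte within the line
    line_number = addr // line_size
    set_index = line_number % num_sets  # which set the line must go into
    tag = line_number // num_sets       # identifies the line within that set
    return tag, set_index, offset


print(split_address(0x12345))         # (18, 13, 5)
print(split_address(0x12345 + 4096))  # (19, 13, 5)
```

Two addresses that are 4096 bytes apart (line size times number of sets) compete for the same set here, no matter how far apart or close together they look otherwise.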

Also understand that when running on an operating system (Linux, Windows, etc.), you are most likely working purely with virtual memory addresses, so the apparent separation in memory is only an illusion. Two small chunks of your program that sit next to each other in the virtual address space could be very far apart in the physical address space, or not.

Then it is a question of where the cache sits: on the virtual side of the MMU or on the physical side.

So the short answer is that your program, stack, and heap may or may not all be colliding with each other in the cache, and there are simple things that can be done (if you have control over the addresses the cache sees) to make the cache work better or worse for you. You can, for example, "stripe" your data at even boundaries so that it all lands on a small portion of the cache, causing excessive evictions and re-reads even though most of the cache lines sit unused. The same goes for the program: heavily used functions can happen to be spaced just right to cause more cache collisions, or spaced such that they cause fewer collisions with each other, decreasing or increasing your overall performance.
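The striping effect can be shown with a toy model. This is only a sketch of a set-associative cache with LRU replacement; the geometry (64-byte lines, 64 sets, 8 ways) and the function names are assumptions, and real hardware replacement policies vary.

```python
from collections import OrderedDict


def count_misses(addresses, line_size=64, num_sets=64, ways=8):
    """Count misses for an address trace in a toy LRU set-associative cache."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_size
        tag = line // num_sets
        lines_in_set = sets[line % num_sets]
        if tag in lines_in_set:
            lines_in_set.move_to_end(tag)  # mark as most recently used
        else:
            misses += 1
            if len(lines_in_set) >= ways:
                lines_in_set.popitem(last=False)  # evict least recently used
            lines_in_set[tag] = True
    return misses


def strided_trace(count, stride, passes):
    """Touch `count` items spaced `stride` bytes apart, `passes` times over."""
    return [i * stride for _ in range(passes) for i in range(count)]


# 16 items one line apart: 16 different sets, only the first pass misses.
print(count_misses(strided_trace(16, 64, 10)))    # 16
# 16 items 4096 bytes apart: all in one 8-way set, every access misses.
print(count_misses(strided_trace(16, 4096, 10)))  # 160
```

In both runs the program touches the same 16 lines, well under the capacity of the cache. Only the spacing differs.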

If you have control of the system (and know which address bits the cache uses), it is not hard to write a program that demonstrates these issues, increasing or decreasing the performance of the same code simply by moving the base addresses where the data and/or program functions live, without changing the program or data otherwise. The bigger the speed difference between the cache RAM and the memory behind it, the easier it is to see.
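A real demonstration needs timing on real hardware, but the base address effect itself is just arithmetic. The sketch below assumes a small direct-mapped cache (64 sets of 64-byte lines, 4 KiB total) and uses made-up base addresses and function names; it only counts how many sets two equally sized buffers would fight over.

```python
def sets_touched(base, size, line_size=64, num_sets=64):
    """Set indices used by `size` bytes starting at address `base`."""
    first = base // line_size
    last = (base + size - 1) // line_size
    return {line % num_sets for line in range(first, last + 1)}


def shared_sets(base_a, base_b, size, line_size=64, num_sets=64):
    """How many cache sets two buffers of `size` bytes have in common."""
    a = sets_touched(base_a, size, line_size, num_sets)
    b = sets_touched(base_b, size, line_size, num_sets)
    return len(a & b)


# Two 1 KiB buffers whose bases differ by a multiple of the cache size
# map onto exactly the same 16 sets.
print(shared_sets(0x10000, 0x20000, 1024))  # 16
# Moving the second buffer by 1 KiB removes every collision.
print(shared_sets(0x10000, 0x20400, 1024))  # 0
```

In a direct-mapped cache, code that alternates between the two buffers would evict on every access in the first layout and not at all in the second, with no change to the code or the data.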