Volatile fields: How can I actually get the latest written value to a field?

Question 1

Your understanding is correct: these techniques alone cannot ensure that the program will always print 1. To ensure that it does, assuming thread 2 runs after thread 1, you need two fences on each thread.

The easiest way to achieve that is using the lock keyword:

private int sharedState = 0;
private readonly object locker = new object();

private void FirstThread() 
{
    lock (locker)
    {
        sharedState = 1;
    }
}

private void SecondThread() 
{
    int sharedStateSnapshot;
    lock (locker)
    {
        sharedStateSnapshot = sharedState;
    }
    Console.WriteLine(sharedStateSnapshot);
}

I'd like to quote Eric Lippert:

Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you're attempting to read and write the same value on two different threads without putting a lock in place.

The same applies to calling Volatile.Read and Volatile.Write. In fact, they are even worse than volatile fields, since they require you to do manually what the volatile modifier does automatically.
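To make the "manually versus automatically" point concrete, here is a minimal sketch of the two styles side by side. The class and member names (VolatileStyles, viaModifier, viaMethods) are made up for illustration and are not from the question; only Volatile.Read, Volatile.Write and the volatile modifier are real.

```csharp
using System;
using System.Threading;

// Hypothetical type, for illustration only.
public sealed class VolatileStyles
{
    // Style A: the modifier makes every access to this field volatile.
    private volatile int viaModifier;

    // Style B: a plain field; each access has to opt in explicitly.
    private int viaMethods;

    public void WriteBoth(int value)
    {
        viaModifier = value;                    // release store, implied by the modifier
        Volatile.Write(ref viaMethods, value);  // release store, requested by hand
    }

    public int ReadModifier() => viaModifier;                   // acquire load, implied
    public int ReadMethods() => Volatile.Read(ref viaMethods);  // acquire load, by hand
}
```

With style B, a single forgotten Volatile.Read or Volatile.Write silently turns that access into a plain one, which is the sense in which it is easier to get wrong. Neither style says anything about which thread gets there first.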

Question 2

You're right, there's no guarantee that release stores will be immediately visible to all processors. Volatile.Read and Volatile.Write give you acquire/release semantics, but no immediacy guarantees.

The volatile modifier seems to do this though. The compiler will emit the volatile. IL prefix (OpCodes.Volatile) on each access to the field, and the jitter will then avoid caching the variable in a processor register (see Hans Passant's answer).

But why do you need it to be immediate anyway? What if your SecondThread happens to run a couple of milliseconds sooner, before the value has actually been written? Since scheduling is non-deterministic, the correctness of your program shouldn't depend on this "immediacy" anyway.

Question 3

Until recently, I was under the impression that, as long as FirstThread() really did execute before SecondThread(), this program could not output anything but 1.

As you go on to explain yourself, this impression is wrong. Volatile.Read simply issues a read operation on its target followed by a memory barrier. The memory barrier prevents operation reordering on the processor executing the current thread, but this does not help here because:

There are no operations to reorder (just the single read or write in each thread).
The race condition between your threads means that, even if the no-reorder guarantee applied across processors, it would only preserve an order of operations that you cannot predict in the first place.

If my understanding is therefore correct, then there is nothing to prevent the acquisition of sharedState being 'stale', if the write in FirstThread() has not already been released.

That is correct. In essence you are using a tool designed to help with weak memory models against a possible problem caused by a race condition. The tool won't help you because that's not what it does.

If this is true, how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1? (Or have I made an error in my mental model somewhere?)

To stress once again: the memory model is not the problem here. To ensure that your program will always print 1 you need to do two things:

Provide explicit thread synchronization that guarantees the write will happen before the read (in the simplest case, SecondThread can spin-wait on a flag that FirstThread sets to signal that it is done).
Ensure that SecondThread will not read a stale value. You can do this trivially by marking sharedState as volatile -- while this keyword has deservedly gotten much flak, it was designed explicitly for such use cases.

So in the simplest case you could for example have:

private volatile int sharedState = 0;
private volatile bool spinLock = false;

private void FirstThread()
{
    sharedState = 1;
    // ensure lock is released after the shared state write!
    Volatile.Write(ref spinLock, true); 
}

private void SecondThread()
{
    SpinWait.SpinUntil(() => spinLock);
    Console.WriteLine(sharedState);
}

Assuming no other writes to the two fields, this program is guaranteed to output nothing other than 1.
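If you would rather not spin, the same two requirements can be met with a blocking signal. The following is a sketch rather than a drop-in replacement: the class name SignalledHandOff, the field name written and the Run helper are my own inventions, and it relies on my understanding that ManualResetEventSlim.Set and Wait act as memory barriers, as .NET synchronization primitives generally do.

```csharp
using System;
using System.Threading;

// Hypothetical type, for illustration only.
public sealed class SignalledHandOff
{
    private int sharedState = 0;

    // Starts unsignalled; FirstThread signals it once the write is done.
    private readonly ManualResetEventSlim written = new ManualResetEventSlim(false);

    public void FirstThread()
    {
        sharedState = 1;
        written.Set();      // signal only after the shared state write
    }

    public int SecondThread()
    {
        written.Wait();     // block until FirstThread has signalled
        return sharedState;
    }

    // Starts the reader first on purpose, to show that start order does not matter.
    public static int Run()
    {
        var handOff = new SignalledHandOff();
        int observed = 0;
        var reader = new Thread(() => observed = handOff.SecondThread());
        var writer = new Thread(handOff.FirstThread);
        reader.Start();
        writer.Start();
        writer.Join();
        reader.Join();
        return observed;
    }
}
```

Calling Console.WriteLine(SignalledHandOff.Run()) should print 1 under the assumptions above. The event supplies the ordering (the read cannot proceed until the write has been signalled), and the synchronization primitive itself takes care of visibility, so sharedState does not need to be volatile here.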