Which things are necessary when sharing memory between different processors?

Question 1

Apart from endianness, the following might be important:

Implementations of standard APIs might differ and can have adverse effects on shared data. For example, memcpy() on ARM is implemented in its own way internally, and I remember having a hard time with a bug while porting an RTOS to ARM. I don't recall the exact details, but they can surely be found with a little searching.
Access shared data only through types and structures laid out by the toolchains, because alignment and padding can be totally different on each architecture. You would be in for a surprise if you tried to reach values inside a struct using raw pointer or array indexing.
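The alignment point can be sketched in C as follows. This is only an illustration, not something taken from a real project: the struct name shared_msg_t and its fields are hypothetical, and it assumes both toolchains support C11 and 8-bit bytes (some DSPs do not, in which case uint8_t will not exist). The idea is to use fixed-width types, order the members so no implicit padding is needed, and let each compiler check the layout at build time.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* fixed-width integer types */

/* Hypothetical message layout placed in the shared region.
 * Members are ordered so that each one is naturally aligned
 * and the compiler has no reason to insert padding. */
typedef struct {
    uint32_t sequence;     /* expected at byte offset 0 */
    uint16_t length;       /* expected at byte offset 4 */
    uint16_t flags;        /* expected at byte offset 6 */
    uint8_t  payload[24];  /* expected at byte offset 8 */
} shared_msg_t;

/* Compile the same header on both sides. If either toolchain
 * lays the struct out differently, the build fails here
 * instead of the data being silently misread at run time. */
static_assert(offsetof(shared_msg_t, sequence) == 0, "sequence offset");
static_assert(offsetof(shared_msg_t, length)   == 4, "length offset");
static_assert(offsetof(shared_msg_t, flags)    == 6, "flags offset");
static_assert(offsetof(shared_msg_t, payload)  == 8, "payload offset");
static_assert(sizeof(shared_msg_t) == 32, "total size");
```

This checks layout only. Byte order within the multi-byte fields still has to be agreed on separately if the two processors differ in endianness.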

Question 2

Besides the issues in Fayyazki's answer there are several additional hurdles to the arrangement you describe.

Synchronisation: In all likelihood, you will need a pair of doorbell interrupts so that the CPU and DSP can interrupt each other to notify each other of communication. The alternative is polling - which is unlikely to result in high throughput.
Atomicity: Writing multi-word structures (e.g. C++ classes) to shared memory is problematic as you have none of the synchronisation mechanisms you might use in other circumstances, such as bus-locking or disabling interrupts. You could use spin-locks to control read and write access if you insist on writing multi-word structures to the shared memory. Controlling the memory packing of your structures may in fact result in more multi-word accesses.
Concurrent memory access: If you are using true multi-port SRAM, concurrent writes to the same address will be unpredictable, so don't do this.
Memory/instruction order barriers: You will need to use memory barriers to ensure that writes are observed on the memory bus in the order you expect (many Cortex A-series ARM CPUs have out-of-order execution and store behaviour). This is in addition to either making the address range of the memory uncacheable or flushing the cache.

By far the easiest approach to implementing communications between the two processors is to use a ring buffer in which the head and tail pointers are the natural word-size of the memory and are maintained by the writing processor and reading processor respectively. You will need a barrier (both memory and instruction order) between writing data to the buffer and updating the head pointer, to ensure that the writes into the buffer are observed by memory before the head pointer update is. Reads work in reverse.
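A minimal sketch of such a ring buffer, written with C11 atomics so it can be tried on a host machine. Everything named here (ring_t, ring_init, ring_put, ring_get, RING_SIZE) is hypothetical, and the C11 release/acquire operations are only a stand-in: on a real CPU/DSP pair you would probably need each toolchain's own barrier instructions or intrinsics, and the region would have to be uncached or explicitly flushed and invalidated, as described above.

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u    /* number of slots; assumed to be a power of two */
static_assert((RING_SIZE & (RING_SIZE - 1u)) == 0, "RING_SIZE must be a power of two");

typedef struct {
    _Atomic uint32_t head;      /* free-running; written only by the producer */
    _Atomic uint32_t tail;      /* free-running; written only by the consumer */
    uint32_t data[RING_SIZE];
} ring_t;

static void ring_init(ring_t *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Producer side: returns false if the buffer is full. */
static bool ring_put(ring_t *r, uint32_t value)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* Acquire: the consumer has finished reading any slot it has released. */
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;
    r->data[head % RING_SIZE] = value;
    /* Release: the data write becomes visible before the new head does. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false if the buffer is empty. */
static bool ring_get(ring_t *r, uint32_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* Acquire: read head first, then the data it covers. */
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->data[tail % RING_SIZE];
    /* Release: finish reading the slot before handing it back. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

Because each index has exactly one writer, no lock is needed. The indices are free-running counters, so head - tail gives the fill level even after 32-bit wrap-around, provided RING_SIZE is a power of two.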

Question 3

ARM's weakly-ordered memory model means you'll likely have to be very careful with synchronisation and coherency if you're using cacheable memory - if this is the case I'd recommend a thorough read of the brain-melting memory model chapters of the ARM ARM (and possibly the barriers appendix).

Also, if the two processors have different views of the same memory (à la Raspberry Pi), then having pointers in the shared data could be fun*...

* for some given value of "fun"
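One common way around the pointer problem is to store offsets from the start of the shared region instead of raw pointers, and have each processor add its own base address. A rough sketch follows; the names shm_off_t, shm_to_offset and shm_from_offset are made up for illustration, and it assumes both sides address the region in 8-bit bytes (a word-addressed DSP would need the offset scaled).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset from the start of the shared region. The same value
 * is meaningful on both processors, unlike a raw pointer. */
typedef uint32_t shm_off_t;

/* Convert a local pointer into the region to a shareable offset.
 * `base` is this processor's own address for the start of the region. */
static inline shm_off_t shm_to_offset(const void *base, const void *p)
{
    ptrdiff_t diff = (const uint8_t *)p - (const uint8_t *)base;
    assert(diff >= 0);          /* p must lie inside the region */
    return (shm_off_t)diff;
}

/* Convert an offset read from shared memory back to a local pointer. */
static inline void *shm_from_offset(void *base, shm_off_t off)
{
    return (uint8_t *)base + off;
}
```

Each side calls shm_from_offset with whatever address it sees the region at, so the differing views of memory never leak into the shared data itself.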