Question

I am setting up a small, 256-core compute cluster at my university for fluid dynamics simulations. The code we use is written in a mix of C and Fortran and currently runs just fine on a large supercomputer.

Our cluster has 16 compute nodes with 16 AMD CPU cores each. We also have an 8-core Dell box that we would like to use as a "head" or "login" node. This box, however, has an Intel Xeon processor.

We would like to NFS-mount each user's home directory to the login node and restrict their direct access to the compute nodes. This would require users to compile and run their programs via mpirun on the login node. Our questions are:

  1. Is this possible with a mixed CPU system like this? Or would we run into problems with compiling on Intel and executing on AMD?
  2. If this is a problem, is there a workaround? Could we somehow have users transparently compile their code on a compute node while logged in only to the login node?
  3. In a cluster with a head node, should only the home directory be shared via NFS mount? Or are there other directories which we should also share between compute and head node(s)?

If there's a good resource out there that could help, we'd appreciate that, too. We've found so many suggestions and ideas on various pages... It'd be nice to be pointed towards one that the community considers reputable. (Disclaimer... we aren't computer scientists, we are just regular scientists.)


Solution

Intel and AMD processors are by and large binary compatible, though differences in things like cache sizes and instruction scheduling can make a particular code run sub-optimally on AMD if it was compiled with optimisations for Intel, and vice versa. There are some vendor-specific differences in the instruction sets, but those extensions are usually not very useful in scientific computing anyway.

Since (1) is not a problem, no workaround is needed. Still, keep in mind that some compilers enable by default the instruction sets and optimisations of the processor on which the code is being compiled. One therefore has to be extra careful with the compiler options when the head node uses CPUs from a different vendor, or even from the same vendor but of a different generation. This is especially true for Intel's compiler suite; GCC is less aggressive by default. On the other hand, one can usually tell the compiler explicitly which architecture to target and optimise for, e.g. by providing the appropriate -march=... (instruction set) and -mtune=... (instruction scheduling) options to GCC, as sketched below.
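
For illustration, here is a minimal sketch of the idea in shell form. The source files, MPI wrapper compilers and the "barcelona" architecture name are assumptions for the example; substitute the actual CPU model of your compute nodes (your compiler's manual lists the valid -march values).

# Risky on a mixed cluster: -march=native targets the *build* machine,
# i.e. the Intel head node, not the AMD compute nodes.
mpicc -O2 -march=native -o solver solver.c

# Safer: name the compute-node architecture explicitly
# (assuming AMD "Barcelona"-class CPUs here; substitute your own).
mpicc -O2 -march=barcelona -mtune=barcelona -o solver solver.c

# Most conservative: a baseline instruction set both vendors implement.
mpif90 -O2 -march=x86-64 -o solver solver.f90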

As for sharing the file system, it depends on how your data storage is organised. Parallel applications often need to access the same files from all ranks (e.g. configuration files, databases, etc.) and therefore require both the home and the work file systems to be shared (unless the home file system doubles as the working one). You might also want to share things like /opt (or wherever you store cluster-wide software packages) in order to simplify cluster administration.
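
As a concrete sketch (the subnet, hostname and mount options below are placeholders, not a recommendation for your actual layout), the head node could export /home read-write and /opt read-only, and each compute node would mount them:

# /etc/exports on the head node (hypothetical subnet):
/home  192.168.1.0/24(rw,sync,no_subtree_check)
/opt   192.168.1.0/24(ro,sync,no_subtree_check)

# /etc/fstab on each compute node (assuming the head node is named "head01"):
head01:/home  /home  nfs  defaults,hard  0 0
head01:/opt   /opt   nfs  defaults,ro    0 0

After editing /etc/exports, running exportfs -ra on the head node activates the exports.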

It is hard to point you to a definitive source, since there are about as many "best practices" as there are cluster installations around the world. Just stick with a working setup and tune it iteratively until you reach convergence. Installing a batch system such as TORQUE is a good start.
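
Once TORQUE is in place, users submit jobs from the head node rather than invoking mpirun by hand. A minimal job script might look like the following sketch (the job name, queue name, solver binary and input file are made up for the example):

#!/bin/bash
#PBS -N cfd_run                # job name
#PBS -l nodes=4:ppn=16         # request 4 compute nodes, 16 cores each
#PBS -l walltime=02:00:00      # wall-clock time limit
#PBS -q batch                  # hypothetical queue name

cd $PBS_O_WORKDIR              # start in the directory the job was submitted from
# Some MPI installations need the node list passed explicitly, e.g.:
# mpirun -np 64 -machinefile $PBS_NODEFILE ./solver input.cfg
mpirun -np 64 ./solver input.cfg

Users then submit with qsub and TORQUE places the run on the requested compute nodes.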

OTHER TIPS

I had the same question. But come to think of it, heterogeneity is the norm: a GPU is a different processor architecture compared to a CPU. During cross-compilation, the exact target architecture should be specified, and the compiler will then create a binary for exactly that target architecture.

When compiling for a GPU, I have seen compiler flags that specify the target architecture options explicitly.

For example:

# Each -gencode arch=compute_XX,code=sm_XX pair embeds device code
# for one GPU generation in the resulting binary:
/usr/local/cuda/bin/nvcc -ccbin /opt/anaconda3/bin/x86_64-conda_cos6-linux-gnu-gcc \
    -I../../../Common -m64 --std=c++11 \
    -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 \
    -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 \
    -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 \
    -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 \
    -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 \
    -gencode arch=compute_86,code=compute_86 \
    -o deviceQuery.o -c deviceQuery.cpp