Question

I'm trying to diagnose a memory allocation error thrown by ibv_reg_mr() in software that I use. My suspicion is that it's related to a known problem with some Mellanox InfiniBand cards, where the default maximum amount of memory that can be registered is about 2 GB (see FAQ #18 here: http://www.open-mpi.org/faq/?category=openfabrics ).

I would like to confirm unequivocally whether this is the case so I can quickly negotiate a solution with my system administrators. Being unfamiliar with RDMA and InfiniBand, I'd appreciate it if someone could suggest either (a) a simple program that registers arbitrary amounts of memory, so that I can trigger the error at the maximum allowed value, or (b) a way to determine how InfiniBand is currently configured, given that I do not have root access.

Thanks everyone!

Jason


Solution

You can read the parameters of the Mellanox InfiniBand HCA drivers from sysfs, and you don't need root access to do so. The parameters for module <modname> are found in /sys/module/<modname>/parameters/. Each parameter is exposed there as a text pseudofile, and its value can be obtained by simply reading the file. Standard Unix command line tools are enough for that, e.g. cat /sys/module/mlx4_core/parameters/log_num_mtt.
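As a rough illustration of the sysfs lookup described above, here is a minimal Python sketch that collects every readable parameter of a module into a dictionary. The function name read_module_params and its sysfs_root argument are my own (hypothetical) choices, not part of any driver or library; the only thing taken from the answer is the /sys/module/<modname>/parameters/ layout.

```python
from pathlib import Path


def read_module_params(modname, sysfs_root="/sys/module"):
    """Return {parameter_name: text_value} for a loaded kernel module.

    Returns an empty dict if the module is not loaded or exposes no
    parameters directory. sysfs_root is overridable for testing only.
    """
    param_dir = Path(sysfs_root) / modname / "parameters"
    params = {}
    if not param_dir.is_dir():
        return params
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may not be readable by unprivileged users.
            continue
    return params


if __name__ == "__main__":
    for name, value in read_module_params("mlx4_core").items():
        print(f"{name} = {value}")
```

If nothing is printed, the mlx4_core module is probably not loaded on that node; trying "ib_mthca" instead would be the next step.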

For the mlx4_core module the maximum amount of registrable memory is determined using the following formula:

max_reg = (1 << log_num_mtt) * (1 << log_mtts_per_seg) * PAGE_SIZE

For the ib_mthca module the formula is:

max_reg = (num_mtt - fmr_reserved_mtts) * (1 << log_mtts_per_seg) * PAGE_SIZE

where:

  • num_mtt is the maximum number of memory translation table (MTT) segments per HCA;
  • log_num_mtt is the binary logarithm of num_mtt;
  • fmr_reserved_mtts is the number of MTT segments reserved for FMR;
  • log_mtts_per_seg is the binary logarithm of the number of MTT entries per segment;
  • PAGE_SIZE is the system page size, 4 KiB on most current platforms.

Each of these parameters (except PAGE_SIZE) can be read from the corresponding module's directory in sysfs. PAGE_SIZE can be obtained with getconf PAGE_SIZE.
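To make the arithmetic concrete, the two formulas above can be written as small Python functions. This is only a sketch: the function names are mine, and the parameter values in the example are hypothetical, not values read from any real system.

```python
import os


def mlx4_max_reg(log_num_mtt, log_mtts_per_seg, page_size):
    """Formula given above for mlx4_core, result in bytes."""
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


def mthca_max_reg(num_mtt, fmr_reserved_mtts, log_mtts_per_seg, page_size):
    """Formula given above for ib_mthca, result in bytes."""
    return (num_mtt - fmr_reserved_mtts) * (1 << log_mtts_per_seg) * page_size


if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")  # same value as `getconf PAGE_SIZE`
    # Hypothetical values: log_num_mtt=20, log_mtts_per_seg=3.
    # With 4 KiB pages this is 2^20 * 2^3 * 2^12 = 2^35 bytes = 32 GiB.
    limit = mlx4_max_reg(20, 3, page_size)
    print(f"mlx4_core limit: {limit / 2**30:.1f} GiB")
```

Comparing the result with the amount of physical RAM on the node should show whether the registration limit is a plausible cause of the ibv_reg_mr() failure.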

It is possible that both modules are loaded. In this case just do what Open MPI does: look for mlx4_core first and ib_mthca second.
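The lookup order can be sketched as follows. This is my own illustration of "mlx4_core first, ib_mthca second", not Open MPI's actual code; the function name max_registrable_bytes and its arguments are hypothetical. It assumes the parameter files named in the formulas above exist and contain plain integers; if one is missing or unreadable, the sketch simply raises an exception. The values in sysfs are the module parameters as set at load time, so treat the result as an estimate.

```python
import os
from pathlib import Path


def _read_int(param_dir, name):
    """Read one sysfs parameter file and parse it as an integer."""
    return int((param_dir / name).read_text().strip())


def max_registrable_bytes(sysfs_root="/sys/module", page_size=None):
    """Return (module_name, max_reg_bytes), or None if neither module is found.

    Checks mlx4_core first and ib_mthca second.
    sysfs_root and page_size are overridable for testing only.
    """
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    root = Path(sysfs_root)

    mlx4 = root / "mlx4_core" / "parameters"
    if mlx4.is_dir():
        num_mtt = 1 << _read_int(mlx4, "log_num_mtt")
        per_seg = 1 << _read_int(mlx4, "log_mtts_per_seg")
        return "mlx4_core", num_mtt * per_seg * page_size

    mthca = root / "ib_mthca" / "parameters"
    if mthca.is_dir():
        usable = _read_int(mthca, "num_mtt") - _read_int(mthca, "fmr_reserved_mtts")
        per_seg = 1 << _read_int(mthca, "log_mtts_per_seg")
        return "ib_mthca", usable * per_seg * page_size

    return None


if __name__ == "__main__":
    result = max_registrable_bytes()
    if result is None:
        print("Neither mlx4_core nor ib_mthca parameters were found")
    else:
        module, limit = result
        print(f"{module}: {limit} bytes ({limit / 2**30:.2f} GiB)")
```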

Licensed under: CC-BY-SA with attribution