Posted on 2012/06/14 12:53
Filed Under: Clusters/High-Performance Computing (HPC)


On some Linux machines, InfiniBand libraries are installed (for example, with OpenMPI) without the corresponding kernel drivers and/or hardware. This can cause a CFD-ACE+ parallel run to stop with InfiniBand-related error messages. Depending on your configuration, the messages may be any of the following, or something similar:

libibverbs: Fatal: Couldn't read uverbs ABI version
CFD-ACE-SOLVER-MPM-MPI: Rank 0:0: MPI_Init: didn't find active interface/port
CFD-ACE-SOLVER-MPM-MPI: Rank 0:0: MPI_Init: Can't initialize RDMA device
CFD-ACE-SOLVER-MPM-MPI: Rank 0:0: MPI_Init: MPI BUG: Cannot initialize RDMA protoc



In such cases, one has to force HP-MPI to use TCP connections by setting a new environment variable. The MPI_IC_ORDER environment variable can be used to force HP-MPI to ignore all other interconnects except TCP.

Variable name: MPI_IC_ORDER
Variable value: TCP

For bash/sh/ksh: export MPI_IC_ORDER="TCP"
For csh/tcsh: setenv MPI_IC_ORDER "TCP"



This needs to be set only on the master node.  
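
As a rough sketch, the setting can be applied only on machines where the verbs interface is actually missing. The helper name choose_ic_order is hypothetical, and the sysfs location /sys/class/infiniband_verbs/abi_version is an assumption about where libibverbs looks for the ABI version on your system; adjust it if your distribution differs.

```shell
# Hypothetical helper (not part of HP-MPI or CFD-ACE+).
# Prints "TCP" when the uverbs ABI version file is not readable,
# otherwise prints "default" to mean "leave MPI_IC_ORDER alone".
# The sysfs directory is an assumption and can be overridden via $1.
choose_ic_order() {
    sysfs_dir="${1:-/sys/class/infiniband_verbs}"
    if [ -r "$sysfs_dir/abi_version" ]; then
        printf '%s\n' "default"
    else
        printf '%s\n' "TCP"
    fi
}

# Force TCP only when the InfiniBand verbs interface looks unusable.
if [ "$(choose_ic_order)" = "TCP" ]; then
    export MPI_IC_ORDER="TCP"
fi
```

Placing this in the master node's shell startup file (bash/sh/ksh) would keep the default interconnect search on machines that do have working InfiniBand.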

MPI_IC_ORDER is an environment variable whose default contents are:    

ibv:vapi:udapl:psm:mx:gm:elan:itapi:TCP

It instructs HP-MPI to search for available interconnects in a specific order. A lowercase entry means ‘use if detected, otherwise keep searching’. An uppercase entry demands that the interconnect be used; if it cannot be selected, the application terminates with an error. The same mechanism can be used to force a different interconnect, if one is available, by writing its name in uppercase.
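
The search rule described above can be illustrated with a small shell model. This is only a sketch of the lowercase/uppercase semantics as stated here, not HP-MPI's actual implementation; the function name select_interconnect and the idea of passing in a list of "detected" interconnects are assumptions made for illustration.

```shell
# Hypothetical model of the MPI_IC_ORDER search rule.
#   $1: colon-separated order string, e.g. "ibv:vapi:TCP"
#   $2: space-separated lowercase names of interconnects assumed detected
# Prints the selected interconnect, or returns 1 with an error message.
select_interconnect() {
    order="$1"
    detected=" $2 "

    # Split the order string on ':' into positional parameters.
    old_ifs="$IFS"
    IFS=':'
    set -- $order
    IFS="$old_ifs"

    for entry in "$@"; do
        name=$(printf '%s' "$entry" | tr '[:upper:]' '[:lower:]')
        case "$detected" in
            *" $name "*)
                # Detected: use it, whether optional or mandatory.
                printf '%s\n' "$name"
                return 0
                ;;
        esac
        # Not detected. An entry containing uppercase letters is mandatory,
        # so the search stops here with an error.
        if [ "$entry" != "$name" ]; then
            printf 'error: required interconnect %s is not available\n' "$entry" >&2
            return 1
        fi
        # Lowercase entry: keep searching.
    done

    printf 'error: no usable interconnect found\n' >&2
    return 1
}
```

With the default order and only TCP detected, select_interconnect "ibv:vapi:udapl:psm:mx:gm:elan:itapi:TCP" "tcp" prints tcp, because every lowercase entry is skipped. With "IBV:tcp" and only TCP detected, the function fails, mirroring the hard error an uppercase entry produces.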
Writer: 아랑

About

by 서진우
Twitter :@muchunalang
