Licensing

When trying to launch a parallel job on a Microsoft cluster, you see this error:

    Fatal error in MPI_Comm_dup: Other MPI error, error stack:
    MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x00000028ECF5E8F0) failed
    [ch3:nd] Could not connect via NetworkDirect to rank ## with business card (port=##### description="enterprise_ip public_ip hostname " shm_hostname shm_queue=5360:344 ).
    There is no NetworkDirect information in the business card and fallback to the socket interconnect is disabled.
    Check the remote NetworkDirect configuration or set the MPICH_ND_ENABLE_FALLBACK environment variable to true.

Resolution: Ensure that the correct version of the Mellanox InfiniBand driver is installed on every node, and that the installed version is identical across all nodes.
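
The version check described in the resolution can be sketched as follows. This is a minimal illustration, not a vendor tool: it assumes you have already collected the driver version string reported by each node by whatever means your site uses. The function name `find_version_mismatches`, the node names, and the version strings are all hypothetical placeholders.

```python
from collections import defaultdict


def find_version_mismatches(versions_by_node):
    """Group nodes by the driver version they report.

    Returns an empty dict when every node reports the same version,
    otherwise a dict mapping each distinct version to its nodes.
    """
    groups = defaultdict(list)
    for node, version in versions_by_node.items():
        # Strip stray whitespace so "1.0.0 " and "1.0.0" compare equal.
        groups[version.strip()].append(node)
    return dict(groups) if len(groups) > 1 else {}


# Hypothetical inventory: node names and version strings are placeholders.
reported = {
    "node01": "1.0.0",
    "node02": "1.0.0",
    "node03": "1.0.1",
}

mismatches = find_version_mismatches(reported)
if mismatches:
    for version, nodes in sorted(mismatches.items()):
        print(f"driver {version}: {', '.join(sorted(nodes))}")
else:
    print("All nodes report the same driver version.")
```

Any node listed under a version different from the intended one is a candidate for a driver reinstall before the job is relaunched.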