Question: What should I do if I get this error when running distributed jobs across nodes on my Windows HPC cluster?

Error:

    Fatal error in MPI_Comm_create: Other MPI error, error stack:
    MPI_Comm_create(MPI_COMM_WORLD, group=0x88000001, new_comm=0x000000E071458E90) failed
    unable to connect to on port #####, no endpoint matches the netmask

Jobs requiring a single node run without issues.
Tagged: 19.2, HPC Pack, HPC/Parallel, Installation/Licensing/Systems, IP, Mechanical - SYS, mpi, N/A, netmask, Windows HPC
January 25, 2023 at 7:28 am | FAQ | Participant
Answer: Either the bind order of the network interfaces or an incorrectly set MPI NETMASK is causing the issue. A typical error may look like this:

    unable to connect to 10.0.0.12 node12 on port 52935, no endpoint matches the netmask 10.0.1.0/255.255.255.0

Note the difference in subnets: the node address is on 10.0.0.*, while the netmask specifies 10.0.1.*.

Please have your cluster or network administrator review the suggestions below.

a) Check how many network interfaces the compute nodes have. If there are multiple interfaces, make sure that the bind order is set correctly.

b) If there is only one interface and the error still appears, the MPI NETMASK may need to be configured correctly. For this example it needs to be set to the 10.0.0.* subnet, so the command will look like:

    cluscfg setenvs CCP_MPI_NETMASK=10.0.0.0/255.255.255.0

Additional information

If RSM is used to submit the job to the cluster, the RSM job log may show errors like the example below:

    Running Solver : C:\Program Files\ANSYS Inc\v192\ansys\bin\winx64\ANSYS192.exe -b nolist -s noread -p ansys -i remote.dat -o solve.out -dis -mpi msmpi -np 12 -dir "C:/scratch/n3r39eoc.i2n"

    job aborted:
    [ranks] message

    [0] fatal error
    Fatal error in MPI_Comm_create: Other MPI error, error stack:
    MPI_Comm_create(MPI_COMM_WORLD, group=0x88000001, new_comm=0x000000E071458E90) failed
    [ch3:sock] rank 0 unable to connect to rank 8 using business card
    unable to connect to 10.0.0.12 node12 on port 52935, no endpoint matches the netmask 10.0.1.0/255.255.255.0

    [1-11] terminated

    ---- error analysis -----

    [0] on node01
    mpi has detected a fatal error and aborted C:\Program Files\ANSYS Inc\v192\ANSYS\bin\winx64\ANSYS.EXE

    ---- error analysis -----
    .
    .
    .
    Command Exit Code: -4
    ClusterJobs Exiting with code: -4
    Individual Command Exit Codes are: [-4]
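The subnet comparison described in the answer can be checked mechanically. The following is a minimal Python sketch, not an Ansys or Microsoft tool: the function name `diagnose` and the regular expression are illustrative assumptions, and the pattern is fitted only to the sample message quoted in the answer. It also assumes that the address reported in the error is on the interface MPI should use and that the same mask applies, which your network administrator should confirm.

```python
import ipaddress
import re

# Hypothetical pattern, modeled only on the sample error line in the answer.
ERROR_PATTERN = re.compile(
    r"unable to connect to (?P<ip>\d+\.\d+\.\d+\.\d+) \S+ on port \d+, "
    r"no endpoint matches the netmask "
    r"(?P<net>\d+\.\d+\.\d+\.\d+)/(?P<mask>\d+\.\d+\.\d+\.\d+)"
)


def diagnose(message):
    """Compare the node address in the error with the configured netmask.

    Returns None if the message does not match the assumed pattern.
    """
    match = ERROR_PATTERN.search(message)
    if match is None:
        return None

    node_ip = ipaddress.ip_address(match["ip"])
    # Subnet that MPI was told to use (the value reported in the error).
    configured = ipaddress.ip_network(
        f"{match['net']}/{match['mask']}", strict=False
    )
    # Subnet the node address actually falls in, assuming the same mask.
    actual = ipaddress.ip_network(f"{node_ip}/{match['mask']}", strict=False)

    return {
        "node_ip": str(node_ip),
        "configured": configured.with_netmask,
        "matches": node_ip in configured,
        "suggested": actual.with_netmask,
    }


if __name__ == "__main__":
    sample = (
        "unable to connect to 10.0.0.12 node12 on port 52935, "
        "no endpoint matches the netmask 10.0.1.0/255.255.255.0"
    )
    result = diagnose(sample)
    if result is not None and not result["matches"]:
        # Prints the candidate setting only; it does not change the cluster.
        print(f"cluscfg setenvs CCP_MPI_NETMASK={result['suggested']}")
```

For the sample message, the node address 10.0.0.12 is not inside 10.0.1.0/255.255.255.0, and the sketch reports 10.0.0.0/255.255.255.0 as the candidate value, which agrees with the command given in suggestion b). Treat the output as a starting point for the administrator's review rather than a confirmed fix, since a wrong bind order (suggestion a) can produce the same message.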
© 2024 Copyright ANSYS, Inc. All rights reserved.