Ansys Products

Discuss installation & licensing of our Ansys Teaching and Research products.

(MPI?) problem when running System Coupling on Linux HPC cluster

    • jfonken
      Subscriber

      Hi all,

I'm running into trouble with my FSI simulations using Ansys System Coupling on a Linux CentOS 8 HPC cluster with Open MPI. The SLURM environment is used to schedule the jobs. Ansys 2021 R2 is used. My simulation runs fine on my own laptop but gives an error message when running on the cluster. I will attach the error message in a comment, since it's too long to fit in the question section. Neither Fluent nor Mechanical gives an error message before the System Coupling process is interrupted.

The debug information from System Coupling shows that it is able to get the Fluent mesh as well as the nodes and elements of the Mechanical mesh, but the trace stops after obtaining the elements (see the second comment).

    • jfonken
      Subscriber
      Error message from System Coupling:
      |Build Information|
      +-----------------------------------------------------------------------------+
      | System Coupling|
      |2021 R2: Build ID: 5b5c87f Build Date: 21 May 2021 08:38:06|
      | Fluid Flow (Fluent)|
|ANSYS Fluent 21.2.0, Build Time: May 28 2021 13:48:13 EDT, Build Id: 10201, |
|OS Version: lnamd64|
| MAPDL Transient|
|Mechanical APDL Release 2021 R2   Build 21.2   UP20210601|
|DISTRIBUTED LINUX x64   Version|
      +=============================================================================+

      ===============================================================================
      +=============================================================================+
      ||
      |Analysis Initialization|
      ||
      +=============================================================================+
      ===============================================================================

      sched_setaffinity() call failed: Invalid argument
      sched_setaffinity() call failed: Invalid argument
      make: *** No rule to make target 'clean'.Stop.
      [tcn362:762304] *** An error occurred in MPI_Gatherv
      [tcn362:762304] *** reported by process [2322595841,1]
      [tcn362:762304] *** on communicator MPI_COMM_WORLD
      [tcn362:762304] *** MPI_ERR_ARG: invalid argument of some other kind
[tcn362:762304] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[tcn362:762304] *** and potentially your MPI job)
      Error in TcpCommunicatingSocket::recv End of file
      Error in TcpCommunicatingSocket::send Broken pipe

      ==============================================================================
      Stack backtrace generated for process id 762959 on signal 11 :
      Error in TcpCommunicatingSocket::send Broken pipe
      1000000: fluent() [0x785ff9]
      1000000: /lib64/libc.so.6(+0x37400) [0x14a630403400]
      1000000: fluent(RpcNextArgAsInt32+0) [0xb274d0]
      1000000: fluent(Get_Restart_Initial_Step_Index+0x66) [0x872ee6]
      1000000: fluent() [0x8245a6]
      1000000: fluent(eval+0x4b5) [0x8c1955]
      1000000: fluent(eval+0x6cd) [0x8c1b6d]
      1000000: fluent(eval+0xd21) [0x8c21c1]
      1000000: fluent(eval+0xd21) [0x8c21c1]
      1000000: fluent(eval+0xd21) [0x8c21c1]
      1000000: fluent(eval+0xd21) [0x8c21c1]
      1000000: fluent() [0x8c29b6]
      1000000: fluent(eval_errprotect+0x4e) [0x8c308e]
      1000000: fluent(eval+0x21f) [0x8c16bf]
      1000000: fluent(eval+0xd21) [0x8c21c1]
      Please include this information with any bug report you file on this issue!
      ==============================================================================

      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe

      +-----------------------------------------------------------------------------+
      | Failed to retrieve mesh(es).|
      +-----------------------------------------------------------------------------+
      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe

      Error: Cortex received a fatal signal (SEGMENTATION VIOLATION).
      Error Object: Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe
      Error in TcpCommunicatingSocket::send Broken pipe
      Traceback (most recent call last):
      File "PyLib/physicscoupling/importer/__init__.py", line 40, in importMesh
      File "PyLib/kernel/util/Memory.py", line 177, in wrapper
      File "PyLib/physicscoupling/importer/__init__.py", line 121, in buildMesh
      File "PyLib/physicscoupling/importer/__init__.py", line 158, in _getFaceAndCellZoneIds
      AttributeError: 'tuple' object has no attribute 'items'

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
      File "PyLib/main/Controller.py", line 147, in
      File "PyLib/main/Controller.py", line 143, in _run
      File "PyLib/main/Controller.py", line 92, in _executeScript
      File "PyLib/kernel/commands/__init__.py", line 31, in readScriptFile
      File "PyLib/kernel/commands/CommandManager.py", line 169, in readScriptFile
      File "runFSI.txt", line 49, in
Solve
File "PyLib/kernel/commands/CommandDefinition.py", line 74, in func
      File "PyLib/kernel/commands/__init__.py", line 28, in executeCommand
      File "PyLib/kernel/commands/CommandManager.py", line 122, in executeCommand
      File "PyLib/cosimulation/externalinterface/core/solver.py", line 125, in execute
      File "PyLib/cosimulation/solver/__init__.py", line 123, in solve
      File "PyLib/kernel/util/Memory.py", line 177, in wrapper
      File "PyLib/cosimulation/solver/__init__.py", line 526, in __initializeControlled
      File "PyLib/cosimulation/solver/__init__.py", line 796, in __importMesh
      File "PyLib/kernel/util/Memory.py", line 177, in wrapper
      File "PyLib/cosimulation/solver/__init__.py", line 807, in __importMeshAndCreateZonesForFmu
      File "PyLib/physicscoupling/importer/__init__.py", line 43, in importMesh
      RuntimeError: Failed to retrieve mesh(es).
      Shutting down System Coupling compute node processes.
      Error in TcpCommunicatingSocket::send Broken pipe
      Some compute-node processes or machines have crashed.
      Host process lost connection while reading. Fatal error!

      999999 (../../src/mpsystem.c@1221): mpt_read: failed: errno = 0

      999999: mpt_read: error: read failed trying to read 4 bytes: Success
    • jfonken
      Subscriber
      Debug log (last part):
      SndReq (FLUENT-1) MeshData::GetNodes
      Req: 5356, 0, 110007
      CTrace (3): Leaving makeRemoteCall
      Rsp: [3153, 3156, 3158, 3160, 3163, ... (n=110007; min=3153; max=114098) ], [-0.00616212, 0.00786893, -8.32667e-17,
      -0.0029959, 0.00953511, ... (n=330021; min=-0.0253699; max=0.168801; mean=0.0327284) ]
      CTrace (3): Entering makeRemoteCall
      SndReq (FLUENT-1) FaceMeshData::GetElementCount
      Req: 5356, -1
      CTrace (3): Leaving makeRemoteCall
      Rsp: 219697
      CTrace (3): Entering makeRemoteCall
      SndReq (FLUENT-1) FaceMeshData::GetElements
      Req: 5356, -1, 0, 219697
      CTrace (3): Leaving makeRemoteCall
      Rsp: [5767673, 5767674, 5767675, 5767676, 5767677, ... (n=219697; min=5767673; max=5987369) ], [1825839, 1769492,
      1826561, 1828079, 1821171, ... (n=219697; min=48; max=2061779) ], [0, 0, 0, 0, 0, ... (n=219697; min=0; max=0) ],
      [4411, 4410, 4409, 4414, 4413, ... (n=659091; min=3153; max=114098) ], [3, 3, 3, 3, 3, ... (n=219697; min=3; max=3) ]
      CTrace (3): Entering makeRemoteCall
      SndReq (FLUENT-1) RegionFilter::DeleteFilter
      Req: 1
      CTrace (3): Leaving makeRemoteCall
      CTrace (2): Leaving fillRegionData
      CTrace (1): Leaving loadRegions
      CTrace (1): Entering loadRegions
      CTrace (2): Entering makeRemoteCall
      SndReq (MAPDL-2) RegionFilter::NewFilter
      CTrace (2): Leaving makeRemoteCall
      Rsp: 1
      CTrace (2): Entering makeRemoteCall
      SndReq (MAPDL-2) RegionFilter::SetRegionName
      Req: 1, FSIN_1
      CTrace (2): Leaving makeRemoteCall
      CTrace (2): Entering makeRemoteCall
      SndReq (MAPDL-2) RegionInfo::ApplyFilter
      Req: 1
      CTrace (2): Leaving makeRemoteCall
      CTrace (2): Entering makeRemoteCall
      SndReq (MAPDL-2) RegionInfo::GetIds
      CTrace (2): Leaving makeRemoteCall
      Rsp: [1]
      CTrace (2): Entering makeRemoteCall
      SndReq (MAPDL-2) RegionInfo::GetTopolDimension
      CTrace (2): Leaving makeRemoteCall
      Rsp: [2]
      CTrace (2): Entering makeRemoteCall
      SndReq (MAPDL-2) MeshInfo::GetUnits
      Req: 1
      CTrace (2): Leaving makeRemoteCall
      Rsp: [0]
      CTrace (2): Entering makeRemoteCall
      SndReq (MAPDL-2) MeshData::GetNodeCount
      Req: 1
      CTrace (2): Leaving makeRemoteCall
      Rsp: 68580
      CTrace (2): Entering makeRemoteCall
      SndReq (MAPDL-2) MeshData::GetNodes
      Req: 1, 0, 68580
      CTrace (2): Leaving makeRemoteCall
      CTrace (2): Entering convertUnits
      CTrace (2): Leaving convertUnits
      Rsp: [62366, 62368, 62681, 62679, 62367, ... (n=68580; min=1892; max=250040) ], [-0.01, -2.6159e-18, -8.32667e-17,
      -0.00999893, 3.49372e-05, ... (n=205740; min=-0.025373; max=0.168801; mean=0.0308368) ]
      CTrace (2): Entering makeRemoteCall
      SndReq (MAPDL-2) MeshData::GetElementCount
      Req: 1, -1
      CTrace (2): Leaving makeRemoteCall
      Rsp: 22788
      CTrace (2): Entering makeRemoteCall
      SndReq (MAPDL-2) MeshData::GetElements
      Req: 1, -1, 0, 22788
      CTrace (2): Leaving makeRemoteCall
      Rsp: [8, 8, 8, 8, 8, ... (n=22788; min=8; max=8) ], [8, 8, 8, 8, 8, ... (n=22788; min=8; max=8) ], [62366,
      62368, 62681, 62679, 62367, ... (n=182304; min=1892; max=250040) ]
      CTrace (2): Entering makeRemoteCall
      SndReq (MAPDL-2) RegionFilter::DeleteFilter
      Req: 1
      CTrace (2): Leaving makeRemoteCall
      CTrace (1): Leaving loadRegions
      CTrace (1): Entering getAndDumpNodes
      CTrace (2): Entering receiveRegionNodes
      CTrace (2): Leaving receiveRegionNodes
      CTrace (2): Entering receiveRegionNodes
      CTrace (3): Entering getNodes
      CTrace (3): Leaving getNodes
      CTrace (3): Entering getNodes
      CTrace (3): Leaving getNodes
      CTrace (2): Leaving receiveRegionNodes
      CTrace (2): Entering setup
      CTrace (2): Leaving setup
      CTrace (2): Entering collect
      CTrace (2): Leaving collect
      CTrace (2): Entering fillNodes
      CTrace (2): Leaving fillNodes
      CTrace (1): Leaving getAndDumpNodes
      CTrace (1): Entering getAndDumpCells
    • Ulrich
      Ansys Employee
Hi Judith, let's try to narrow down the problem.
Can you run Fluent alone on the Linux cluster, standalone and/or via Workbench?
And the same question for Mechanical:
Can you run Mechanical alone on the Linux cluster, standalone and/or via Workbench?
      Regards
      Ulrich S.
    • jfonken
      Subscriber
Hi Ulrich, thanks for your reply, and sorry that I didn't include this information. Both Ansys Fluent and Ansys Mechanical (both 2021 R2 versions) run fine on the cluster, using the openmpi option.
      Best Judith
    • Ulrich
      Ansys Employee
Hi Judith, thanks for confirming that Fluent and Mechanical work standalone.
Hence, you seem to be facing a problem specific to System Coupling on a Linux cluster with SLURM.
How are you running System Coupling, via Workbench or standalone (e.g. via the CLI)?
In this context, I have so far only found the chapter "Using Parallel Processing Capabilities" in the "System Coupling User's Guide" (https://ansyshelp.ansys.com/account/secured?returnurl=/Views/Secured/corp/v212/en/sysc_ug/sysc_userinterfaces_advtasks_parallel.html?q=linux).
      I will try to find more.
      Best Regards Ulrich S.
    • jfonken
      Subscriber
      Hi Ulrich,
      I run System Coupling as a standalone program. I call it using:
      "/sw/arch/Centos8/EB_production/2021/software/ANSYS/2021R2/v212/SystemCoupling/bin/systemcoupling" --mpi openmpi --cnf=${NODEFILE} -R runFSI.txt -l3
The contents of the runFSI.txt file are:
      # Load participants
      AddParticipant(InputFile = 'fluent.scp')
      AddParticipant(InputFile = 'structural.scp')
      # Create coupling interface
      AddInterface(SideOneParticipant = 'FLUENT-1',SideOneRegions = ['wall_lumen'],SideTwoParticipant = 'MAPDL-2',SideTwoRegions = ['FSIN_1'])
      # Add data transfers
      # Data transfer 1
      AddDataTransfer(Interface = 'Interface-1',TargetSide = 'Two',SideOneVariable = 'force', SideTwoVariable = 'FORC')
      # Data transfer 2
      AddDataTransfer(Interface = 'Interface-1',TargetSide = 'One',SideOneVariable = 'displacement',SideTwoVariable = 'INCD')
      # Set participant execution controls
      execCon = DatamodelRoot().CouplingParticipant
      execCon['FLUENT-1'].ExecutionControl.ParallelFraction=1.0/6.0
      execCon['FLUENT-1'].ExecutionControl.AdditionalArguments = '-mpi=openmpi -meshing -gu'
      execCon['MAPDL-2'].ExecutionControl.ParallelFraction=5.0/6.0
      execCon['MAPDL-2'].ExecutionControl.AdditionalArguments = '-mpi openmpi'
      # Analysis settings
      DatamodelRoot().SolutionControl.MinimumIterations = 1
      DatamodelRoot().SolutionControl.MaximumIterations = 20
      DatamodelRoot().SolutionControl.TimeStepSize = 0.005
      DatamodelRoot().SolutionControl.EndTime = 2.4
      # Add stabilization
      DataTrans1 = DatamodelRoot().CouplingInterface['Interface-1'].DataTransfer['displacement']
      DataTrans1.ConvergenceTarget = 0.01
      DataTrans1.Stabilization.Option = 'Quasi-Newton'
      DataTrans1.Stabilization.MaximumRetainedTimeSteps = 1
      DataTrans1.Stabilization.InitialRelaxationFactor = 0.1
DataTrans1.PrintState
DataTrans2 = DatamodelRoot().CouplingInterface['Interface-1'].DataTransfer['FORC']
      DataTrans2.ConvergenceTarget = 0.01
      DataTrans2.Stabilization.Option = 'None'
DataTrans2.PrintState
# Create restart points at every 5 time steps
      DatamodelRoot().OutputControl.Option = 'StepInterval'
      DatamodelRoot().OutputControl.OutputFrequency = '5'
      Solve
      You can also view all my input and output files in the .7p file that I've attached to my initial post.
      Best Judith
    • Paul Hutcheson
      Ansys Employee
Hi Judith, please check the line:
      execCon['FLUENT-1'].ExecutionControl.AdditionalArguments = '-mpi=openmpi -meshing -gu'
      "-meshing" launches Fluent Meshing, which must be a mistake and could be the reason system coupling fails during mapping because it expects Fluent solver. Can you remove "-meshing" and try again?
      Paul
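For reference, the corrected execution-control line would then read as follows (a minimal sketch, assuming the other Fluent arguments from Judith's script stay the same):
# Launch the Fluent solver (rather than Fluent Meshing) with Open MPI and no GUI
execCon['FLUENT-1'].ExecutionControl.AdditionalArguments = '-mpi=openmpi -gu'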
    • jfonken
      Subscriber
Hi Paul, I noticed this mistake in my input file a while ago as well and corrected it. I forgot to update it on the forum. Unfortunately, removing "-meshing" didn't resolve my problem. Would you have any other suggestions?
      Best Judith
    • Paul Hutcheson
      Ansys Employee
Hi Judith, I'm not sure of the cause of the error yet, then. The debug files would need to be interpreted by a developer.
      Did you try the default MPI, by removing all MPI options?
Note also that System Coupling can read SLURM environment variables if they are first set by a bash script; at the end of such a script, SyC is launched with:
      "/sw/arch/Centos8/EB_production/2021/software/ANSYS/2021R2/v212/SystemCoupling/bin/systemcoupling" -R runFSI.txt -s3
Note that the core count for System Coupling is set with "-sN", not "-lN", where N is the number of cores.
      Paul
    • jfonken
      Subscriber
Hi Paul, I tried the default MPI as well, but unfortunately without any success.
      I launch SyC with:
      "/sw/arch/Centos8/EB_production/2021/software/ANSYS/2021R2/v212/SystemCoupling/bin/systemcoupling" --mpi openmpi --cnf=${NODEFILE} -R runFSI.txt -l3
Here I use the --cnf flag to assign a certain number of cores. I also tried it with the -s and -t flags, but the simulation only works when just 1 core is used for Fluent and 1 for Mechanical APDL. The -l flag was set to get debug output.
      Best Judith
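A minimal sketch of how such a ${NODEFILE} might be generated from SLURM (an assumption; the thread does not show this step, and the exact format expected by --cnf may differ, e.g. one hostname per line versus host:cores):
#!/bin/bash
# Hypothetical helper: expand the compact SLURM node list into one hostname per line
NODEFILE="$SLURM_SUBMIT_DIR/nodefile.$SLURM_JOB_ID"
scontrol show hostnames "$SLURM_JOB_NODELIST" > "$NODEFILE"
export NODEFILE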
    • Paul Hutcheson
      Ansys Employee
Hi Judith, OK, it's interesting that it worked when 1 core was assigned to each solver.
runFSI should be a Python script, preferably with the extension .py. I haven't tested whether the extension .txt will be interpreted as a Python script, but if it works with 1 core per solver, this is likely not the problem.
Since it seems to be happy with 1 core each but not more, I think we should look at the job submission method and core assignment. Do you have a SLURM submission script that sets environment variables? Here is an example you can use for SyC jobs on SLURM:
      #!/bin/bash -l
      #
      # Set slurm options as needed
      #
      #SBATCH --job-name SYSC
      #SBATCH --nodes=2
      #SBATCH --partition=ottc02
      #SBATCH --ntasks-per-node=32
      #SBATCH --output=%x-%j.out
      #SBATCH --error=%x-%j.err
      #SBATCH --export=ALL
      export AWP_ROOT212=/sw/arch/Centos8/EB_production/2021/software/ANSYS/2021R2/v212
      #
      export SYSC_ROOT=${AWP_ROOT212}/SystemCoupling
      #
      # print job start time and Slurm job resources
      #
      date
      echo "SLURM_JOB_ID : "$SLURM_JOB_ID
      echo "SLURM_JOB_NODELIST : "$SLURM_JOB_NODELIST
      echo "SLURM_JOB_NUM_NODES : "$SLURM_JOB_NUM_NODES
      echo "SLURM_NODELIST : "$SLURM_NODELIST
      echo "SLURM_NTASKS : "$SLURM_NTASKS
      echo "SLURM_TASKS_PER_NODE : "$SLURM_TASKS_PER_NODE
      echo "working directory : "$SLURM_SUBMIT_DIR
      #
      echo "Running System Coupling"
      echo "System coupling main execution host is $HOSTNAME"
      echo "Current working directory is $PWD"
      #echo "ANSYS install root is $AWP_ROOT212"
      echo "System coupling root is $SYSC_ROOT"
      echo "Run script is $1"
      echo
      "$SYSC_ROOT/bin/systemcoupling" -R runFSI.txt
You'd have to change the job name, nodes, partition, and ntasks-per-node.
      Paul
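For completeness, a sketch of how such a script could be submitted and monitored, assuming it is saved as run_sysc.sh (the file name is illustrative):
# Submit the job; the run script name is echoed inside the script via $1
sbatch run_sysc.sh runFSI.txt
# Check the queue status of your jobs
squeue -u $USER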
