Electronics

Topics related to HFSS, Maxwell, SIwave, Icepak, Electronics Enterprise and more.

HFSS libnvidia-ml.so too old or could not be found – Warning in slurm job output

    • jimmy.nguyen
      Subscriber

      Hi all,

      I am submitting a script using sbatch to a SLURM cluster. I am using HFSS 25R1 without a graphical environment.

      I see the output:

      [warning] For machine [...] libnvidia-ml.so is too old or could not be found in : 1) /usr/lib64 or 2) /usr/lib/x86_64-linux-gnu or the system default paths. The libnvidia-ml.so binary is required for GPU acceleration and therefore only CPU will be used. For further information contact Ansys support.

      I am using CUDA 12.3 and have a driver version of 545.23.08. The CUDA libraries are not installed in either /usr/lib64 or /usr/lib/x86_64-linux-gnu. They are in a different directory. I added this directory to LD_LIBRARY_PATH, yet ansysedt does not appear to be able to find it.
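
      One detail worth checking here: libnvidia-ml.so is installed by the NVIDIA driver, not by the CUDA toolkit (the toolkit only ships a non-functional stub under its stubs/ directory), so adding a toolkit directory to LD_LIBRARY_PATH may not expose a usable copy. A minimal sketch of the check, using the example path from this thread:

      ```shell
      #!/bin/sh
      # Sketch: is a usable libnvidia-ml.so visible to the dynamic loader?
      # /cuda/libs/current/lib64 is the example path from this thread; adjust it.
      export LD_LIBRARY_PATH=/cuda/libs/current/lib64:$LD_LIBRARY_PATH

      # The loader cache covers the "system default paths" the warning mentions.
      if ldconfig -p 2>/dev/null | grep -q libnvidia-ml; then
          echo "libnvidia-ml: present in loader cache"
      else
          echo "libnvidia-ml: NOT in loader cache"
      fi

      # Look in the two directories the HFSS warning names explicitly.
      find /usr/lib64 /usr/lib/x86_64-linux-gnu \
           -maxdepth 1 -name 'libnvidia-ml.so*' 2>/dev/null
      ```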

    • Ivonne Marti
      Ansys Employee

      Hi jimmy.nguyen,

To resolve this warning, you can try the following options:

      §  Check for the Library: Ensure that the libnvidia-ml.so library exists in its default directory, which is typically /usr/lib64. If it is not present, you may need to install or update your NVIDIA drivers.

§  Update NVIDIA Drivers: Visit the NVIDIA website to download and install the latest drivers for your GPU. It is important to install drivers obtained directly from NVIDIA rather than relying on the operating system's update mechanism. After installation, restart the machine to apply the changes.

§  Create Necessary Directories: If the directory /usr/lib64 is missing, you may need to create it and ensure that the libnvidia-ml.so file is placed there. You can also add its path to the Linux environment variable LD_LIBRARY_PATH through a shell startup file.

More detailed information can be found in the HFSS Help, page 1988.

      §  Enable/Disable GPU Acceleration: You can manage GPU acceleration settings from the Ansys Electronics Desktop Solver User Interface. This might help in resolving any configuration issues related to GPU usage.

I hope this helps.
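
      The driver checks above can be sketched as a short script to run on a compute node (a sketch, not an official Ansys tool; nvidia-smi ships with the NVIDIA driver, so its absence already answers the question):

      ```shell
      #!/bin/sh
      # Sketch: confirm the NVIDIA driver is installed and report its version.
      if command -v nvidia-smi >/dev/null 2>&1; then
          # One line per GPU: driver_version, name
          nvidia-smi --query-gpu=driver_version,name --format=csv,noheader
      else
          echo "nvidia-smi not found: the NVIDIA driver is probably not installed"
      fi

      # Driver packages normally place libnvidia-ml.so.1 in one of these:
      ls /usr/lib64/libnvidia-ml.so* \
         /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* 2>/dev/null || true
      ```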

    • jimmy.nguyen
      Subscriber

      I have added my CUDA install directory to LD_LIBRARY_PATH and PATH, yet there appears to be no change in behavior.

      1) Which NVIDIA driver version is required by 2025R1?

      2) Are there any debug options/flags that I could use to run ansysedt and see if it immediately detects a GPU? Currently I am submitting jobs to the SLURM cluster and reading back the log files, which takes a while.

Shown below is my SLURM script, in case it helps:

      #!/bin/bash

      #SBATCH --export ALL
      #SBATCH --chdir '/projects/users/jimmy'
      #SBATCH --job-name project1
      #SBATCH --partition system.q
      #SBATCH --nodes 1
      #SBATCH --cpus-per-task 128
      #SBATCH --ntasks 1
      #SBATCH --gres gpu:1
      #SBATCH --output /projects/users/jimmy/slurm-%x-%j.out

      module load slurm
      module load cuda12.2/toolkit

      export ANSYSEM_GENERIC_MPI_WRAPPER=/software/hfss/2025R1/v251/AnsysEM/schedulers/scripts/utils/slurm_srun_wrapper.sh
      export ANSYSEM_COMMON_PREFIX=/software/hfss/2025R1/v251/AnsysEM/common
      export ANSOFT_PASS_DEBUG_ENV_TO_REMOTE_ENGINES=1
      export LD_LIBRARY_PATH=/cuda/libs/current/lib64:$LD_LIBRARY_PATH
      export PATH=/cuda/libs/current/lib64:$PATH

      srun /software/hfss/2025R1/v251/AnsysEM/schedulers/scripts/utils/ansysedt_launcher.sh

      srun --overcommit --cpu-bind=none --mem-per-cpu=0 --gpus-per-node=1 --overlap /software/hfss/2025R1/v251/AnsysEM/ansysedt -distributed includetypes=default maxlevels=1 -machinelist num=1 numgpus=1 -monitor -ng -batchoptions ' '\''HFSS/EnableGPU'\''='\''1'\'' '\''HFSS/NumCoresPerDistributedTask'\''='\''128'\'' '\''HFSS/RAMLimitPercent'\''='\''90'\''' -batchsolve HFSSDesign1 /projects/users/jimmy/Project.aedt
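
      Regarding question 2, one way to avoid the batch-log round trip (a sketch; the partition and gres names are taken from the script above) is to open an interactive shell on a GPU node and check visibility before launching ansysedt:

      ```shell
      # From the login node: request an interactive shell on a GPU node.
      srun --partition=system.q --gres=gpu:1 --pty bash

      # Then, on the allocated node:
      nvidia-smi                          # driver and GPU visible?
      ldconfig -p | grep libnvidia-ml     # library visible to the loader?
      echo "$LD_LIBRARY_PATH"             # did your exports survive?
      ```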

       
