Senior HPC DevOps Engineer (Hybrid/Remote)(15583)
Key Duties and Responsibilities
- Deploy, configure, and maintain High-Performance Computing (HPC) systems to ensure optimal performance and reliability.
- Customize, configure, and troubleshoot job schedulers such as SLURM, PBA, LSF, and others to meet specific operational requirements.
- Collaborates with others to brainstorm best techniques to resolve complex technological infrastructure, build or packaging problems
- Performs moderately complex debug and testing actions on code, processes, and deployments to identify ways to streamline execution and minimize errors encountered
- Responsibilities may involve build and packaging open-source third-party software packages
- Maintenance of tools, infrastructure and build environment for product creation staff
- Employs best practices and helps to maintain them through technical reviews