Senior Cloud DevOps Engineer (Site Reliability Engineer)

  • Gather and analyze operating system and application metrics to support performance tuning and troubleshooting.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Participate in system design consulting, platform management, and capacity planning.
  • Build sustainable systems and services through automation and ongoing platform upgrades.
  • Use well-defined service level objectives (SLOs) to balance feature development speed against reliability.
  • Define, implement, and document operational processes and procedures; review them periodically for efficiency and feed improvements back to the development team.
  • Measure, optimize, and tune system performance, ensuring that systems run reliably and remain highly available in a 24/7 production environment.