Lead Systems Engineer

Abu Dhabi, United Arab Emirates

Job Description

Overview
The Lead Systems Engineer - Computing Technology engages in the design, leads implementation, and provides Level 3 expert support for large-scale private Cloud computing and/or HPC infrastructure, with a specific emphasis on computing technologies including hardware layer, operating system, hypervisor, and orchestration services.
Responsibilities
  • Co-design, lead implementation, and manage hybrid virtualization and containerized platforms based on OpenStack, VMware VCF, and/or Red Hat OpenShift, ensuring platform stability, performance, and compliance with industry standards and best practices.
  • Define and oversee the implementation of the roadmap for all virtualization and HPC platforms across the company.
  • Collaborate with architecture and engineering teams on technology stack component evaluation and selection, ensuring solutions are designed following best practices and optimized from both functional and non-functional perspectives.
  • Lead regular capacity planning exercises to anticipate and accommodate the growing demands on the virtualized environment and HPC infrastructure, ensuring it meets current and future requirements.
  • Develop and oversee plans to enhance the reliability of the computing infrastructure, addressing potential points of failure and ensuring high availability of services.
  • Lead regular performance assessments and implement improvements based on findings in collaboration with relevant teams.
  • Define and oversee the execution of disaster recovery strategies, ensuring system integrity, availability, and protection across all platforms and environments.
  • Design and enhance the observability stack in collaboration with the infrastructure operations team, ensuring monitoring coverage and accuracy.
  • Provide L3 expert support, including on-call shifts, and act as the final tier of resolution for L2 support teams through problem analysis and communication with vendors' technical support.
  • Lead the collaboration with architecture and engineering teams on technology stack component evaluation and selection, ensuring solutions adhere to best practices and are optimized for both functional and non-functional requirements.
  • Lead the analysis and implementation of performance optimization strategies for the cloud computing and/or HPC environment to maximize efficiency and resource utilization.
  • Lead and mentor a team of engineers and collaborate with other infrastructure engineering and systems architect teams on solution design and delivery.
  • Collaborate with security management teams to ensure that systems are safe and secure against cybersecurity threats.
  • Write and maintain relevant documentation, ensuring completeness and quality.
  • Work closely with process management and operational teams, contributing to process development, standardizing the collaboration framework, and improving collaboration efficiency.
  • Participate in the hiring process by conducting technical interviews and contributing to the team's growth and expertise.
Qualifications
  • Bachelor's or master's degree in Computer Science, Engineering, Software Engineering, or a related technology field.
  • 2+ years of experience leading a team of 3+ engineers, holding accountability for quality and timely delivery of infrastructure projects.
  • 7+ years of experience and deep expertise in designing, implementing, and managing private cloud stacks with a focus on compute and virtualization technologies.
  • Extensive hands-on experience with at least one of the following platforms/stacks: OpenStack, Apache CloudStack, VMware VCF, or Red Hat OpenShift, and related computing technologies such as x86 hardware, OS, KVM/ESXi, and orchestration services.
  • 7+ years of hands-on experience in Linux environments and 3+ years of experience in a senior systems or infrastructure engineering role.
  • Profound understanding of hardware architecture and components (x86 and ARM, NUMA, types of memory and channels, types of NICs, etc.).
  • Good understanding of network and storage types and architecture.
  • Good understanding of Cloud Native concepts and technologies.
  • Experience in managing large-scale public or private cloud environments and/or working in a cloud service provider environment is highly desirable.
  • Advanced programming and scripting skills in Python and/or Golang, and Bash.
  • Good knowledge of data center network design and related technologies (OSI model, TCP/IP stack, routing, VLAN/VXLAN, etc.).
  • Understanding of storage types, architecture, and protocols, such as object, block, and file storage, NFS/SMB, iSCSI, FC, etc.
  • Experience with integration of identity management, access management, and authorization solutions (PKI, LDAP, OAuth, OpenID).
  • Hands-on experience with monitoring and observability tools like Zabbix or Nagios, Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana).
  • Understanding of CI/CD principles, the Infrastructure as Code (IaC) approach, and software-defined infrastructure solutions.
  • Experience with database management and optimization for both SQL and NoSQL databases such as MySQL, PostgreSQL, MongoDB, or Cassandra is highly desirable.
  • Experience with ITSM tools such as Jira, Redmine, ServiceNow, etc.
  • Relevant certifications in Linux, virtualization, and cloud computing are a plus.
  • Knowledge of and experience working with GPU hardware and AI hardware accelerators is a plus.
  • Strong organizational skills with the ability to multitask and prioritize.
  • A proactive approach to problem-solving and decision-making.

Core42

Job Detail

  • Job Id
    JD1763891
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Abu Dhabi, United Arab Emirates
  • Education
    Not mentioned