Overview:
The opportunity
The Lead Systems Engineer - Virtualization Platforms contributes to the design, leads the implementation, and provides Level 3 expert support for large-scale private cloud computing and/or HPC infrastructure, with a specific emphasis on computing technologies including the hardware layer, operating system, hypervisor, and orchestration services.
Core42 is the UAE's national-scale enabler for cloud and generative AI, combining G42 Group's expertise across multiple technology disciplines into a single platform for public sector and large enterprise transformations. Building on our capabilities as a sovereign cloud and HPC specialist, we bring generative AI, cybersecurity, professional and managed services expertise to enable national-scale program deployments across industries.
Responsibilities:
Co-design, lead implementation, and manage hybrid virtualization and containerized platforms based on OpenStack, VMware VCF, and/or Red Hat OpenShift, ensuring platform stability, performance, and compliance with industry standards and best practices.
Define and oversee the implementation of the roadmap for all Virtualization and HPC platforms across the company.
Collaborate with architecture and engineering teams on technology stack component evaluation and selection, ensuring solutions are designed following best practices and optimized from both functional and non-functional perspectives.
Lead regular capacity planning exercises to anticipate and accommodate the growing demands on the virtualized environment and HPC infrastructure, ensuring it meets current and future requirements.
Develop and oversee plans to enhance the reliability of the computing infrastructure, addressing potential points of failure and ensuring high availability of services.
Lead regular performance assessments and implement improvements based on findings in collaboration with relevant teams.
Define and oversee execution of disaster recovery strategies ensuring system integrity, availability, and protection across all platforms and environments.
Design and enhance the observability stack in collaboration with the infrastructure operations team, ensuring monitoring coverage and accuracy.
Provide L3 expert support, including on-call shifts, and act as the final tier of resolution for L2 support teams through problem analysis and communication with vendors' technical support.
Lead the collaboration with architecture and engineering teams on technology stack component evaluation and selection, ensuring solutions adhere to best practices and are optimized for both functional and non-functional requirements.
Lead the analysis and implementation of performance optimization strategies for the cloud computing and/or HPC environment to maximize efficiency and resource utilization.
Lead and mentor a team of engineers and collaborate with other infrastructure engineering and systems architect teams on solution design and delivery.
Collaborate with security management teams to ensure that systems are safe and secure against cybersecurity threats.
Write and maintain relevant documentation, ensuring completeness and quality.
Work closely with process management and operational teams, contributing to process development, standardizing the collaboration framework, and improving collaboration efficiency.
Participate in the hiring process by conducting technical interviews and contributing to the team's growth and expertise.
Qualifications:
To qualify for the role, you must have:
Bachelor's or master's degree in Computer Science, Engineering, Software Engineering, or a related technology field.
2+ years of experience leading a team of 3+ engineers, holding accountability for quality and timely delivery of infrastructure projects.
7+ years of experience and deep expertise in designing, implementing, and managing private cloud stacks with a focus on compute and virtualization technologies.
Extensive hands-on experience with at least one of the following platforms/stacks: OpenStack, Apache CloudStack, VMware VCF, or Red Hat OpenShift, and related computing technologies such as x86 hardware, OS, KVM/ESXi, and orchestration services.
7+ years of hands-on experience in Linux environments and 3+ years of experience in a Senior Systems or Infrastructure Engineering role.
Profound understanding of hardware architecture and components (x86 and ARM, NUMA, types of memory and channels, types of NICs, etc.).
Good understanding of network and storage types and architecture.
Good understanding of Cloud Native concepts and technologies.
Experience in managing large-scale public or private cloud environments and/or working in a cloud service provider environment is highly desirable.
Advanced programming and scripting skills in Python and/or Golang, and Bash.
Good knowledge of data center network designs and related technologies (OSI model, TCP/IP stack, routing, VLAN/VXLAN, etc.).
Understanding of storage types, architecture, and protocols such as object/block/file storage, NFS/SMB, iSCSI, FC, etc.
Experience with integration of identity management, access management, and authorization solutions (PKI, LDAP, OAuth, OpenID).
Hands-on experience with monitoring and observability tools like Zabbix or Nagios, Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana).
Understanding of CI/CD principles, the Infrastructure as Code (IaC) approach, and software-defined infrastructure solutions.
Experience with database management and optimization for both SQL and NoSQL databases such as MySQL, PostgreSQL, MongoDB, or Cassandra is highly desirable.
Experience with ITSM tools such as Jira, Redmine, ServiceNow, etc.
Relevant certifications in Linux, virtualization, and cloud computing are a plus.
Knowledge of and experience working with GPU hardware and AI hardware accelerators is a plus.
Strong organizational skills with the ability to multitask and prioritize.
A proactive approach to problem-solving and decision-making.
What we look for
If you are a performance-driven, inquisitive mind with the agility to adapt to ambiguity, you will fit right in. You should be eager to explore opportunities to build meaningful collaborations with stakeholders and aspire to create unique customer-centric solutions. A bias for action and a passion to conquer new frontiers in the AI space are at the heart of the Core42 community.
What working at Core42 offers
Culture: An open, diverse and inclusive environment with a global vision that encourages personal growth and focuses on ground-breaking, industry-first innovations.
Career: Outstanding learning, development & growth opportunities via structured training programs and innovative, high-tech projects.
Work-Life: A hybrid work policy to strike the perfect balance between office and home.
Rewards: A competitive remuneration package with a host of perks including healthcare, education support, leave benefits and more.
If you can confidently demonstrate that you meet the criteria above, please contact us as soon as possible.