Overview:
The opportunity
A Platform Engineer is responsible for designing, building, and maintaining the infrastructure that supports high-performance computing tasks and AI workloads. They ensure the scalability, reliability, and efficiency of computing platforms, integrating hardware and software to optimize performance. Additionally, they collaborate with data scientists and developers to troubleshoot and enhance platform capabilities, enabling advanced computational tasks and innovations.
Core42 is the UAE's national-scale enabler for cloud and generative AI, combining G42 Group's expertise across multiple technology disciplines into a single platform for public sector and large enterprise transformations. Building on our capabilities as a sovereign cloud and HPC specialist, we bring generative AI, cybersecurity, professional and managed services expertise to enable national-scale program deployments across industries.
Responsibilities:
Objectives of this role:
• Develop and deploy scalable and efficient computing platforms to support AI and HPC workloads, ensuring they meet performance, reliability, and security requirements.
• Continuously optimize system performance by tuning hardware configurations, software parameters, and network settings to maximize throughput and minimize latency for AI and HPC applications.
• Integrate various tools and technologies to streamline workflows, automate repetitive tasks, and enhance overall system efficiency and manageability.
• Implement monitoring solutions to track system health and performance, promptly identifying and resolving issues to ensure minimal downtime and optimal functionality.
• Work closely with data scientists, researchers, and developers to understand their needs, provide technical support, and make adjustments to the platform to accommodate evolving requirements.
Key Responsibilities:
• Design, deploy, and maintain the underlying hardware and software infrastructure necessary for AI and HPC applications, ensuring it is scalable and robust.
• Monitor and optimize system performance by fine-tuning configurations, managing resources, and implementing best practices to achieve maximum efficiency.
• Develop and implement automation scripts and tools to streamline repetitive tasks, deployment processes, and system updates.
• Integrate various technologies, including cloud services, databases, and AI frameworks, to create cohesive and effective computing environments.
• Diagnose and resolve technical issues related to the platform, providing support to developers and data scientists to address performance bottlenecks and system failures.
• Ensure that the computing platform adheres to security standards and compliance requirements, implementing measures to protect data and infrastructure.
• Maintain detailed documentation of system configurations, processes, and procedures, and generate reports on system performance and resource utilization.
• Work closely with cross-functional teams, including data scientists, researchers, and software engineers, to understand their needs and provide solutions that support their objectives.
Qualifications:
Required skills and qualifications
• A bachelor's degree in Computer Science, Engineering (such as Electrical or Software Engineering), Information Technology, or any related field.
• 5 or more years of experience in platform engineering, systems administration, or a related field, with a focus on high-performance computing or large-scale infrastructure management.
• Hands-on experience with AI or HPC environments, including managing and optimizing computational resources on HPC clusters, cloud computing platforms, or AI frameworks.
• Demonstrated experience with relevant technologies such as Linux/Unix systems, cloud platforms (e.g., AWS, Azure), scripting languages, and performance tuning tools.
• Proven track record of working on projects involving the design, implementation, and optimization of complex computing platforms, ideally with examples of successfully managed AI or HPC workloads.
Preferred skills and qualifications
• Knowledge of security best practices and tools for protecting infrastructure and data, including experience with identity management and access controls.
• Strong analytical and troubleshooting skills to quickly identify and resolve technical issues that impact system performance or stability.
• Effective verbal and written communication skills for collaborating with cross-functional teams and documenting technical processes.
• Several years of experience in platform engineering, systems administration, or related
What we look for
If you are a performance-driven, inquisitive mind with the agility to adapt to ambiguity, you will fit right in. You should be eager to explore opportunities to build meaningful collaborations with stakeholders and aspire to create unique customer-centric solutions. A bias for action and a passion to conquer new frontiers in the AI space are at the heart of the Core42 community.
What working at Core42 offers
• Culture: An open, diverse and inclusive environment with a global vision that encourages personal growth and focuses on ground-breaking, industry-first innovations.
• Career: Outstanding learning, development & growth opportunities via structured training programs and innovative, high-tech projects.
• Work-Life: A hybrid work policy to strike the perfect balance between office and home.
• Rewards: A competitive remuneration package with a host of perks including healthcare, education support, leave benefits and more.
If you can confidently demonstrate that you meet the criteria above, please contact us as soon as possible.