Senior Systems Architect HPC

Abu Dhabi, AZ, AE, United Arab Emirates

Job Description

Overview:
The opportunity

The Senior Systems Architect specialises in High Performance Computing (HPC) infrastructure and brings proven experience in planning and executing complex projects. This pivotal role focuses on developing secure, cost-efficient, and scalable platform architecture blueprints for both public and private cloud environments. The role requires a deep understanding of architecture principles and expertise in the features and integration capabilities of HPC platforms and solutions.

Core42 is the UAE's national-scale enabler for cloud and generative AI, combining G42 Group's expertise across multiple technology disciplines into a single platform for public sector and large enterprise transformations. Building on our capabilities as a sovereign cloud and HPC specialist, we bring generative AI, cybersecurity, professional and managed services expertise to enable national-scale program deployments across industries.
Responsibilities:
• Collaborate with stakeholders to understand business requirements and translate them into technical solutions.
• Communicate architectural decisions and strategies to both technical and non-technical audiences. Prepare and deliver presentations on the proposed solutions.
• Prepare, review, and maintain high-level and low-level design documents, scope of work, RFIs, RFPs and RFQs.
• Ensure alignment of solutions with organizational goals and industry best practices.
• Create architectural blueprints and technical documentation for proposed solutions.
• Provide requirements for equipment specifications, estimate project labor efforts, and liaise with vendors on technical issues.
• Lead the deployment and configuration of HPC clusters, ensuring scalability, reliability, and performance according to project documentation and design specifications.
• Oversee the integration of HPC with existing systems and infrastructure.
• Ensure the solutions and environments adhere to security best practices and organizational policies.
• Stay up to date with the latest trends and advancements in HPC technologies. Engage with vendors and the community to stay informed about new features, updates, and best practices.
• Identify opportunities for process improvements and implement enhancements to the architecture.
• Evaluate and recommend new tools and technologies to enhance the HPC ecosystem.
• Engage in pilot testing and commissioning activities, and design and conduct various types of tests: functional, load, and others.
• Maintain comprehensive documentation for the new and live HPC environments.
• Develop and deliver training sessions to engineering teams on HPC best practices and usage.


Qualifications:
To qualify for the role you must have
• Bachelor's or Master's degree in Computer Science, Engineering, Software Engineering or a related degree in a technology discipline.
• 7+ years of experience and deep expertise in designing, implementing, and managing private cloud stacks and 5+ years of experience in designing large-scale HPC environments.
• Proven track record of successfully completing large-scale infrastructure projects with a focus on HPC.
• Advanced knowledge and expertise in configuring, optimizing, and maintaining complex HPC environments, including hardware, software, and storage systems.
• Proficiency in parallel computing principles, distributed computing, and cluster management.
• Comprehensive knowledge of and hands-on experience in Linux environments.
• Experience with job schedulers, resource managers, and workflow orchestration tools commonly used in HPC environments (Slurm, LSF, PBS, Kubernetes).
• Advanced knowledge of data center network design and related technologies (OSI model, TCP/IP stack, routing, VLAN/VXLAN, etc.).
• Competence in network design and configuration of switches/routers, including InfiniBand and RoCE.
• Experience with large-scale data storage solutions, particularly Ceph, NFS, and Lustre.
• Proficiency in one or more parallel libraries/languages such as MPI, OpenMP, oneAPI and CUDA.
• Competence in configuration management and infrastructure-as-code tools such as Ansible, Puppet, and Terraform, and their integration with Git.
• Excellent problem-solving skills and the ability to troubleshoot complex HPC issues effectively.
• In-depth knowledge of performance tuning and optimization techniques for HPC systems.
• Solid understanding of cloud computing principles (IaaS, PaaS, SaaS).
• Experience with Kubernetes and OpenShift, including designing, deploying, and managing Kubernetes and OpenShift clusters.
• Knowledge of AI/ML platforms (e.g. OpenShift AI, Kubeflow, MLflow) is highly desirable.
• Familiarity with AI/ML environments and the specific requirements for deploying AI/ML workloads on Kubernetes and OpenShift is highly desirable.
• Experience with monitoring and observability tools (e.g. Prometheus, Grafana, Nagios, Zabbix, Ganglia, ELK).
• Understanding of both SQL and NoSQL database management and optimization.
• Knowledge of and experience in using architectural frameworks and methodologies such as TOGAF and Zachman.
• Familiarity with Agile methodologies (Scrum or Kanban), and an understanding of DevOps principles.
• Strong attention to detail.

What we look for


If you are a performance-driven, inquisitive mind with the agility to adapt to ambiguity, you will fit right in. You should be eager to explore opportunities to build meaningful collaborations with stakeholders and aspire to create unique customer-centric solutions. Bias for action and a passion to conquer new frontiers in the AI space are at the heart of the Core42 community.

What working at Core42 offers


• Culture: An open, diverse and inclusive environment with a global vision that encourages personal growth and focuses on ground-breaking, industry-first innovations.

• Career: Outstanding learning, development & growth opportunities via structured training programs and innovative, high-tech projects.

• Work-Life: A hybrid work policy to strike the perfect balance between office and home.

• Rewards: A competitive remuneration package with a host of perks including healthcare, education support, leave benefits and more.

If you can confidently demonstrate that you meet the criteria above, please contact us as soon as possible.

Job Detail

  • Job Id
    JD1708949
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Contract
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Abu Dhabi, AZ, AE, United Arab Emirates
  • Education
    Not mentioned