The HPC cluster systems engineer is responsible for managing and supporting all HPC and Grid systems for the University data center and distributed locations.
Duties and Responsibilities:
• Solves HPC and Grid related problems on a daily basis.
• In support of change management within the data center, provides the CSC with information about the HPC systems.
• Verifies all HPC systems daily using the monitoring tools and proactively intervenes to solve problems.
• Analyzes solution components, understands systems integration challenges, and identifies technology gaps.
• Resolves or proposes solutions to these gaps to reach future performance targets and functionality requirements.
• Prototypes features, performs integration checkout of various software components, and collaborates with component developers and solutions architects.
• Develops and drives validation test content and evaluates system components.
• Engages with industry partners as required to identify and investigate best-known methods used in the HPC community, and applies those methods.
• Collaborates with architects and developers to define architectural requirements for high-end HPC clusters.
• Responsible for system integration and validation of UAEU HPC clusters.
• Responsible for monitoring all HPC and Grid services.
• Coordinates work with vendors for support.
• Tests and deploys HPC systems.
• Knowledge of IT Service Management frameworks.
• Maintains accurate and comprehensive documentation diagrams of the enterprise HPC system, backup infrastructure, communications flow, and routing.
• Other duties as assigned.
Minimum Qualification
• Bachelor's degree in Computer Engineering/Science required
• 3+ years of experience with software development in Linux
• 3+ years of experience with HPC clusters and systems integration
Preferred Qualification
• Knowledge of server hardware components, diagnostics, and replacement of defective items
• Good communication and report-writing skills
• Must be able to work under pressure in a fast-paced work environment
• Must be able to work flexible hours, including evenings, weekends, holidays, and overtime as required, and be available on call 24/7 in case of a major service outage
• Strong problem-solving, testing, and network troubleshooting skills
• Cluster solutions integration and administration
• Linux operating systems and OS components for HPC clusters
• Cluster provisioning, systems management, and resource management middleware
• Cluster interconnect fabrics and software stack
• HPC cluster storage solutions
• Parallel programming models for HPC clusters
Close Date
Open until filled. Kindly apply before the closing date.