
Mistral Cloud - Site Reliability Engineer
Mistral
Mistral AI is seeking highly experienced Site Reliability Engineers to shape the reliability, scalability, and performance of its cloud platform and customer-facing applications. The role involves designing and maintaining scalable infrastructure, implementing monitoring and incident response systems, and collaborating with software engineers and product teams. It requires 5+ years of experience in a DevOps/SRE role and a Master’s degree in Computer Science or a related field.
Key Responsibilities
- Design, build, and maintain scalable, highly available, and fault-tolerant infrastructure
- Operate systems and troubleshoot issues in production environments
- Implement and improve monitoring, alerting, and incident response systems
- Implement and maintain workflows and tools for customer-facing APIs and large training runs
- Participate occasionally in on-call rotations to respond to incidents
- Drive continuous improvement in infrastructure automation, deployment, and orchestration
- Collaborate with software engineers to develop and implement solutions for model-training experiments
- Help build a cloud platform offering an abstraction layer between science, engineering and infrastructure
- Design and develop new workflows and tooling to improve reliability, availability and performance
- Collaborate with the security team to ensure infrastructure adheres to security best practices
- Document processes and procedures to ensure consistency and knowledge sharing
- Contribute to open-source projects, research publications, blog articles and conferences