
Mistral Cloud - Site Reliability Engineer
Mistral
Amsterdam
2 days ago
Mistral AI seeks an experienced Site Reliability Engineer to ensure the reliability, scalability, and performance of its Cloud platform. The role involves operations, development, and collaboration with software engineers. Candidates need 5+ years of DevOps/SRE experience, strong infrastructure skills, and proficiency in scripting languages.
Hybrid
Full-time
Senior
Docker
Kubernetes
Salary
Not specified
Core Qualifications
Technical (Must-have)
Docker, Kubernetes, Terraform, CloudFormation, Python, Go, Bash, Prometheus, Grafana, ELK Stack, Datadog
Soft Skills
problem-solving, communication, self-motivated, team collaboration
Preferred Qualifications
Technical (Nice-to-have)
Slurm, Fluidstack, Coreweave, Vast
Key Responsibilities
- Design, build, and maintain scalable, highly available, and fault-tolerant infrastructure
- Operate systems and troubleshoot issues in production environments
- Implement and improve monitoring, alerting, and incident response systems
- Implement and maintain workflows and tools (CI/CD, containerization, orchestration, monitoring, logging and alerting systems)
- Participate in on-call rotations and perform root cause analysis
- Drive continuous improvement in infrastructure automation, deployment, and orchestration
- Collaborate with software engineers to develop solutions for safe and reproducible model-training experiments
- Help build a cloud platform offering an abstraction layer between science, engineering and infrastructure
- Design and develop new workflows and tooling to improve reliability, availability and performance
- Collaborate with the security team to ensure best practices and compliance
- Document processes and procedures
- Contribute to open-source projects, research publications, blog articles and conferences
Site Reliability Engineer, SRE, Cloud, Kubernetes, Docker, Terraform, Python, CI/CD, Observability, AI