Senior Site Reliability Engineer, Wikimedia Enterprise

Jobgether

Australia

1 week ago

Wikimedia Enterprise is hiring a Senior Site Reliability Engineer in Australia. Responsibilities include ensuring the reliability of large-scale distributed systems, improving observability, and driving automation. The role requires 5+ years of SRE experience with infrastructure-as-code (IaC) tools and cloud platforms.

Salary

Not specified

Work Location

Australia, AU

Work Model

Remote

Experience Required

5 years

Employment Type

Full-time

Experience Level

Senior

Core Qualifications

Technical (Must-have)

Terraform, Ansible, Python, Go, AWS, GCP, Azure, GitLab, ArgoCD, Prometheus, OpenTelemetry

Soft Skills

Communication, collaboration, mentoring, ownership mindset, adaptability, continuous improvement

Tools (Must-have)

GitLab, ArgoCD, Terraform, Ansible, Prometheus, OpenTelemetry

Key Responsibilities

• Define, track, and continuously improve SLOs, SLIs, and error budgets for critical services
• Design and enhance observability systems, including metrics, logging, and distributed tracing
• Participate in incident response, on-call rotations, and post-incident reviews
• Build and maintain CI/CD and GitOps pipelines that enable secure, automated, and reliable deployments
• Implement infrastructure-as-code and automation-first practices
• Design and operate scalable cloud infrastructure across production environments
• Drive capacity planning, performance optimization, and resilience testing
• Improve developer experience by enabling self-service infrastructure
• Collaborate with security, software, and release engineering teams
• Optimize infrastructure cost and efficiency using FinOps principles
