Senior Site Reliability Engineer

Location: Worldwide

Who we are

Gretel's mission is to automate privacy engineering. We enable developers, researchers, and scientists to quickly create safe versions of data that can be used in pre-production environments and machine learning workloads, and shared across teams and organizations.

As a Site Reliability Engineer (SRE) at Gretel you will ensure the safety, security, and reliability of our cloud infrastructure. This includes our compute infrastructure, container orchestration platform, deployment pipelines, and observability stack.

What you will do

Build and maintain Gretel's observability stack. Measure and monitor Gretel's availability, latency, and overall system health
Scale systems sustainably with automation and continuously improve and evolve systems
Manage and lead incident response, recovery, and blameless postmortems
Partner with software engineers to troubleshoot production issues
Build tools and frameworks that help Gretel engineers be more productive
Ship complex ML/AI models in partnership with Gretel's applied science and engineering teams

Minimum Qualifications

Experience with at least one cloud platform (we use AWS heavily)
Experience with Docker and Kubernetes
Ability to write software and tools in Python or Go
Experience with monitoring, alerting, and operations
Experience operating highly available distributed systems in the cloud
Experience identifying, diagnosing, and responding to operational outages

Preferred Qualifications

Experience with infrastructure as code (Terraform, CloudFormation, etc.)
Experience with build systems such as Bazel
Experience shipping applications with complex dependencies (PyTorch, TensorFlow)
Software engineering skills beyond script writing (TDD, design patterns, etc.)
Experience with DevOps or CI/CD pipelines

