Site Reliability Engineer

Amiri Recruiting
Location: Mountain View, CA, USA
Published: 6/14/2022
Technology
Full Time

Job Description

Site Reliability Engineer (Onsite, Bay Area, CA)

Skills

Relevant Skills and Experience

What You’ll Do (Day-to-Day)

  • Own and manage our cloud infrastructure (GCP or AWS, plus on-prem).

  • Build, maintain, and optimize Kubernetes clusters (including GPU-backed clusters).

  • Implement and improve CI/CD pipelines (GitHub Actions).

  • Write and maintain Infrastructure as Code (Terraform).

  • Monitor system health and performance using Grafana and other observability tools.

  • Ensure high availability, reliability, and uptime across platforms.

  • Handle infrastructure maintenance, upgrades, and scaling.

  • Administer and improve our platform architecture and apply general security best practices across the stack.

Note: This is an internal-facing role with no customer interaction.

Must-Have:

  • 4+ years in SRE, DevOps, or Infrastructure Engineering

  • Solid experience with GCP or AWS (hybrid/on-prem a plus)

  • Experience with Kubernetes cluster management (GPU experience a bonus)

  • Hands-on experience with Terraform and CI/CD (GitHub Actions)

  • Experience with monitoring/observability (Grafana, etc.)

  • Strong understanding of high availability and infrastructure reliability

  • Familiarity with platform/cluster architecture and administration

  • Security mindset and ability to apply best practices

Nice-to-Have:

  • Startup experience (you enjoy building, not just maintaining)

  • Experience with scalable GPU infrastructure for AI/ML
