Senior Site Reliability Engineer

Computer Task Group, Inc.

Jersey City, NJ, USA

Published: 6/14/2022

Technology

Full Time

Job Description

Overview

CTG is seeking to fill a Senior Site Reliability Engineer opening for our client in Jersey City, NJ.

Location: Jersey City, NJ (100% Onsite)
Duration: 6 months

Duties:

Administer and optimize Kubernetes clusters (Amazon EKS and Red Hat OpenShift), including upgrades, scaling, and security controls.
Manage middleware platforms such as Apache Kafka, Redis Enterprise clusters, and the Red Hat 3scale API Gateway.
Automate manual operations using Infrastructure-as-Code (IaC) and configuration management tools (Helm, ArgoCD, Terraform, Ansible, etc.).
Design and implement monitoring dashboards and alerts with Prometheus, Grafana, ELK, and Splunk.
Instrument distributed applications (Java, Node.js, Python) with tracing, metrics, and logging to support SLO compliance.
Define SLIs/SLOs, manage error budgets, and lead incident response and root cause analysis.
Forecast capacity, monitor utilization, and tune performance of applications and clusters.
Implement container security and policy governance with tools such as OPA/Gatekeeper, Kyverno, Trivy, Clair, and Snyk.
Configure Kubernetes network segmentation (NetworkPolicy, Calico) to secure and isolate workload traffic.

Skills:

Strong hands-on expertise with Kubernetes (EKS and/or OpenShift), Helm charts, and Operators.
Middleware expertise with Kafka and Redis.
IaC and automation proficiency with Terraform, Ansible, ArgoCD, Helm, or similar tools.
Advanced observability experience with Prometheus, Grafana, ELK/Splunk.
Programming and scripting skills in Python, Shell, or Groovy.
Proficiency in instrumenting distributed applications for observability.
Ability to enforce and maintain high reliability standards using SLO-driven frameworks.
Strong debugging, analytical, communication, and collaboration skills.

Experience:

12+ years overall industry experience.
6+ years in SRE, DevOps, Platform, or Production Engineering roles.
Proven track record of managing large-scale production systems with high availability.
Certification in EKS/OpenShift administration (CKA, AWS Certified Kubernetes Administrator, Red Hat Certified OpenShift Administrator, or equivalent) preferred.
Nice-to-have: experience with service mesh (Istio, Linkerd), chaos engineering (Chaos Monkey, LitmusChaos), security and compliance in regulated environments, and API gateway platforms (e.g., Red Hat 3scale).

Education:

Bachelor’s degree in Computer Science, Information Technology, or a related field preferred. Equivalent work experience may be considered.

Excellent verbal and written English communication skills and the ability to interact professionally with a diverse group are required.

CTG does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services for this role.

To Apply:
To be considered, please apply directly to this requisition using the link provided. For additional information, please contact Rebecca Olan at Rebecca.Olan@ctg.com. Kindly forward this to any other interested parties. Thank you!

The expected base salary for this position ranges from $140,000 to $160,000. Salary offers are based on a wide range of factors including relevant skills, training, experience, education, market factors, and where applicable, licensure or certifications obtained. In addition to salary, a competitive benefit package is also offered.