Senior Site Reliability Engineer

Computer Task Group, Inc
Location: Jersey City, NJ, USA
Published: 6/14/2022
Technology
Full Time

Job Description

Overview

CTG is seeking to fill a Senior Site Reliability Engineer opening for our client in Jersey City, NJ.

Location: Jersey City, NJ (100% Onsite)
Duration: 6 months

Duties:

  • Administer and optimize Kubernetes clusters (Amazon EKS and Red Hat OpenShift), including upgrades, scaling, and security controls.

  • Manage middleware platforms such as Apache Kafka, Redis Enterprise clusters, and the 3scale API Gateway.

  • Automate manual operations using Infrastructure-as-Code (IaC) and configuration management tools (Helm, ArgoCD, Terraform, Ansible, etc.).

  • Design and implement monitoring dashboards and alerts with Prometheus, Grafana, ELK, and Splunk.

  • Instrument distributed applications (Java, Node.js, Python) to meet SLOs with tracing, metrics, and logging.

  • Define SLIs/SLOs, manage error budgets, and lead incident response and root cause analysis.

  • Forecast capacity, monitor utilization, and tune performance of applications and clusters.

  • Implement container security and policy governance with tools such as OPA/Gatekeeper, Kyverno, Trivy, Clair, and Snyk.

  • Configure Kubernetes network segmentation (NetworkPolicy, Calico) to secure traffic and enforce reliability.

Skills:

  • Strong hands-on expertise with Kubernetes (EKS and/or OpenShift), Helm charts, and Operators.

  • Middleware expertise with Kafka and Redis.

  • IaC and automation proficiency with Terraform, Ansible, ArgoCD, Helm, or similar tools.

  • Advanced observability experience with Prometheus, Grafana, ELK/Splunk.

  • Programming and scripting skills in Python, Shell, or Groovy.

  • Proficiency in instrumenting distributed applications for observability.

  • Ability to enforce and maintain high reliability standards using SLO-driven frameworks.

  • Strong debugging, analytical, communication, and collaboration skills.

Experience:

  • 12+ years overall industry experience.

  • 6+ years in SRE, DevOps, Platform, or Production Engineering roles.

  • Proven track record of managing large-scale production systems with high availability.

  • Certification in EKS/OpenShift administration (CKA, AWS Certified Kubernetes Administrator, Red Hat Certified OpenShift Administrator, or equivalent) preferred.

  • Nice-to-have: experience with service mesh (Istio, Linkerd), chaos engineering (Chaos Monkey, LitmusChaos), security and compliance in regulated environments, and API Gateway platforms (e.g., Red Hat 3scale).

Education:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field preferred. Equivalent work experience may be considered.

Excellent verbal and written English communication skills and the ability to interact professionally with a diverse group are required.

CTG does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services for this role.

To Apply:
To be considered, please apply directly to this requisition using the link provided. For additional information, please contact Rebecca Olan at Rebecca.Olan@ctg.com. Kindly forward this to any other interested parties. Thank you!

The expected base salary for this position ranges from $140,000 to $160,000. Salary offers are based on a wide range of factors including relevant skills, training, experience, education, market factors, and where applicable, licensure or certifications obtained. In addition to salary, a competitive benefit package is also offered.
