Job Description
We need a senior DevOps and Observability Engineer, on contract. The role covers
infrastructure automation, CI/CD pipelines, cloud operations, and observability. You own both sides: build-and-deploy and the observability stack.
On-call rotation is part of the job, shared across the team.
Background check required.
Required Skills
• Infrastructure as Code and configuration management: Terraform for infrastructure,
Ansible for application deployment and config
• Scripting: Bash, Python, PowerShell
• CI/CD pipelines: GitHub Actions, including integration of SAST, DAST, secret
scanning, and dependency auditing without turning every deploy into a compliance
obstacle course
• Cloud platforms: AWS and/or Azure
• Cloud cost management: Infracost with pre-commit hooks, AWS Cost Explorer, AWS
Budgets, Azure Cost Management + Billing, Azure Budgets
• Identity and access management: IAM across cloud platforms; least privilege in
practice, not just on paper
• Networking fundamentals: VNets, VPCs, subnets, routing, and enough
troubleshooting instinct to isolate a network problem without immediately filing a
ticket
• Log management: Splunk and/or ELK Stack (Elasticsearch, Logstash, Kibana)
• Observability and alerting: LogicMonitor
• Containerization: Docker, including writing efficient Dockerfiles and keeping images
lean; you know why bloated images are a problem and you don’t copy the internet’s
bad habits
• Version control: Git/GitHub
What You Bring
• 5 to 8 years in DevOps or platform engineering
• You build observability pipelines and alerting frameworks from scratch
• Cross-platform experience (Linux and Windows); Linux-first thinking is a plus
• You build tools, not workarounds
• You don’t need hand-holding to ship
• You’ve been on-call long enough to know the difference between an alert that
matters and one that wakes everyone up for nothing. When an incident hits, you
lead the call. Your post-mortems result in actual changes, not a doc nobody reads.
• You can connect SLO/SLA metrics to outcomes the business actually cares about
Nice to Have
• Kubernetes cluster administration
• Prometheus and/or Grafana (dashboard design and metrics pipelines)
• LogicMonitor Synthetics or similar uptime and synthetic