Staff Site Reliability Engineer
BrightStar Care
Bannockburn, IL 60015, USA
6/14/2022
Technology
Full Time
Job Description
The Staff Site Reliability Engineer pioneers solutions to guarantee the reliability, performance, and integrity of our digital health solutions. As a key member of our engineering leadership team, this position will play a pivotal role in shaping our technology strategy, architecture, standards, and priorities while providing technical leadership across multiple engineering teams.
Responsibilities
- Cross-functional Leadership: Own the end-to-end operational integrity of the platform, understanding and contributing to the bigger picture of the organization. Collaborate with cross-functional development and platform teams, providing expert-level guidance to deploy and maintain critical applications. Provide governance of our platform-as-a-service environment.
- Observability and Reliability: Focus primarily on maintaining the reliability and scalability of production systems, employing techniques to manage service quality. Iteratively architect and implement cutting-edge solutions for application resiliency and fault tolerance.
- Automation and Continuous Improvement: Provide forward-thinking technology direction and innovative solutions with a strong emphasis on automation, eliminating manual operations, and enhancing operational excellence.
- Cloud Infrastructure Ownership: Architect, build, and manage highly available, scalable, and fault-tolerant cloud infrastructure on Microsoft Azure. Establish and enforce reliability standards (SLIs, SLOs, error budgets) for Azure-based platforms and shared services.
- Mentorship & Growth: Serve as a mentor and thought leader, coaching engineers across teams while fostering a culture of technical excellence, innovation, and continuous improvement.
- Engineering Practices: Integrate SRE principles directly into the software development lifecycle, guiding teams toward a culture of reliability. Advocate for the adoption of automated testing and observability practices to ensure high-quality and efficient delivery.
Required Skills
EDUCATION
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
REQUIRED SKILLS
- 10+ years of experience in Site Reliability Engineering, DevOps, or infrastructure engineering roles, with at least 2 years in a Principal or Staff Engineer position.
- Deep hands-on experience with .NET and core Azure Cloud services (e.g., App Services, Azure Functions, AKS).
- Strong hands-on experience building reliable, reusable infrastructure using Infrastructure as Code (Terraform, Bicep).
- Strong understanding of cloud cost management and optimization strategies.
- Solid understanding of SRE best practices, design patterns, and system integration.
- Strong troubleshooting skills across complex cloud infrastructure and production environments.
- Excellent communication and leadership skills, especially when dealing with complexity or ambiguity within platform and cross-functional environments.
- Proficiency with at least one programming or scripting language (e.g., Python, Go, PowerShell, Bash).
- Ability to influence and work in a collaborative cross-team environment.
- Proactive and ownership-oriented mindset.
PREFERRED SKILLS
- Experience in Healthcare technology, including clinical provider environments and patient engagement platforms.
- Experience with observability tools and performance tuning in production environments.
- Experience with backup, disaster recovery, and business continuity in cloud environments.
WORKING CONDITIONS
- Hybrid preferred in the greater Chicago area. Travel to the Bannockburn, IL office on a monthly basis.
- Work environment – Fast-paced, collaborative, and dynamic, with a focus on teamwork and meeting tight deadlines.
- Hours – 8am to 5pm Central Time; after-hours work as needed, including emergency on-call for security incidents.