Job Description
About Amigo
Amigo builds trust and safety infrastructure for AI in mission-critical environments.
We partner with organizations in healthcare and other regulated sectors to deploy AI systems that operate reliably when the stakes are highest. Our infrastructure enables verification, monitoring, and real-time oversight—ensuring AI serves people safely at scale.
We've raised $6.5M from General Catalyst and GSV Ventures. Our team combines expertise in distributed systems, quantitative research, clinical operations, and regulatory environments to build AI that organizations can trust.
About this role
As a Staff Software Engineer (Observability) at Amigo, you'll build the monitoring, logging, and debugging infrastructure that ensures our AI agents operate reliably and transparently. You'll design systems that provide visibility into our platform's behavior, enabling our team to maintain reliability and quickly diagnose issues that arise.
What you'll do
- Design and implement observability infrastructure across the entire platform
- Build real-time monitoring systems that detect anomalies before they impact patient care
- Create advanced debugging tools for complex distributed systems and AI model behavior
- Implement distributed tracing systems that track requests across services
- Design alerting systems that minimize false positives while catching all critical issues
- Build dashboards and analytics tools that provide insights into system performance and health
- Implement log aggregation and analysis systems for compliance and debugging
- Create performance profiling tools for identifying bottlenecks in AI inference pipelines
- Design systems for monitoring AI model drift and behavior changes over time
- Build chaos engineering tools to test system resilience and failure modes
What we're looking for
- 7+ years of experience building observability and monitoring systems
- Deep expertise with observability and distributed tracing tools
- Strong experience with distributed systems and service architectures
- Experience building monitoring for complex distributed systems and application performance
- Knowledge of statistical analysis and anomaly detection techniques
- Strong programming skills in multiple languages
- Experience with time series databases and analytics
- Understanding of SRE principles and practices
- Experience with performance profiling and optimization
- Strong debugging skills for complex distributed systems
Nice to have
- Experience in healthcare, finance, or other regulated industries
- Background with statistical monitoring and performance optimization
- Experience with compliance monitoring and audit logging
- Knowledge of healthcare data privacy and security requirements
Benefits
Health & Wellness
- Comprehensive health, dental, and vision insurance
- Mental health support and wellness coaching
- Flexible wellness stipend for fitness, therapy, or personal growth
- Daily catered lunch and dinner
Growth & Development
- Annual learning budget for courses, books, or conferences
- Conference attendance budget for professional development
- Development setup of your choice
- Academic collaboration opportunities
Compensation Range: $220K - $260K