Founding Engineer, ML Performance & Systems
Job Description
We’re an early-stage stealth startup building a new kind of platform for generative media. Our mission is to enable the future of real-time generative applications: the foundational tools and infrastructure that make entirely new categories of generative experiences possible.
We’re a small, focused team of ex-YC and unicorn founders and senior engineers with deep experience across 3D, generative video, developer platforms, and creative tools. We’re backed by top-tier investors and angels, and we’re building a new technical foundation purpose-built for the next era of generative media.
We’re operating at the edge of what’s technically possible: high-performance inference and real-time orchestration of multimodal models. As one of our founding engineers, you’ll play a key role in architecting the core platform, shaping system design decisions, and owning critical infrastructure from day one.
If you’re excited about architecting and building high-performance infrastructure that empowers the next generation of developers and unlocks entirely new product categories, we’d love to talk.
About the Role
We’re looking for a Founding Engineer, ML Performance & Systems with deep expertise in high-performance ML infrastructure. This is a highly technical, high-impact role focused on squeezing every drop of performance from real-time generative media models.
You’ll work across the model-serving stack: designing novel architectures, optimizing inference performance, and shaping Reactor’s competitive edge in ultra-low-latency, high-throughput environments.
What You’ll Do
- Drive our frontier position on real-time model performance for diffusion models
- Design and implement a high-performance in-house inference engine
- Focus on maximizing throughput and minimizing latency and resource usage
- Develop performance monitoring and profiling tools to identify bottlenecks and optimization opportunities
Requirements
About You
- Strong foundation in systems programming, with a track record of identifying and resolving bottlenecks
- Deep expertise in the ML infrastructure stack:
- PyTorch, TensorRT, TransformerEngine, Nsight
- Model compilation, quantization, and advanced serving architectures
- Working knowledge of GPU hardware (NVIDIA) and the ability to dive deep into the stack as needed (e.g., writing custom GEMM kernels with CUTLASS)
- Proficient in CUDA, or willing to learn it given comparable experience in low-level accelerator programming
- Excited by the frontier of multi-dimensional model parallelism (e.g., combining tensor, context, and sequence parallelism)
- Familiarity with the internals of cutting-edge techniques such as Ring Attention, FlashAttention-3, and fused MLP implementations
Minimum Qualifications
- Expertise in systems programming (C++, CUDA)
- Experience optimizing ML inference on GPUs
- Proficient with PyTorch and tools like TensorRT
- Deep understanding of NVIDIA GPU architecture
- Familiar with model serving, compilation, and quantization
Benefits
- Competitive SF salary and foundational team equity