
Founding Engineer, ML Performance & Systems

Isotron AI
Location: San Francisco, CA, USA
Published: 6/14/2022
Technology
Full Time

Job Description

About Us

We’re an early-stage stealth startup building a new kind of platform for generative media. Our mission is to power the future of real-time generative applications: we’re building the foundational tools and infrastructure that make entirely new categories of generative experiences possible.

We’re a small, focused team of ex-YC and unicorn founders and senior engineers with deep experience across 3D, generative video, developer platforms, and creative tools. We’re backed by top-tier investors and angels, and we’re building a new technical foundation purpose-built for the next era of generative media.

We’re operating at the edge of what’s technically possible: high-performance inference and real-time orchestration of multimodal models. As one of our founding engineers, you’ll play a key role in architecting the core platform, shaping system design decisions, and owning critical infrastructure from day one.

If you’re excited about architecting and building high-performance infrastructure that empowers the next generation of developers and unlocks entirely new product categories, we’d love to talk.

About the Role

We’re looking for a Founding Engineer, ML Performance & Systems with deep expertise in high-performance ML infrastructure. This is a highly technical, high-impact role focused on squeezing every drop of performance from real-time generative media models.

You’ll work across the model-serving stack, designing novel architectures, optimizing inference performance, and shaping Reactor’s competitive edge in ultra-low-latency, high-throughput environments.

What You’ll Do

  • Drive our frontier position on real-time model performance for diffusion models
  • Design and implement a high-performance in-house inference engine
  • Focus on maximizing throughput and minimizing latency and resource usage
  • Develop performance monitoring and profiling tools to identify bottlenecks and optimization opportunities

Requirements

About You

  • Strong foundation in systems programming, with a track record of identifying and resolving bottlenecks
  • Deep expertise in the ML infrastructure stack:
    • PyTorch, TensorRT, TransformerEngine, Nsight
    • Model compilation, quantization, and advanced serving architectures
  • Working knowledge of GPU hardware (NVIDIA) and the ability to dive deep into the stack as needed (e.g., writing custom GEMM kernels with CUTLASS)
  • Proficient in CUDA, or willing to learn it given comparable experience in low-level accelerator programming
  • Excited by the frontier of multi-dimensional model parallelism (e.g., combining tensor, context, and sequence parallelism)
  • Familiarity with internals of cutting-edge techniques such as Ring Attention, FA3, and FusedMLP implementations

Minimum Qualifications

  • Expertise in systems programming (C++, CUDA)
  • Experience optimizing ML inference on GPUs
  • Proficient with PyTorch and tools like TensorRT
  • Deep understanding of NVIDIA GPU architecture
  • Familiar with model serving, compilation, and quantization

Benefits

  • Competitive SF salary and foundational team equity