
Founding Engineer, ML Performance & Systems

Isotron AI
Location: San Francisco, CA, USA
Published: 6/14/2022
Technology
Full Time

Job Description

About Us

We’re an early-stage stealth startup building a new kind of platform for generative media. Our mission is to power the future of real-time generative applications: we’re building the foundational tools and infrastructure that make entirely new categories of generative experiences possible.

We’re a small, focused team of ex-YC and unicorn founders and senior engineers with deep experience across 3D, generative video, developer platforms, and creative tools. We’re backed by top-tier investors and angels, and we’re building a new technical foundation purpose-built for the next era of generative media.

We’re operating at the edge of what’s technically possible: high-performance inference and real-time orchestration of multimodal models. As one of our founding engineers, you’ll play a key role in architecting the core platform, shaping system design decisions, and owning critical infrastructure from day one.

If you’re excited about architecting and building high-performance infrastructure that empowers the next generation of developers and unlocks entirely new product categories, we’d love to talk.

About the Role

We’re looking for a Founding Engineer, ML Performance & Systems with deep expertise in high-performance ML infrastructure. This is a highly technical, high-impact role focused on squeezing every drop of performance from real-time generative media models.

You’ll work across the model-serving stack, designing novel architectures, optimizing inference performance, and shaping Reactor’s competitive edge in ultra-low-latency, high-throughput environments.

What You’ll Do

  • Drive our frontier position on real-time model performance for diffusion models
  • Design and implement a high-performance in-house inference engine
  • Focus on maximizing throughput and minimizing latency and resource usage
  • Develop performance monitoring and profiling tools to identify bottlenecks and optimization opportunities

Requirements

About You

  • Strong foundation in systems programming, with a track record of identifying and resolving bottlenecks
  • Deep expertise in the ML infrastructure stack:
    • PyTorch, TensorRT, TransformerEngine, Nsight
    • Model compilation, quantization, and advanced serving architectures
  • Working knowledge of GPU hardware (NVIDIA) and the ability to dive deep into the stack as needed (e.g., writing custom GEMM kernels with CUTLASS)
  • Proficient in CUDA, or willing to learn it given comparable experience in low-level accelerator programming
  • Excited by the frontier of multi-dimensional model parallelism (e.g., combining tensor, context, and sequence parallelism)
  • Familiarity with internals of cutting-edge techniques such as Ring Attention, FA3, and FusedMLP implementations

Minimum Qualifications

  • Expertise in systems programming (C++, CUDA)
  • Experience optimizing ML inference on GPUs
  • Proficient with PyTorch and tools like TensorRT
  • Deep understanding of NVIDIA GPU architecture
  • Familiar with model serving, compilation, and quantization

Benefits

  • Competitive SF salary and foundational team equity