Senior Machine Learning Engineer

h2o.ai

Austin, TX, USA

Published: 6/14/2022

Engineering

Full Time

Job Description

Founded in 2012, H2O.ai is on a mission to democratize AI. As the world’s leading agentic AI company, H2O.ai converges Generative and Predictive AI to help enterprises and public sector agencies develop purpose-built GenAI applications on their private data. Its open-source technology is trusted by over 20,000 organizations worldwide, including more than half of the Fortune 500. H2O.ai powers AI transformation for companies like AT&T, Commonwealth Bank of Australia, Singtel, Chipotle, Workday, Progressive Insurance, and NIH.

H2O.ai partners include Dell Technologies, Deloitte, Ernst & Young (EY), NVIDIA, Snowflake, AWS, Google Cloud Platform (GCP) and VAST. H2O.ai’s AI for Good program supports nonprofit groups, foundations, and communities in advancing education, healthcare, and environmental conservation. With a vibrant community of 2 million data scientists worldwide, H2O.ai aims to co-create valuable AI applications for all users.

H2O.ai has raised $256 million from investors, including Commonwealth Bank, NVIDIA, Goldman Sachs, Wells Fargo, Capital One, Nexus Ventures and New York Life.

About This Opportunity

We are seeking a Senior Machine Learning Engineer with exceptional technical expertise in deploying, scaling, and maintaining production ML systems. This role requires a strong combination of software engineering skills, ML/AI knowledge, and system architecture experience to build robust, scalable machine learning infrastructure. The ideal candidate will have experience with end-to-end ML pipelines, modern MLOps practices, and the ability to bridge research and production environments.

What You Will Do

ML System Architecture & Development

Design and implement end-to-end machine learning pipelines from research to production
Build scalable ML infrastructure supporting multiple models and high-throughput inference
Develop automated systems for model training, validation, deployment, and monitoring
Create efficient data processing pipelines with multiprocessing optimization and performance tuning
Architect feature stores, model registries, and ML metadata management systems

Production ML Operations

Deploy and maintain production ML models with focus on reliability, scalability, and performance
Implement MLOps best practices including CI/CD for ML, automated testing, and model versioning
Monitor model performance, data drift, and system health in production environments
Optimize model inference for latency and throughput requirements
Manage model lifecycle including retraining, rollback, and A/B testing strategies

Advanced ML Implementation

Implement cutting-edge ML techniques including generative AI, diffusion models, and large language models
Develop and optimize deep learning models using modern frameworks (TensorFlow, PyTorch)
Build systems for handling multimodal data (text, images, video, time-series)
Create solutions for challenging ML problems including out-of-distribution detection and feature alignment
Implement efficient algorithms achieving significant performance improvements (orders of magnitude speedups)

Technical Leadership & Collaboration

Lead technical design reviews and architecture decisions for ML systems
Mentor junior engineers and data scientists on ML engineering best practices
Collaborate with research teams to transition experimental models to production
Work with infrastructure teams to ensure optimal resource utilization and scaling
Provide technical guidance on complex ML system design and implementation

What We Are Looking For

Education & Experience

Master's degree in Computer Science, Engineering, Physics, Mathematics, or related technical field
7+ years of experience in machine learning engineering, software development, or related roles
5+ years of experience building and deploying production ML systems
Proven track record of leading technical projects and mentoring team members

Core Programming & ML

Expert-level proficiency in Python with strong knowledge of Bash, SQL, C/C++
Deep experience with ML frameworks: TensorFlow, PyTorch, Scikit-learn
Extensive experience with data processing libraries: NumPy, Pandas, Matplotlib
Hands-on experience with Hugging Face ecosystem and modern NLP/LLM tools

ML Ops & Infrastructure

Strong experience with containerization and orchestration: Docker, Kubernetes
Knowledge of cloud platforms: AWS, GCP, Azure and their ML services
Experience with ML workflow orchestration tools: Airflow, Kubeflow, MLflow
Proficiency in Infrastructure as Code: Terraform, CloudFormation
Experience with monitoring and observability tools: Prometheus, Grafana, ELK stack

Advanced ML Technologies

Proven expertise in generative AI including diffusion models, GANs, VAEs, and normalizing flows
Experience with large language models (LLMs) and agentic AI systems
Knowledge of advanced architectures: CNNs, U-Nets, transformers, and attention mechanisms
Experience with model optimization techniques: quantization, pruning, distillation
Understanding of distributed training and inference systems

Software Engineering

Strong software development practices including version control, testing, and code review
Experience with microservices architecture and API development
Knowledge of database systems and data storage solutions
Understanding of distributed systems and concurrent programming
Experience with performance profiling and optimization

System Design & Architecture

Experience designing large-scale ML systems and data pipelines
Knowledge of real-time and batch processing architectures
Understanding of model serving patterns and inference optimization
Experience with auto-scaling and resource management in production environments
Knowledge of security best practices for ML systems

Problem-Solving & Innovation

Track record of solving complex technical problems with innovative engineering solutions
Experience working with real-world, noisy datasets across multiple domains
Ability to achieve significant performance improvements and system optimizations
Strong debugging and troubleshooting skills for production ML systems
Experience with A/B testing and experimentation frameworks

How to Stand Out From the Crowd

PhD in Computer Science, Engineering, Physics, Mathematics, or related quantitative field
Deep background in computational sciences (astrophysics, physics, computational biology)
Experience in technology companies with large-scale ML infrastructure
Knowledge of financial services, healthcare, or other regulated industries
Background in research environments with transition to production systems
Experience building and deploying LLM applications and chatbot systems
Background in computer vision and image processing applications
Knowledge of time-series analysis and forecasting systems
Experience with automated content generation and summarization systems
Understanding of federated learning and privacy-preserving ML techniques

Technical Specializations

Experience with edge deployment and model optimization for mobile/IoT devices
Knowledge of multi-cloud and hybrid cloud architectures
Background in streaming data processing and real-time ML systems
Experience with graph neural networks and knowledge graphs
Understanding of reinforcement learning and multi-agent systems

Leadership & Communication

Experience mentoring engineering teams and establishing technical standards
Strong project management skills with experience in Agile/Scrum methodologies
Ability to communicate complex technical concepts to diverse audiences
Experience with technical writing and documentation
Track record of driving technical innovation and process improvements

Success Metrics

System uptime and reliability of production ML services
Model performance and accuracy in production environments
Deployment velocity and time-to-production for new models
Resource utilization efficiency and cost optimization
Team productivity and knowledge sharing initiatives
Technical innovation and patent applications

Technical Environment

Access to cutting-edge ML infrastructure and computing resources
Opportunity to work with the latest ML frameworks and tools
Collaborative environment with research and product teams
Support for experimentation and technical innovation
Flexible architecture allowing for rapid prototyping and iteration

Why H2O.ai?

Market leader in total rewards
Remote-friendly culture
Flexible working environment
Be part of a world-class team
Career growth

H2O.ai is committed to creating a diverse and inclusive culture. All qualified applicants will receive consideration for employment without regard to their race, ethnicity, religion, gender, sexual orientation, age, disability status or any other legally protected basis.

H2O.ai is an innovative AI cloud platform company, leading the mission to democratize AI for everyone. Thousands of organizations from all over the world have used our cutting-edge technology across a variety of industries. We’ve made it easy for people at all levels to generate breakthrough solutions to complex business problems and advance the discovery of new ideas and revenue streams. We push the boundaries of what is possible with artificial intelligence.

H2O.ai employs the world’s top Kaggle Grandmasters, the community of best-in-the-world machine learning practitioners and data scientists. A strong AI for Good ethos and responsible AI drive the company’s purpose.

Please visit www.H2O.ai to learn more.
