Data Engineer

Robert Half
Location: Washington, DC, USA
Published: 6/14/2022
Technology
Full Time

Job Description

We are seeking a talented Data Engineer to join a cutting-edge team supporting machine learning (ML) research efforts. This role focuses on building and optimizing data pipelines and infrastructure to enable more efficient ML model development and deployment. As a key collaborator with ML and NLP teams, you’ll work on infrastructure solutions for faster model building, optimization, and production deployment. If you're a hands-on engineer with strong cloud infrastructure experience, particularly with AWS, and a passion for enabling ML research, we want to hear from you!


Key Responsibilities

  • Design, build, and maintain data pipelines and ML infrastructure to support AI/ML research teams.
  • Collaborate with ML and NLP scientists to address content retrieval needs and build search-related solutions.
  • Create scalable solutions using technologies like Python, Airflow, Databricks, Kubernetes, and AWS.
  • Optimize Databricks notebooks using PySpark, structure ML pipelines, and assist with model production workflows.
  • Develop and deploy REST APIs for model serving and infrastructure communication.
  • Maintain and support Kubernetes clusters for model deployment in production environments.
  • Focus on content retrieval and database query solutions, contributing backend querying expertise (Java, Kotlin, Go).

Schedule: 4 days onsite, 1 day remote

Key Technologies

  • Required:
      • Python (microservices and data pipelines).
      • Kubernetes (container orchestration and deployment).
      • Airflow (designing and maintaining data pipelines).
      • AWS (cloud infrastructure expertise).
  • Preferred (not required):
      • Databricks (workbooks and ML pipeline development).
      • PySpark (notebook optimization).
      • PyTorch (for process improvement/distillation).
      • Content retrieval/database querying (Java, Kotlin, Go).


What We’re Looking For

  • Strong background in AWS infrastructure with the ability to hit the ground running.
  • Skills in Kubernetes, especially for deploying models in containerized environments.
  • Proficiency in Airflow for robust and scalable data pipeline development.
  • Demonstrated expertise in Python, especially for microservices and pipelines.
  • Ability to collaborate with ML teams without direct modeling responsibilities (this is ML-adjacent, focused on infrastructure).
  • Nice to Have: Experience with content retrieval, backend querying, and ML pipelines.


Disqualifiers

  • Candidates heavily focused on machine learning modeling roles (this position is centered on infrastructure).


Why Join Us?

  • Work closely with top-tier AI/ML research teams on cutting-edge initiatives.
  • Develop tools and infrastructure that drive innovation in AI and NLP.
  • Competitive compensation with benefits (health, vision, dental).