Cloud Platform Engineer

search-tactics

New York, NY, USA

Published: 6/14/2022

Technology

Full Time

Job Description

Monitor database and system performance using CloudWatch metrics, alarms, and logs; troubleshoot issues proactively.

Develop, deploy, and optimize AI/ML solutions using AWS AI services including SageMaker and Bedrock, supporting model training, inference, and integration into production systems.

Automate operational tasks using AWS Lambda, Systems Manager (SSM), and Infrastructure-as-Code tools such as CloudFormation or Terraform.

Design, build, and maintain scalable, fault-tolerant data processing and analytics workflows on AWS using services such as API Gateway, S3, EC2, RDS, Lambda, Glue, Athena, DynamoDB, EMR, Kinesis, and DataSync.

Design and integrate agentic AI systems, including LLM-based agents, multi-agent workflows, and autonomous orchestration pipelines using frameworks such as LangChain and LangGraph.

Implement ETL/ELT pipelines and data architectures that support machine learning, analytics, and intelligent agent-based applications.

Support CI/CD pipelines for AI models and data workflows using Jenkins and container-based platforms such as ECS, EKS, or Kubernetes.

Apply security best practices across AI and data platforms, including IAM least-privilege access, encryption, audit logging, and compliance controls.

Maintain technical documentation for AI architectures, data pipelines, infrastructure configurations, and operational runbooks.

Required Skills

Minimum 7 years of hands-on AWS experience: EC2, RDS, S3, CloudWatch, CloudTrail, IAM, KMS, AWS Backup, and Lambda.

Minimum 7 years of experience in Linux/Unix administration and automation scripting (Bash, Shell, Python).

Minimum 7 years of experience with Infrastructure as Code (IaC) and automation tools, including CloudFormation, Terraform, and Ansible, for provisioning and maintaining infrastructure.

Minimum 7 years of experience with AWS networking: VPC, subnets, NACLs, security groups, Route 53, and multi-AZ architectures.

Minimum 5 years of experience with CI/CD pipelines, Jenkins, and IaC for deploying AI agents and ML models into production, monitoring autonomous workflows, and supporting MLOps using Kubernetes, ECS, or EKS.

Minimum 4 years of experience architecting, building, and maintaining scalable data processing workflows using AWS managed services and Python (including PySpark); strong understanding of data architecture and ETL/ELT patterns.

Minimum 4 years of experience working with AWS AI/ML services such as SageMaker and Bedrock, and with vector databases (OpenSearch).

Strong understanding of machine learning algorithms, NLP concepts, and deep learning frameworks such as TensorFlow, PyTorch, or Hugging Face.