Job Description
-
Monitor database and system performance using CloudWatch metrics, alarms, and logs; troubleshoot proactively.
-
Develop, deploy, and optimize AI/ML solutions using AWS AI services including SageMaker and Bedrock, supporting model training, inference, and integration into production systems.
-
Automate operational tasks using AWS Lambda, Systems Manager (SSM), and Infrastructure-as-Code tools such as CloudFormation or Terraform.
-
Design, build, and maintain scalable, fault-tolerant data processing and analytics workflows on AWS using services such as API Gateway, S3, EC2, RDS, Lambda, Glue, Athena, DynamoDB, EMR, Kinesis, and DataSync.
-
Design and integrate agentic AI systems, including LLM-based agents, multi-agent workflows, and autonomous orchestration pipelines using frameworks such as LangChain and LangGraph.
-
Implement ETL/ELT pipelines and data architectures that support machine learning, analytics, and intelligent agent-based applications.
-
Support CI/CD pipelines for AI models and data workflows using Jenkins and container-based platforms such as ECS, EKS, or Kubernetes.
-
Apply security best practices across AI and data platforms, including IAM least-privilege access, encryption, audit logging, and compliance controls.
-
Maintain technical documentation for AI architectures, data pipelines, infrastructure configurations, and operational runbooks.
Required Skills
-
Minimum 7 years of hands-on AWS experience: EC2, RDS, S3, CloudWatch, CloudTrail, IAM, KMS, AWS Backup, and Lambda.
-
Minimum 7 years of experience in Linux/Unix administration and automation scripting (Bash, Shell, Python).
-
Minimum 7 years of experience with Infrastructure as Code (IaC) and automation tools, including CloudFormation, Terraform, and Ansible, for provisioning and maintaining infrastructure.
-
Minimum 7 years of experience with AWS networking: VPC, subnets, NACLs, security groups, Route 53, and multi-AZ architectures.
-
Minimum 5 years of experience with CI/CD pipelines, Jenkins, and IaC for deploying AI agents and ML models into production, monitoring autonomous workflows, and supporting MLOps using Kubernetes, ECS, or EKS.
-
Minimum 4 years of experience architecting, building, and maintaining scalable data processing workflows using AWS managed services and Python (including PySpark); strong understanding of data architecture and ETL/ELT patterns.
-
Minimum 4 years of experience working with AWS AI/ML services such as SageMaker and Bedrock, and with vector databases (OpenSearch).
-
Strong understanding of machine learning algorithms, NLP concepts, and deep learning frameworks such as TensorFlow, PyTorch, or Hugging Face.