Job Description
Senior Data Engineer (Spark Streaming & GCP)
Job Summary
We are seeking a Senior Data Engineer with strong expertise in Apache Spark, streaming technologies, and Google Cloud Platform (GCP) to design, build, and optimize scalable data pipelines that support analytics, reporting, and machine learning workloads. The ideal candidate has extensive experience developing both batch and real-time data processing solutions using Spark, Kafka, and cloud-native data services.
Required Experience
- 10+ years of overall IT experience
- 5+ years of recent hands-on GCP experience
- Strong experience building enterprise-scale data platforms and streaming architectures
Required Skills
- Strong programming skills in Python and SQL
- Hands-on expertise with Apache Spark (PySpark, Spark SQL, DataFrames, Spark Streaming)
- Experience with Kafka, Pub/Sub, Flink, or other streaming technologies
- Strong knowledge of BigQuery, GCS, Data Lakes, and Data Warehousing concepts
- Experience designing and developing ETL/ELT pipelines
- Data modeling and performance optimization experience
- Experience with Airflow for workflow orchestration
- Knowledge of Snowflake, Redshift, or other cloud data warehouses
- Experience implementing data quality, monitoring, and alerting solutions
Preferred Skills
- Scala or Java development experience
- Databricks experience
- Docker and Kubernetes
- CI/CD and DevOps practices
- Experience supporting ML/Data Science workloads
Responsibilities
- Design and develop scalable batch and real-time data pipelines using Spark and Kafka or Pub/Sub
- Build and optimize BigQuery-based data platforms and lakehouse architectures
- Develop ETL/ELT frameworks for data ingestion, transformation, and delivery
- Optimize Spark jobs, SQL queries, and data workflows for performance and cost efficiency
- Implement data quality, monitoring, validation, and alerting mechanisms
- Collaborate with data scientists, analysts, and business teams to deliver reliable data solutions
- Support production deployments and troubleshoot complex data engineering issues