Software Data Engineer

Link Consulting Services
Location: Alpharetta, GA, USA
Published: 6/14/2022
Technology
Full Time

Job Description

Software Data Engineer
Expected start date: Dec 1st
Duration of engagement: 3 months, with potential extension
Work location: Alpharetta, GA, or Plano, TX
Work location model: Hybrid, 3 days in office

About the Role:
The ideal candidate will be responsible for designing and maintaining modern, scalable data solutions on Azure using Databricks. This includes building data pipelines, ETL/ELT workflows, and architectures such as Data Lakes, Warehouses, and Lakehouses for both real-time and batch processing. The role involves integrating large datasets from diverse sources, implementing Delta Lake, and preparing data for machine learning through feature stores.
Key Responsibilities:

  • Design, develop, and optimize scalable data pipelines and ETL/ELT workflows using Databricks on Azure
  • Build and maintain modern data architectures (Data Lake, Data Warehouse, Lakehouse) for real-time streaming and batch processing on Azure
  • Implement data integration solutions for large-scale datasets across diverse data sources using Delta Lake and other data formats
  • Create feature stores and data preparation workflows for machine learning applications on Azure
  • Develop and maintain data quality frameworks and implement data validation checks
  • Collaborate with data scientists, ML engineers, analysts, and business stakeholders to deliver high-quality, production-ready data solutions
  • Monitor, troubleshoot, and optimize data workflows for performance, cost efficiency, and reliability
  • Implement data governance, security, and compliance standards across all data processes
  • Create and maintain comprehensive technical documentation for data pipelines and architectures

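The data validation checks mentioned in the responsibilities above can be sketched in plain Python. This is a minimal, illustrative example only; the function and rule names are hypothetical, and in a real Databricks pipeline such checks would typically run as PySpark jobs or Delta Live Tables expectations:

```python
# Hypothetical sketch of a batch data-quality check: verify that required
# fields are present and that each field's null rate stays under a threshold.
# In production this logic would live in a Spark/Databricks job, not plain Python.

def validate_rows(rows, required_fields, max_null_rate=0.1):
    """Return (passed, report) for a batch of records (list of dicts)."""
    report = {"row_count": len(rows), "null_rates": {}, "failed_fields": []}
    for field in required_fields:
        nulls = sum(1 for r in rows if r.get(field) is None)
        rate = nulls / len(rows) if rows else 1.0
        report["null_rates"][field] = rate
        if rate > max_null_rate:
            report["failed_fields"].append(field)
    passed = not report["failed_fields"]
    return passed, report

# Example batch: one of three records is missing "amount"
batch = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 7.5},
]
ok, report = validate_rows(batch, ["id", "amount"], max_null_rate=0.5)
```

With a lenient threshold of 0.5 the batch passes; tightening `max_null_rate` to 0.1 would flag the `amount` field and fail the check.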

Required Qualifications:

  • Data Architecture: Deep understanding of Data Lake, Data Warehouse, and Lakehouse concepts with hands-on implementation experience
  • Databricks & Spark: 3+ years of hands-on experience with Databricks on Azure, Apache Spark (PySpark/Spark SQL), Delta Lake optimization
  • Azure Platform: 3+ years working with Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics, Azure ML Studio, Azure Databricks
  • Programming: Strong proficiency in Python (including pandas, NumPy), SQL, and Unix/Linux shell scripting; experience with Java or Scala is a plus
  • Streaming: 3+ years of experience with Apache Kafka or Azure Event Hubs, and with Azure Stream Analytics
  • DevOps: Hands-on experience with Git, CI/CD pipelines (Azure DevOps, GitHub Actions), and build tools (Maven, Gradle)
  • Orchestration: Working knowledge of workflow schedulers (Apache Airflow, Azure Data Factory, Databricks Workflows, TWS)
  • Problem-solving: Strong analytical and debugging skills with ability to work in agile/scrum environments

Preferred Qualifications:

  • Experience with ML frameworks and libraries (scikit-learn, TensorFlow, PyTorch) for data preparation and feature engineering on Azure
  • Experience with vector databases (Azure AI Search, Pinecone, Weaviate, Milvus) and RAG (Retrieval Augmented Generation) architectures
  • Experience with modern data transformation tools (DBT, Spark Structured Streaming on Databricks)
  • Understanding of LLM applications, prompt engineering, and AI agent frameworks (Azure OpenAI Service, Semantic Kernel)
  • Familiarity with containerization (Docker, Azure Kubernetes Service)
  • Experience with monitoring and observability tools (Azure Monitor, Application Insights, Datadog, Grafana)
  • Certifications in Databricks, Azure Data Engineer Associate, Azure AI Engineer, or Azure Solutions Architect


Educational Background:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.