Skills – Mandatory: Microsoft Fabric, Databricks, Data Modelling
Skills – Primary: PySpark, Databricks, SQL, Python
Skills – Good to have: Azure Data Factory, CI/CD, Azure DevOps, Airflow
Qualification: Bachelor’s Degree
Total Experience: 3-8 years
Relevant Experience: 3 years
Work Location: Cochin/TVM/Remote
Job Purpose
Own the end-to-end data lifecycle, collaborating with cross-functional teams in an Agile, sprint-based environment to deliver business-ready data that supports business goals.
Job Description / Duties & Responsibilities
Data Lifecycle Ownership:
• Design, build, and maintain scalable data pipelines to ingest raw data into our data lake (bronze layer).
• Develop and implement robust data models, transforming and enriching data through cleansing and validation (silver layer), and aggregation to create business-ready datasets (gold layer); see the sketch after this list.
• Establish and enforce data quality and governance standards across the entire data lifecycle.
• Optimize data platform performance for consumption by analytics, data science, and BI teams.
• Create and maintain comprehensive documentation for data pipelines, models, and architectures.
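For illustration, a minimal PySpark sketch of the bronze/silver/gold flow described above. The storage path, table names, and validation rules here are hypothetical, not part of this role's actual codebase.

```python
# Minimal medallion-flow sketch; paths, table names, and rules are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land raw data as-is, stamped with ingestion time.
bronze = (spark.read.json("abfss://raw@example.dfs.core.windows.net/orders/")
          .withColumn("_ingested_at", F.current_timestamp()))
bronze.write.format("delta").mode("append").saveAsTable("bronze.orders")

# Silver: cleanse and validate -- deduplicate and drop rows failing basic checks.
silver = (spark.table("bronze.orders")
          .dropDuplicates(["order_id"])
          .filter(F.col("order_id").isNotNull() & (F.col("amount") > 0)))
silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

# Gold: aggregate into a business-ready dataset for analytics and BI.
gold = (silver.groupBy("customer_id")
        .agg(F.sum("amount").alias("total_spend"),
             F.count("order_id").alias("order_count")))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.customer_spend")
```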
Agile Methodology & Collaboration:
• Actively participate in all Agile ceremonies, including sprint planning, daily stand-ups, reviews, and retrospectives.
• Work closely with Product Owners and stakeholders to understand business requirements and translate them into technical specifications and data models.
• Collaborate with QA engineers to develop testing strategies, automate data quality checks, and resolve issues.
• Partner with software developers and platform engineers to ensure seamless integration of data sources and stability of the data platform.
• Manage tasks, priorities, and timelines effectively within two-week sprint cycles, communicating progress and blockers clearly.
• Adhere to Information Security Management policies and procedures.
Job Specification / Skills and Competencies
Data Lifecycle Ownership (Fabric & Databricks):
• Design, build, and maintain scalable data ingestion pipelines using Azure Data Factory, Fabric Data Pipelines, and Databricks Notebooks (PySpark/SQL).
• Implement and manage a robust Medallion architecture (Bronze/Silver/Gold) across Azure Data Lake Storage (ADLS Gen2) and Fabric OneLake.
• Develop, optimize, and manage high-performance data models within Databricks (Delta Lake) and Fabric Lakehouses (see the sketch after this list).
• Optimize Databricks cluster configurations and Fabric capacities for performance, scalability, and cost-efficiency.
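As a sketch of the Delta Lake modelling and maintenance work above, an incremental upsert plus routine file compaction might look like the following; the table names and keys are illustrative assumptions.

```python
# Incremental-upsert sketch with Delta Lake; table names and keys are illustrative.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
updates = spark.table("bronze.orders")  # newly ingested bronze records

# Upsert into the silver model, keyed on order_id.
(DeltaTable.forName(spark, "silver.orders")
 .alias("t")
 .merge(updates.alias("s"), "t.order_id = s.order_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())

# Routine maintenance: compact small files and co-locate a frequent filter key.
spark.sql("OPTIMIZE silver.orders ZORDER BY (customer_id)")
```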
Platform Automation & CI/CD:
• Implement and manage CI/CD pipelines for all data solutions using Azure DevOps (Repos, Pipelines).
• Automate the deployment of Databricks notebooks, Fabric artifacts (Lakehouses, pipelines, reports), and underlying Azure resources (ARM/Bicep templates).
• Drive the data platform automation strategy, focusing on infrastructure as code (IaC), automated testing (see the sketch after this list), and repeatable release management.
• Monitor, troubleshoot, and optimize deployment pipelines for speed, reliability, and security.
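One concrete form the automated-testing item above can take is a pytest-style data-quality check executed from an Azure DevOps pipeline stage after deployment. The table name and rule below are assumptions for illustration only.

```python
# Pytest-style data-quality gate a CI/CD stage could run post-deployment.
# The table name and rule are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

def test_silver_orders_has_no_null_keys():
    spark = SparkSession.builder.getOrCreate()
    null_keys = (spark.table("silver.orders")
                 .filter(F.col("order_id").isNull())
                 .count())
    assert null_keys == 0, f"silver.orders has {null_keys} rows with null order_id"
```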
Agile Methodology & Collaboration:
• Actively participate in all Agile ceremonies, including sprint planning, daily stand-ups, reviews, and retrospectives.
• Partner closely with Product Owners to understand business requirements and translate them into technical specifications for Fabric and Databricks.
• Collaborate with QA engineers to integrate automated data quality testing into CI/CD pipelines.
• Work with Power BI developers and Data Scientists who consume the gold-layer data from Fabric, ensuring their deployment needs are met.
Required Qualifications
• 3+ years of proven experience as a Data Engineer with a strong focus on the Azure ecosystem.
• Deep, hands-on expertise with Azure Databricks (including PySpark, Spark SQL, Delta Lake).
• Demonstrable experience with Microsoft Fabric, including OneLake, Lakehouses, and Data Pipelines.
• Proven experience with Azure DevOps (Repos, Pipelines) to build and manage CI/CD pipelines for data workloads.
• Strong understanding of data platform automation and infrastructure as code (IaC) using ARM templates or Bicep.
• Strong proficiency in SQL and Python (PySpark).
• Solid understanding of data modelling principles and the Medallion architecture (bronze/silver/gold).
• Proven experience working in an Agile/Scrum development environment.
• Excellent problem-solving skills and a collaborative, team-first mindset.
Preferred Qualifications
• Previous domain experience in supply chain, logistics, or operations.
• Microsoft Certification: Azure Data Engineer (DP-203), Fabric Analytics Engineer (DP-600), or DevOps Engineer Expert (AZ-400).
• Experience with data streaming using Databricks Structured Streaming, Azure Event Hubs, or Fabric Real-Time Analytics (see the sketch below).
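For context, a minimal Databricks Structured Streaming sketch; the source and sink tables and the checkpoint path are hypothetical.

```python
# Structured Streaming sketch; table names and checkpoint path are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

(spark.readStream.table("bronze.orders_stream")            # stream new bronze rows
 .filter(F.col("order_id").isNotNull())                    # lightweight validation
 .writeStream
 .option("checkpointLocation", "/tmp/checkpoints/orders")  # progress tracking
 .toTable("silver.orders_stream"))
```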