Prophaze Technologies (P) Ltd

4th Floor, Padmanabham, Technopark Campus, Kazhakuttom, Trivandrum, Kerala 695581

Data Engineer

Closing Date: 12 August 2025
Job Published: 11 July 2025

Brief Description

Data Engineer
Experience Level: Mid-Senior (3+ years)

About the Role

We are seeking a skilled Data Engineer to design and implement robust, scalable data pipelines for processing and transforming log data stored in Elasticsearch. You will play a key role in building the data pipeline for our advanced ML-powered behavioural anomaly detection platform.

This role also involves designing and maintaining the feature engineering pipeline, including integration with a feature store like Feast, and ensuring high-quality, low-latency data delivery for ML models. If you have strong experience in ELK stack, Python, and modern data architectures, and are excited by the intersection of AI and cybersecurity, this is for you.

Key Responsibilities

ETL Pipeline Development:

  • Build scalable ETL workflows to extract raw logs from Elasticsearch.
  • Clean, normalize, and transform logs into structured features for ML use cases.
  • Maintain data freshness with either batch or near real-time workflows.
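
As an illustration of the kind of ETL step described above, here is a minimal sketch that pulls raw access-log documents from Elasticsearch and flattens them into ML-ready feature rows. The index name, field names, and cluster endpoint are illustrative assumptions, not details from this posting:

```python
# Sketch of an ETL step: stream raw log documents out of Elasticsearch
# and normalize them into flat feature rows for downstream ML use.
# Index name, field names, and the cluster URL are assumptions.
from datetime import datetime, timezone


def transform(raw_docs):
    """Normalize raw log documents into flat feature dicts."""
    rows = []
    for doc in raw_docs:
        src = doc.get("_source", {})
        status = int(src.get("status", 0))
        rows.append({
            "client_ip": src.get("client_ip", ""),
            "status": status,
            "bytes_sent": int(src.get("bytes_sent", 0)),
            "is_error": status >= 400,
            # Hour-of-day is a common behavioural feature for anomaly detection.
            "hour_of_day": datetime.fromisoformat(
                src["@timestamp"]
            ).astimezone(timezone.utc).hour,
        })
    return rows


def extract(index="nginx-logs-*", since="now-15m"):
    """Stream raw docs from Elasticsearch (requires `pip install elasticsearch`)."""
    from elasticsearch import Elasticsearch, helpers  # third-party client
    es = Elasticsearch("http://localhost:9200")  # assumed endpoint
    query = {"query": {"range": {"@timestamp": {"gte": since}}}}
    yield from helpers.scan(es, index=index, query=query)
```

A batch run would simply be `transform(extract())` on a schedule, while a near-real-time variant would shrink the `since` window and run more frequently under an orchestrator.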

Feature Store Integration:

  • Design schemas for storing derived features into a feature store (e.g., Feast).
  • Collaborate with ML engineers to ensure features are aligned with model requirements.
  • Manage historical feature backfills and real-time lookups.
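
A feature-view definition for such derived features might look like the following sketch against the Feast Python SDK. The entity, feature names, TTL, and parquet source path are illustrative assumptions, and exact class signatures vary between Feast versions:

```python
# Sketch of a Feast feature repository definition for per-client
# request features. Names and the source path are assumptions.
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Entity: the join key used for both backfills and real-time lookups.
client = Entity(name="client_ip", join_keys=["client_ip"])

# Offline source holding historical (backfilled) feature values.
source = FileSource(
    path="data/request_features.parquet",  # assumed path
    timestamp_field="event_timestamp",
)

request_stats = FeatureView(
    name="client_request_stats",
    entities=[client],
    ttl=timedelta(hours=24),  # bounds how stale an online feature may be
    schema=[
        Field(name="req_count_15m", dtype=Int64),
        Field(name="error_rate_15m", dtype=Float32),
    ],
    source=source,
)
```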

Data Infrastructure and Architecture:

  • Optimize Elasticsearch queries and index management for performance and cost.
  • Design data schema, partitioning, and retention policies for long-term storage.
  • Ensure data integrity, versioning, and reproducibility of transformed data.

Monitoring and Scaling:

  • Implement monitoring for pipeline performance and failures.
  • Scale pipelines to support growing log volumes (on the order of hundreds of GBs per day).
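
Pipeline monitoring of the kind described above can be as simple as a wrapper that times each stage, records row counts, and surfaces failures to the orchestrator. This is a minimal stdlib-only sketch; stage names and the assumption that stages return row collections are illustrative:

```python
# Minimal sketch of pipeline-run monitoring: time each stage, log row
# counts, and re-raise failures so the orchestrator marks the run failed.
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def run_stage(name, fn, *args):
    """Run one pipeline stage, logging its duration, size, and failures.

    Assumes each stage returns a sized collection of rows.
    """
    start = time.monotonic()
    try:
        result = fn(*args)
    except Exception:
        log.exception("stage %s failed after %.1fs", name,
                      time.monotonic() - start)
        raise  # let the orchestrator (e.g. Airflow) handle retries/alerts
    log.info("stage %s ok in %.1fs (%d rows)", name,
             time.monotonic() - start, len(result))
    return result
```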

Collaboration:

  • Work closely with security analysts and AI engineers to translate behavioural insights into engineered features.
  • Document data lineage, transformation logic, and data dictionaries.

Minimum Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
  • 3+ years of experience in data engineering roles with Python and Elasticsearch.
  • Strong experience building data pipelines using:
      • Python (pandas, elasticsearch-py; PySpark is a bonus)
      • Orchestration tools (e.g., Apache Airflow, Prefect)
  • Familiarity with log processing, especially NGINX and Apache logs, HTTP protocols, and cybersecurity-relevant fields (IPs, headers, user agents).
  • Experience with feature stores such as Feast, Tecton, or custom-built systems.
  • Solid understanding of data modeling, versioning, and time-series data handling.
  • Knowledge of DevOps practices (Docker, Git, CI/CD workflows).

Nice to Have

  • Experience with Kafka, Fluentd, or Logstash pipelines.
  • Experience deploying data workloads on cloud environments (AWS/GCP/Azure).
  • Exposure to anomaly detection or cybersecurity ML systems.
  • Familiarity with ML workflows, model deployment, and MLOps.

Preferred Skills

Python
Elasticsearch
Orchestration tools
Feast
Cloud