Prophaze Technologies (P) Ltd

4th Floor, Padmanabham, Technopark Campus, Kazhakuttom, Trivandrum, Kerala 695581

Data Analytics Engineer

Closing Date: 31 Aug 2025
Job Published: 29 Jul 2025

Brief Description

About the Role
You will architect, build and maintain end-to-end data pipelines that ingest 100 GB+ of NGINX/web-server logs from Elasticsearch, transform them into high-quality features, and surface actionable insights and visualisations for security analysts and ML models. Acting as both a Data Engineer and a Behavioural Data Analyst, you will collaborate with security, AI and frontend teams to ensure low-latency data delivery, rich feature sets and compelling dashboards that spot anomalies in real time.

Key Responsibilities

ETL & Pipeline Engineering:

• Design and orchestrate scalable batch and near-real-time ETL workflows to extract raw logs from Elasticsearch (see the extraction sketch after this list).
• Clean, normalise and partition logs for long-term storage and fast retrieval.
• Optimise Elasticsearch indices, queries and retention policies for performance and cost.
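For illustration, a minimal extraction sketch using the elasticsearch-py scan helper. The cluster URL, the nginx-logs-* index pattern and the @timestamp field are assumptions about the deployment, not specifics from this posting.

```python
# Hedged sketch: stream one hour of NGINX logs out of Elasticsearch.
# Host, index pattern and field names are hypothetical placeholders.
import json

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")  # assumed cluster address

query = {
    "query": {
        "range": {"@timestamp": {"gte": "now-1h", "lt": "now"}}
    }
}

with open("raw_logs.ndjson", "w") as out:
    # scan() wraps the scroll API, so it streams large result sets
    # without the deep-pagination cost of from/size queries.
    for hit in scan(es, index="nginx-logs-*", query=query):
        out.write(json.dumps(hit["_source"]) + "\n")
```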

Feature Engineering & Feature Store:

• Assist in the development of robust feature-engineering code in Python and/or PySpark.
• Define schemas and loaders for a feature store (Feast or similar; see the sketch after this list).
• Manage historical back-fills and real-time feature look-ups, ensuring versioning and reproducibility.
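As a sketch of what such a feature-store definition might look like, assuming a recent Feast release; the entity, parquet source and feature names (client_ip, ip_traffic_stats, request_count_1h, error_rate_1h) are hypothetical.

```python
# Hedged sketch of a Feast feature view keyed on client IP.
# All names and the parquet path are illustrative assumptions.
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float64, Int64

client_ip = Entity(name="client_ip", join_keys=["client_ip"])

source = FileSource(
    path="data/ip_features.parquet",    # assumed offline store location
    timestamp_field="event_timestamp",  # enables point-in-time joins
)

ip_traffic_stats = FeatureView(
    name="ip_traffic_stats",
    entities=[client_ip],
    ttl=timedelta(hours=24),            # bounds staleness of online look-ups
    schema=[
        Field(name="request_count_1h", dtype=Int64),
        Field(name="error_rate_1h", dtype=Float64),
    ],
    source=source,
)
```

Point-in-time joins via the timestamp field are what make back-fills reproducible: historical training rows are joined with feature values as of each event's timestamp rather than with the latest values.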

Behaviour & Anomaly Analysis:

• Perform exploratory data analysis (EDA) to uncover traffic patterns, bursts, outliers and security events across IPs, headers, user agents and geo data (see the burst-detection sketch after this list).
• Translate findings into new or refined ML features and anomaly indicators.
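One way that kind of EDA might look in pandas: per-minute request counts per client IP, with a rolling z-score as a crude burst indicator. The column names follow the extraction sketch above and are assumptions.

```python
# Hedged sketch: flag bursty client IPs with a rolling z-score.
import pandas as pd

logs = pd.read_json("raw_logs.ndjson", lines=True)
logs["@timestamp"] = pd.to_datetime(logs["@timestamp"])

# Requests per client IP per minute.
counts = (
    logs.set_index("@timestamp")
        .groupby("client_ip")
        .resample("1min")
        .size()
        .rename("requests")
        .reset_index()
)

def rolling_zscore(s: pd.Series, window: int = 60) -> pd.Series:
    # z-score of each point against the trailing hour for that IP
    mean = s.rolling(window, min_periods=10).mean()
    std = s.rolling(window, min_periods=10).std()
    return (s - mean) / std

counts["z"] = counts.groupby("client_ip")["requests"].transform(rolling_zscore)
bursts = counts[counts["z"].abs() > 3]  # crude threshold; tune per traffic profile
print(bursts.head())
```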

Visualisation & Dashboards:

• Create time-series, geo-distribution and behaviour-pattern visualisations for internal dashboards (see the sketch after this list).
• Partner with frontend engineers to define and test the data requirements behind dashboard UIs.
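A small Plotly sketch of the kind of time-series view described; it reuses the hypothetical counts frame from the EDA sketch above.

```python
# Hedged sketch: interactive requests-per-minute chart, one line per IP.
import plotly.express as px

fig = px.line(
    counts,
    x="@timestamp",
    y="requests",
    color="client_ip",
    title="Requests per minute by client IP",
)
fig.write_html("traffic_dashboard.html")  # self-contained HTML for embedding
```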

Monitoring & Scaling:

• Implement health and latency monitoring for pipelines; automate alerts and failure recovery (see the Airflow sketch after this list).
• Scale infrastructure to support rapidly growing log volumes.
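A minimal Airflow 2.x sketch of the retry-and-alert pattern; the DAG id, schedule and notify_failure callback are placeholders, and a production callback would post to Slack, PagerDuty or email rather than print.

```python
# Hedged sketch: hourly ETL DAG with retries and a failure callback.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def notify_failure(context):
    # Placeholder alert hook; context carries the failing task instance.
    print(f"Task {context['task_instance'].task_id} failed")

def extract_logs():
    print("extracting logs from Elasticsearch...")  # stand-in for real work

with DAG(
    dag_id="nginx_log_etl",            # hypothetical DAG id
    start_date=datetime(2025, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
        "on_failure_callback": notify_failure,
    },
) as dag:
    PythonOperator(task_id="extract_logs", python_callable=extract_logs)
```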

Collaboration & Documentation:

• Work closely with ML, security and product teams to align data strategy with platform goals.
• Document data lineage, data dictionaries, transformation logic and behavioural assumptions.

Minimum Qualifications

• Education – Bachelor’s or Master’s in Computer Science, Data Engineering, Analytics, Cybersecurity or a related field.
• Experience – 3+ years building data pipelines and/or performing data analysis on large log datasets.

Core Skills

• Python (pandas, NumPy, elasticsearch-py, Matplotlib, Plotly, Seaborn; PySpark desirable).
• Elasticsearch and ELK-stack query optimisation.
• SQL for ad-hoc analysis.
• Workflow orchestration (Apache Airflow, Prefect or similar).
• Data modelling, versioning and time-series handling.
• Familiarity with visualisation tools (Kibana, Grafana).
• DevOps – Docker, Git and CI/CD best practices.

Nice-to-Have

• Kafka, Fluentd or Logstash experience for high-throughput log streaming.
• Web-server log expertise (NGINX/Apache, HTTP semantics).
• Cloud data platform deployment on AWS / GCP / Azure.
• Hands-on exposure to feature stores (Feast, Tecton) and MLOps.
• Prior work on anomaly-detection or cybersecurity analytics systems.

Why Join Us?

You’ll sit at the nexus of data engineering and behavioural analytics, turning raw traffic logs into the lifeblood of a cutting-edge AI security product. If you thrive on building resilient pipelines and diving into the data to uncover hidden patterns, we’d love to meet you.

Preferred Skills

Data analyst

ETL

Pipelines

Feature engineering

Data engineer

Visualizations

Matplotlib

Seaborn

Plotly

Kafka

Elasticsearch

PySpark