About the Role
You will architect, build and maintain end-to-end data pipelines that ingest 100 GB+ of NGINX/web-server logs from Elasticsearch, transform them into high-quality features, and surface actionable insights and visualisations for security analysts and ML models. Acting as both a Data Engineer and a Behavioural Data Analyst, you will collaborate with security, AI and frontend teams to ensure low-latency data delivery, rich feature sets and compelling dashboards that spot anomalies in real time.
Key Responsibilities
ETL & Pipeline Engineering:
• Design and orchestrate scalable batch / near-real-time ETL workflows to extract raw logs from Elasticsearch (see the extraction sketch after this list).
• Clean, normalise and partition logs for long-term storage and fast retrieval.
• Optimise Elasticsearch indices, queries and retention policies for performance and cost.
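For illustration only, a minimal extraction sketch using elasticsearch-py with search_after pagination; the host, index pattern ("nginx-access-*") and field names are assumptions, and the keyword arguments shown follow the 8.x client (older clients take a body dict instead).

# Sketch: page through raw NGINX access logs with search_after pagination.
# Host, index pattern and field names below are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def extract_logs(index="nginx-access-*", page_size=5000):
    """Yield raw log documents in timestamp order using search_after pagination."""
    search_after = None
    while True:
        kwargs = dict(
            index=index,
            size=page_size,
            sort=[{"@timestamp": "asc"}, {"_doc": "asc"}],
            query={"range": {"@timestamp": {"gte": "now-1h"}}},
        )
        if search_after is not None:
            kwargs["search_after"] = search_after
        resp = es.search(**kwargs)
        hits = resp["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            yield hit["_source"]
        search_after = hits[-1]["sort"]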
Feature Engineering & Feature Store:
• Help develop robust feature-engineering code in Python and/or PySpark.
• Define schemas and loaders for a feature store (Feast or similar; see the sketch after this list).
• Manage historical back-fills and real-time feature look-ups, ensuring versioning and reproducibility.
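As an illustration of the kind of schema definition involved, here is a minimal Feast sketch; the entity, field and file names are hypothetical, and the exact API differs between Feast versions.

# Feature-store sketch (recent Feast API); all names and paths are hypothetical.
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

client_ip = Entity(name="client_ip", join_keys=["client_ip"])

traffic_source = FileSource(
    path="data/ip_traffic_features.parquet",   # output of a hypothetical back-fill job
    timestamp_field="event_timestamp",
)

ip_traffic_stats = FeatureView(
    name="ip_traffic_stats",
    entities=[client_ip],
    ttl=timedelta(hours=24),
    schema=[
        Field(name="requests_per_min", dtype=Float32),
        Field(name="distinct_user_agents", dtype=Int64),
        Field(name="error_rate_5xx", dtype=Float32),
    ],
    source=traffic_source,
)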
Behaviour & Anomaly Analysis:
• Perform exploratory data analysis (EDA) to uncover traffic patterns, bursts, outliers and security events across IPs, headers, user agents and geo data (see the EDA sketch after this list).
• Translate findings into new or refined ML features and anomaly indicators.
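By way of example, a small pandas EDA sketch that flags per-IP request bursts with a rolling z-score; the input file and column names are assumptions about the parsed log schema.

# EDA sketch: per-IP requests per minute, with bursts flagged via a rolling z-score.
# Input path and column names (client_ip, @timestamp) are hypothetical.
import pandas as pd

logs = pd.read_parquet("parsed_nginx_logs.parquet")
logs["@timestamp"] = pd.to_datetime(logs["@timestamp"], utc=True)

per_min = (
    logs.set_index("@timestamp")
        .groupby("client_ip")
        .resample("1min")
        .size()
        .rename("requests")
        .reset_index()
)

# Rolling per-IP baseline; large z-scores point at bursts worth a closer look.
grp = per_min.groupby("client_ip")["requests"]
roll_mean = grp.transform(lambda s: s.rolling(60, min_periods=10).mean())
roll_std = grp.transform(lambda s: s.rolling(60, min_periods=10).std())
per_min["zscore"] = (per_min["requests"] - roll_mean) / roll_std

bursts = per_min[per_min["zscore"] > 3]
print(bursts.sort_values("zscore", ascending=False).head())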
Visualisation & Dashboards:
• Create time-series, geo-distribution and behaviour-pattern visualisations for internal dashboards (see the plotting sketch after this list).
• Partner with frontend engineers to define and test dashboard UI requirements.
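For instance, a minimal Plotly Express sketch of a traffic time series; the input aggregate and its columns are hypothetical outputs of the pipeline described above.

# Visualisation sketch: requests-per-minute time series, one line per client IP.
# The aggregate file and its columns are hypothetical.
import pandas as pd
import plotly.express as px

per_min = pd.read_parquet("requests_per_minute.parquet")
fig = px.line(
    per_min,
    x="@timestamp",
    y="requests",
    color="client_ip",
    title="Requests per minute by client IP",
)
fig.update_layout(xaxis_title="Time (UTC)", yaxis_title="Requests / min")
fig.write_html("traffic_overview.html")   # easy to embed in an internal dashboard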
Monitoring & Scaling:
• Implement health and latency monitoring for pipelines; automate alerts and failure recovery (see the orchestration sketch after this list).
• Scale infrastructure to support rapidly growing log volumes.
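As a sketch of what the orchestration and alerting could look like, here is a minimal Airflow 2.x DAG with retries and a failure callback; the task body and notification target are placeholders, and the schedule argument is called schedule_interval on older Airflow versions.

# Orchestration sketch: an hourly log-ETL DAG with retries and a failure alert.
# The task body and the notification target are hypothetical placeholders.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_and_load(**context):
    """Placeholder for the Elasticsearch extraction and warehouse load step."""

def notify_on_failure(context):
    """Placeholder: forward the failed task's details to the team's alert channel."""
    print(f"Pipeline failure in task: {context['task_instance'].task_id}")

with DAG(
    dag_id="nginx_log_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
        "on_failure_callback": notify_on_failure,
    },
) as dag:
    PythonOperator(task_id="extract_and_load", python_callable=extract_and_load)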
Collaboration & Documentation:
• Work closely with ML, security and product teams to align data strategy with platform goals.
• Document data lineage, data dictionaries, transformation logic and behavioural assumptions.
Minimum Qualifications
• Education – Bachelor’s or Master’s in Computer Science, Data Engineering, Analytics, Cybersecurity or a related field.
• Experience – 3+ years building data pipelines and/or performing data analysis on large log datasets.
• Core Skills
o Python (pandas, NumPy, elasticsearch-py, Matplotlib, Plotly, seaborn; PySpark desirable)
o Elasticsearch & ELK stack query optimisation
o SQL for ad-hoc analysis
o Workflow orchestration (Apache Airflow, Prefect or similar)
o Data modelling, versioning and time-series handling
o Familiarity with visualisation tools (Kibana, Grafana)
• DevOps – Docker, Git, CI/CD best practices.
Nice-to-Have
• Kafka, Fluentd or Logstash experience for high-throughput log streaming.
• Web-server log expertise (NGINX / Apache, HTTP semantics).
• Cloud data platform deployment on AWS / GCP / Azure.
• Hands-on exposure to feature stores (Feast, Tecton) and MLOps.
• Prior work on anomaly-detection or cybersecurity analytics systems.
Why Join Us?
You’ll sit at the nexus of data engineering and behavioural analytics, turning raw traffic logs
into the lifeblood of a cutting-edge AI security product. If you thrive on building resilient
pipelines and diving into the data to uncover hidden patterns, we’d love to meet you.