Position Overview
We are seeking an experienced Senior Voice AI Engineer to build the voice infrastructure for an intelligent conversational AI agent serving a US-based client. You will own the real-time voice layer, ensuring natural, low-latency voice interactions that feel human-like and responsive.
This is a hands-on technical role requiring deep expertise in speech technologies, real-time audio systems, and telephony integration. You should have proven experience building production voice systems that handle real user conversations at scale.
Key Responsibilities
Speech & Voice Pipeline
● Implement and optimize Speech-to-Text (STT) pipelines for accuracy, latency, and robustness
● Integrate and fine-tune Text-to-Speech (TTS) engines for natural prosody and appropriate tone
● Implement Voice Activity Detection (VAD) for accurate speech endpoint detection
● Handle interruptions, barge-in, and natural turn-taking in conversations
● Optimize for real-time performance, targeting sub-500 ms end-to-end latency
Real-Time Infrastructure
● Build low-latency audio streaming infrastructure using WebSockets/WebRTC
● Implement audio preprocessing (noise reduction, echo cancellation, normalization)
● Design resilient pipelines that handle network variability and audio quality issues
● Build connection management for concurrent voice sessions at scale
Telephony Integration
● Integrate with telephony platforms (Twilio, Vonage) for phone-based voice channels
● Handle call lifecycle management (inbound, outbound, transfers, hold)
● Implement DTMF handling and IVR fallback capabilities
● Support multiple audio codecs and telephony protocols
Quality & Optimization
● Establish metrics for voice quality (latency, Word Error Rate, naturalness)
● Build monitoring and alerting for real-time voice pipeline health
● Analyze call recordings to identify quality improvement opportunities
● Collaborate with the AI/Agent team on seamless voice-to-agent handoff

