We are building a strong DevOps, MLOps, and DataOps team to test, validate, and deploy production-grade Computer Vision and AI pipelines. The team supports pipelines running on a wide range of hardware, from low-power edge devices such as Jetson Orin and Hailo-8/15 to multi-GPU data-center nodes. As a member of this team, you will need to master DevOps, MLOps, and DataOps while staying aware of the constraints of the target hardware. This is a senior-level position reporting to the CTO.
Purpose of the Role
The primary purpose of this role is to take the AI/ML/Computer-Vision artifacts produced by the core R&D group and guide them to production. This involves:
● Functional testing, including unit, integration, regression, bias, fairness, and explainability tests.
● Validation against model-specific and pipeline-specific KPIs such as mAP, precision-recall, perplexity, latency, and throughput.
● Owning the path to production, which includes containerization, CI/CD, cloud deployment, monitoring, auto-retraining, and decommissioning.
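As an illustration of the validation work described above, a KPI regression gate might look like the following minimal sketch. The function names, the metric names, and the 0.01 tolerance are hypothetical and chosen only for this example; a real gate would read its metrics from the evaluation pipeline.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from raw detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def regressions(candidate, baseline, tolerance=0.01):
    """Return the names of metrics where the candidate model falls more
    than `tolerance` below the baseline (assumes higher is better)."""
    return [
        name
        for name, base in baseline.items()
        if candidate.get(name, 0.0) < base - tolerance
    ]


if __name__ == "__main__":
    p, r = precision_recall(tp=80, fp=20, fn=20)
    failed = regressions(
        candidate={"precision": p, "recall": r},
        baseline={"precision": 0.80, "recall": 0.85},  # hypothetical baseline
    )
    print("regressed metrics:", failed)
```

A CI job could fail the build whenever the returned list is non-empty, which keeps a degraded model from being promoted.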
Role Focus and Time Allocation
● Focus: Architecture, cross-platform CI/CD, mentoring, and deployment to both cloud/data-center and edge targets.
● Time Allocation:
○ Deployment & Ops: 70%
○ Testing: 30%
Responsibilities
The responsibilities for this senior role include, but are not limited to, the following areas, each of which you are expected to handle with depth and rigor:
● Build & Packaging:
○ Edge: Work with TensorRT, ONNX, and Hailo NN-Converter.
○ Server / Data-Center: Handle Yocto / Ubuntu Core images.
○ Cross-Cutting: Build size-constrained, multi-arch (x86_64, ARM) Docker/OCI containers; package training and testing environments; and maintain Git-versioned artifacts and SBOMs.
● Functional & Model Validation:
○ Edge: Oversee model quantization and calibration for INT8.
○ Server / Data-Center: Implement statistical parity and fairness tests.
○ Cross-Cutting: Manage regression datasets in DVC and validate metrics like mAP@0.5 and precision-recall.
● Performance & Power Testing:
○ Edge: Conduct Jetson power profiling and thermal throttling tests.
○ Server / Data-Center: Run NCCL/NVLink bandwidth tests and perform roofline and PCIe saturation analysis.
○ Cross-Cutting: Ensure adherence to latency SLOs and throughput targets.
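To give a sense of the latency and throughput checks mentioned above, here is a minimal sketch of an SLO gate over measured per-frame latencies. The function names and the thresholds are hypothetical, the percentile uses the nearest-rank method, and throughput is estimated under the assumption that frames are processed sequentially.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile of `samples`, with q in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def check_slo(latencies_ms, p95_slo_ms, min_throughput_fps):
    """Compare measured latencies against a p95 latency SLO and a
    minimum throughput target."""
    p95 = percentile(latencies_ms, 95)
    # Assumption: sequential processing, so throughput = frames / total time.
    throughput = len(latencies_ms) / (sum(latencies_ms) / 1000.0)
    return {
        "p95_ms": p95,
        "throughput_fps": throughput,
        "passed": p95 <= p95_slo_ms and throughput >= min_throughput_fps,
    }


if __name__ == "__main__":
    measured = [10.0] * 19 + [50.0]  # hypothetical per-frame latencies in ms
    print(check_slo(measured, p95_slo_ms=20.0, min_throughput_fps=50.0))
```

The same check can run unchanged against measurements from an edge device or a data-center node; only the thresholds differ per target.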