We are seeking a highly skilled Network Consulting Engineer (NCE) to design and implement next-generation VXLAN EVPN-based data center networks that support hypercomputing and AI infrastructure. This role is central to enabling the high-performance, low-latency environments required for multi-GPU clusters, distributed training pipelines, and scalable AI workloads.
The ideal candidate will bring deep expertise in VXLAN, BGP, EVPN, RoCEv2, and fabric-based networking, along with a solid understanding of the networking demands of large-scale AI and HPC environments.
Must have
- Proven experience designing and operating VXLAN EVPN-based data center networks.
- Strong knowledge of Cisco ACI, Nexus 9000, or Arista EOS platforms.
- Expertise in data center routing and switching, including BGP, OSPF, IS-IS, multicast, and FabricPath.
- Deep understanding of AI-specific networking needs, including RoCEv2, DCB, PFC, and RDMA optimization.
- Familiarity with GPU-accelerated workloads and distributed systems in AI/ML and HPC environments.
- Minimum experience: 10+ years in IT, including at least 8 years with VXLAN EVPN-based data center networks that support hypercomputing and AI infrastructure.
- CCNP or CCIE Data Center (highly preferred).
Nice to haves
- Cisco Certified Specialist – Enterprise Core or Data Center Core.
- Additional certifications in NVIDIA Networking (Cumulus/Spectrum) or Arista ACE are a plus.
- Experience with hybrid and multi-cloud networking for AI clusters is desirable.
Is a Plus:
- Strong analytical and problem-solving skills, with the ability to make data-driven decisions.
- Exceptional communication and interpersonal skills, with a proven track record of working with diverse teams.
- Up-to-date knowledge of industry standards, security protocols, and compliance regulations in the banking sector.
- Demonstrated ability to adapt to emerging trends and technologies.
Responsibilities
- Design, deploy, and manage VXLAN EVPN-based fabric networks supporting AI and high-performance computing workloads.
- Build scalable spine-leaf architectures using Cisco Nexus 9000 or Arista switches, ensuring efficient Layer 2/Layer 3 segmentation and workload mobility.
- Optimize network performance for RoCEv2, GPUDirect, and low-latency traffic flows, ensuring maximum throughput and minimal jitter for GPU-accelerated clusters.
- Configure underlay and overlay protocols, including BGP, OSPF, and IS-IS, and troubleshoot connectivity and control-plane issues.
- Integrate data center fabric with virtualization platforms such as VMware NSX, KVM, and Hyper-V, as well as Kubernetes-based AI infrastructure.
- Collaborate with compute, storage, and DevOps teams to ensure end-to-end performance across AI training, fine-tuning, inference, and ETL pipelines.
- Drive network automation efforts using tools such as Python, Ansible, and Terraform, enabling Infrastructure-as-Code (IaC) for repeatable, scalable deployments.
- Participate in performance tuning, benchmarking, and capacity planning for evolving AI workloads and future-proof infrastructure designs.
Technology Optimization:
- Evaluate and enhance existing systems, identifying opportunities for innovation and efficiency.
- Utilize monitoring tools like Dynatrace to ensure system reliability and proactively address potential issues.
Team Leadership and Mentorship:
- Mentor and guide junior architects and developers to foster a culture of excellence.
- Lead cross-functional technical discussions and provide direction to teams to achieve project goals.
Educational Qualifications
- Bachelor's degree in Computer Science or Computer Applications, BTech in Computer Science, or MCA.