Senior Engineer, Platform Engineering, AI at Intercontinental Exchange Holdings, Inc.

23h ago
Location
New York, New York, US
Type
On-site · Full-time
Compensation
$135k – 160k/yr
Skills
AI/ML Infrastructure, GPU Clusters, NVIDIA GPUs, NVIDIA Drivers, CUDA, cuDNN, NCCL, Container Runtimes (+44 more)
Overview

Job Purpose
We are on a mission as a team. We are problem solvers and partners, always starting with our customers to solve their challenges and create opportunities. Our start-up roots keep us nimble, flexible, and moving fast. We take ownership and make decisions. We all work for one company and work together to drive growth across the business. We engage in robust debates to find the best path, and then we move forward as one team. We take pride in what we do, acting with integrity and passion, so that our customers can perform better. We are experts and enthusiasts, combining ever-expanding knowledge with leading technology to consistently deliver results, solutions, and opportunities for our customers and stakeholders. Every day we work toward transforming global markets.

The Senior AI Platform Engineer is responsible for the technical implementation, maintenance, and optimization of AI/ML infrastructure. This hands-on role focuses on GPU cluster deployment, container image management, platform tooling development, and deep technical troubleshooting.

In addition, the engineer deploys and maintains AI-enabled workflow automation tools across LLM, MCP, and agentic capabilities, ensuring these systems operate efficiently and securely within a containerized architecture. This includes deploying and maintaining vector store infrastructure, implementing end-to-end RAG workflows, tuning agent memory systems, and hosting and managing MCP servers. The engineer also deploys and operates Agentic AI systems, including multi-agent orchestration frameworks and tool-use pipelines.

The engineer serves as the technical backbone of the AI Platform Operations team, translating architectural decisions into working infrastructure and enabling advanced, automated workflows across the platform.

Responsibilities
• Deploy, configure, and maintain GPU clusters and associated infrastructure
• Design, build, and maintain the workflow automation platform that uses AI capabilities (LLM, MCP, and agentic capabilities)
• Manage NVIDIA driver versions, CUDA toolkits, and container runtimes
• Build and maintain approved container images with ML frameworks (PyTorch, TensorFlow, etc.)
• Implement monitoring, alerting, and observability for GPU infrastructure
• Deploy and maintain vector store infrastructure for RAG pipelines, agent memory, and semantic search
• Implement and maintain end-to-end RAG workflows, including document ingestion, chunking, embedding generation, and retrieval optimization
• Maintain and tune agent memory systems, including short-term context windows, long-term persistent memory stores, and episodic memory retrieval patterns
• Deploy, operate, and maintain Agentic AI systems, including multi-agent orchestration frameworks and tool-use pipelines
• Deploy, host, and maintain MCP servers within the containerized platform infrastructure
• Manage MCP server configurations, versioning, access controls, and integration with agentic workflows
• Monitor MCP server health, performance, and availability; respond to incidents and perform root cause analysis
• Develop automation and tooling to improve platform reliability and efficiency
• Provide L2/L3 technical support and vendor escalation for complex issues
• Implement security controls including network policies, RBAC, and secrets management
• Execute change requests and maintain technical documentation
• Respond to and assist in production operations in a 24/7 environment
• Provide technical analysis, resolve problems, and propose solutions
• Provide support to, and coordinate with, developers, operations staff, release engineers, and end-users
• Educate and mentor team members and operations staff
• Participate in a weekly on-call rotation for after-hours support

Knowledge and Experience
• 5+ years in infrastructure engineering, systems administration, or DevOps
• 5+ years of scripting and automation experience (Python, Ansible, GitOps)
• 3+ years hands-on experience with Kubernetes in production
• 3+ years experience with Linux administration
• Direct experience with GPU infrastructure (NVIDIA preferred)
• 2+ years experience using CUDA
• 1+ years experience using MCPs
• 1+ years experience with vector databases and embedding infrastructure
• 1+ years experience with RAG pipeline design and deployment
• 1+ years experience with agent memory patterns (in-context, external stores, retrieval-augmented memory)
• 1+ years experience with agentic AI systems using orchestration frameworks
• 1+ years experience with semantic search, embedding models, and ANN search techniques
• 1+ years working with workflow/orchestration automation tools
• Experience with enterprise monitoring and observability tools
• Ability to work in a service-oriented team environment
• Project management, organization, and time management skills
• Customer-focused and dedicated to the best possible user experience
• Ability to communicate effectively with both technical and business resources
• Fluent speaking, reading, and writing in English

Desired Knowledge and Experience
• 2+ years of experience with AI developer toolkits (NVIDIA drivers, CUDA, cuDNN, and NCCL)
• 2+ years of experience with Run:AI, NVIDIA AI Enterprise, or DGX systems
• 1+ years of experience with n8n
• 1+ years of experience with GitHub Actions

New York Base Salary Range
The expected base salary for this role, if located in New York, is between $135,000 - $159,700 USD. The base salary range does not include Intercontinental Exchange’s incentive compensation. While we provide this range as general guidance, at ICE we compensate employees based on the skillset and experience of the individual. Regular full-time ICE employees are eligible for a suite of competitive employee benefits, including healthcare coverage (medical, dental and vision), a 401(k) plan, life insurance, time off, and paid leave for qualifying circumstances.

Illinois Base Salary Range
The expected base salary for this role, if located in Illinois, is between $113,300 - $140,000 USD. The base salary range does not include Intercontinental Exchange’s incentive compensation. While we provide this range as general guidance, at ICE we compensate employees based on the skillset and experience of the individual. Regular full-time ICE employees are eligible for a suite of competitive employee benefits, including healthcare coverage (medical, dental and vision), a 401(k) plan, life insurance, time off, and paid leave for qualifying circumstances.

California Base Salary Range
The expected base salary for this role, if located in California, is between $135,000 - $159,700 USD. The base salary range does not include Intercontinental Exchange’s incentive compensation. While we provide this range as general guidance, at ICE we compensate employees based on the skillset and experience of the individual. Regular full-time ICE employees are eligible for a suite of competitive employee benefits, including healthcare coverage (medical, dental and vision), a 401(k) plan, life insurance, time off, and paid leave for qualifying circumstances.

Intercontinental Exchange, Inc. is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to legally protected characteristics.