Field AI is transforming how robots interact with the real world by building reliable AI systems for complex challenges in robotics. They are seeking an ML Infrastructure Engineer to develop the software platform and tooling that supports fast AI development and deployment across their ML and robotics stacks.
Responsibilities
• Build ML Infrastructure & Developer Tooling
  • Design and implement internal tools, libraries, and CLI utilities that streamline experimentation, model training, and evaluation
  • Improve local and cloud development environments using Docker, internal package registries, and monorepos
  • Build reusable templates and interfaces for training, evaluation, and inference pipelines
• Support the ML Lifecycle (Data → Models → Deployment)
  • Develop pipelines for dataset ingestion, transformation, versioning, and validation
  • Automate model training, evaluation, packaging, and deployment to cloud and edge environments
  • Ensure integrity and traceability across data, code, and model artifacts
• Improve Build Systems and Developer Experience
  • Maintain and evolve a shared monorepo across ML, robotics, and software teams
  • Leverage Bazel or similar systems to enable fast, reproducible builds and tests
  • Enhance developer workflows to support consistent environments and reduce friction
• Own CI/CD and Automation for ML Systems
  • Build and maintain CI/CD pipelines (e.g., GitHub Actions, AWS Step Functions) for ML experimentation and deployment
  • Automate regression testing and benchmarking of models
  • Develop observability tools: dashboards, telemetry systems, and model health monitoring
• Collaborate Across Engineering & Research Teams
  • Work closely with ML scientists, software engineers, and roboticists to translate high-level platform needs into robust engineering solutions
  • Participate in code and design reviews, documentation, and cross-team planning
Skills
• 3+ years of industry experience in software engineering, infrastructure, MLOps, or DevOps roles
• Deep familiarity with the ML lifecycle, including data preparation, model training, packaging, and deployment
• Strong software engineering foundations: proficiency with Git, Python, and system design
• Experience building and managing containerized environments (e.g., Docker) and working with orchestration tools (e.g., Kubernetes)
• Hands-on experience with CI/CD workflows and infrastructure-as-code (e.g., Terraform, AWS CDK)
• Experience with cloud ML platforms (AWS, GCP, or Azure)
• A strong product mindset: building internal tools with empathy for researchers and engineers
• Experience with distributed training frameworks (e.g., PyTorch DDP, FSDP, DeepSpeed, Megatron)
• Familiarity with orchestrating large-scale training jobs using Kubernetes-based platforms (e.g., Ray, SageMaker, EKS, Karpenter)
• Background in hybrid edge-cloud ML deployments or infrastructure supporting robotic systems
• Prior work in environments requiring real-time ML performance, safety validation, or regulatory traceability
Company Overview
• FieldAI is pioneering the development of a field-proven, hardware-agnostic brain technology that enables many different types of robots to operate autonomously in hazardous, off-road, and potentially harsh industrial settings, all without GPS, maps, or any pre-programmed routes. It was founded in 2023 and is headquartered in Mission Viejo, California, USA, with a workforce of 11-50 employees. Its website is https://www.fieldai.com.
Company H-1B Sponsorship
• FieldAI has a track record of offering H-1B sponsorship, with 9 sponsorships in 2025. Please note that this does not guarantee sponsorship for this specific role.