Company: LockedIn AI
Location: Remote (US-Based), Optional Hybrid (New York, NY)
Reports To: Co-Founder / CEO
Compensation: $140,000 – $195,000 USD per year
About LockedIn AI
LockedIn AI is a fast-growing AI-native platform trusted by over one million users worldwide. We build real-time AI tools that help candidates succeed in job interviews, coding assessments, and professional meetings.
Our core product delivers live AI assistance during interviews and assessments—helping users communicate clearly, think faster, and perform at their best in high-pressure situations.
We are now scaling our infrastructure to support the next generation of AI-powered real-time systems.
Role Overview
We are looking for an AI Cloud Engineer with a cloud-native, AI-infrastructure focus to design and operate the cloud systems that power our machine learning and real-time AI products.
This role sits at the intersection of cloud engineering, DevOps, and AI systems architecture. You will own the infrastructure layer that supports:
Model training and fine-tuning pipelines
Real-time LLM inference systems
GPU-based distributed compute environments
High-scale production AI services for 1M+ users
You will be responsible for building highly scalable, cost-efficient, and low-latency cloud infrastructure optimized specifically for AI workloads.
Key Responsibilities
• AI Cloud Architecture
Design cloud-native infrastructure for AI/ML workloads
Build GPU-based compute environments for training and inference
Architect multi-stage environments (training, staging, production)
Optimize AWS / GCP / Azure infrastructure for AI performance and scale
• Model Serving & Inference Systems
Build and maintain low-latency inference pipelines for LLMs and AI services
Deploy model serving frameworks (vLLM, Triton, TensorRT, TGI, etc.)
Optimize throughput, batching, caching, and GPU utilization
Design failover, load balancing, and high-availability systems
• GPU Infrastructure & Distributed Training
Manage GPU clusters for training and fine-tuning large models
Implement distributed training pipelines (multi-node, multi-GPU)
Optimize compute scheduling, spot instances, and resource efficiency
Support managed AI platforms (SageMaker, Vertex AI, Azure ML)
• Cost Optimization (FinOps for AI)
Monitor and reduce cloud costs across compute, storage, and APIs
Implement GPU cost optimization strategies (spot, reserved, autoscaling)
Build dashboards for cost-per-inference and cost-per-training-job
Optimize LLM usage, caching, and routing strategies
• Security & Networking
Design secure VPC architectures for AI systems
Implement IAM policies, encryption, and secrets management
Ensure compliance readiness (SOC 2, GDPR, CCPA)
Secure model weights, embeddings, and AI APIs
• Infrastructure Automation & Observability
Build Infrastructure as Code (Terraform / Pulumi / CloudFormation)
Automate deployment of training and inference environments
Implement monitoring for GPU health, latency, and system performance
Build alerting systems for failures and performance degradation
Required Qualifications
Experience
3+ years in cloud engineering, DevOps, or infrastructure roles
Experience with ML/AI workloads in production environments
Hands-on GPU infrastructure or AI system deployment experience
Strong understanding of distributed systems and cloud architecture
Experience in fast-paced startup or scale-up environments
Technical Skills
Cloud platforms: AWS, GCP, or Azure (strong proficiency required)
Kubernetes (GPU scheduling, autoscaling, Helm, clusters)
Infrastructure as Code (Terraform / Pulumi / CloudFormation)
Python, Go, or Bash for automation and tooling
AI serving systems (vLLM, Triton, TensorRT, TGI, etc.)
Monitoring tools (Prometheus, Grafana, Datadog, CloudWatch)
Preferred Qualifications
Experience with large-scale LLM inference systems
Distributed training (multi-node GPU clusters, NCCL, parallelism)
Streaming or real-time systems (WebSockets, low-latency APIs)
RDMA / InfiniBand or high-performance networking
Experience in SaaS, EdTech, or consumer AI products
Open-source contributions in cloud or AI infrastructure
What We Offer
Equity Ownership
Meaningful early-stage equity in a fast-scaling AI company
High Impact
Your infrastructure directly powers 1M+ global users
Cutting-Edge AI Systems
Work on real-time AI products at production scale
Remote Flexibility
Work from anywhere in the US (optional NYC hybrid)
Fast Growth Environment
High autonomy, fast execution, and meaningful ownership
AI-Native Culture
Work with modern AI systems, not legacy infrastructure
Why Join Us
At LockedIn AI, you won’t just maintain infrastructure—you’ll build the backbone of real-time AI systems used in live interviews and professional environments worldwide.
This is a rare opportunity to shape how AI infrastructure scales at the consumer level.
How To Apply
Please submit:
Resume / CV
Short note explaining why you want to join LockedIn AI
Optional: GitHub, portfolio, or technical writing
Location
Address
Manhattan, KS, USA
New York
10003
United States
Job Category
Information Technology (IT)
Job Type
Full Time
Market Sector
Facility Services
Education (*Required Level)
Some College
Years of Experience
2-4 years
Level
Experienced
Telecommuting Allowed?
No
Number of Openings
2
Salary
$140,000 – $195,000 USD per year