Member of Technical Staff - ML Training Systems at Modal

Location: New York, New York, US
Type: On-site · Full-time
Compensation: $150k – $350k/yr
Skills: PyTorch · Hugging Face / high-level training frameworks · ML training optimization · distributed training / overlapping communication with compute · data loading optimization · performance engineering (high-performance code) · GPU / CUDA · Linux (kernel, file systems)
ABOUT US:

Modal provides the infrastructure foundation for AI teams. With instant GPU access, sub-second container startups, and native storage, Modal makes it simple to train models, run batch jobs, and serve low-latency inference. Companies like Suno, Lovable, and Substack rely on Modal to move from prototype to production without the burden of managing infrastructure.

We're a fast-growing team based in NYC, SF, and Stockholm. We've hit 9-figure ARR and recently raised a Series B (https://modal.com/blog/announcing-our-series-b) at a $1.1B valuation. We have thousands of customers who rely on us for production AI workloads, including Lovable, Scale AI, Substack, and Suno.

Working at Modal means joining one of the fastest-growing AI infrastructure organizations at an early stage, with many opportunities to grow within the company. Our team includes creators of popular open-source projects (e.g. Seaborn: https://github.com/mwaskom/seaborn, Luigi: https://github.com/spotify/luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.

THE ROLE:

We are looking for strong engineers with experience training production machine learning models. If you are interested in contributing to open-source projects and evolving Modal's infrastructure to train the next generation of language models, we'd love to hear from you!

REQUIREMENTS:

- 5+ years of experience writing high-quality, high-performance code.
- Experience with PyTorch and high-level training frameworks (Hugging Face, verl, slime).
- Experience with ML training optimization (tell us a story about eliminating data loading bottlenecks, overlapping communication with compute, rewriting a trainer to handle off-policy rollouts, etc.).
- Nice to have: familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).
- Ability to work in person from our NYC or San Francisco office.

Compensation: $150K – $350K