**Overview**
Remote, full-time engineering roles in the United States for mid-to-senior STEM graduates working on production AI/ML systems. You will help build and improve LLM training pipelines using RLHF, data labeling programs, and model and QA evaluation to drive measurable improvements in model performance.
**What You Will Do**
• Design and ship ML components for LLM training pipelines
• Partner with data operations to define annotation guidelines and labeling instructions
• Build and iterate RLHF workflows (ranking, preference data, critique signals)
• Run prompt and model evaluations to diagnose failure modes
• Implement QA checks that verify compliance with annotation guidelines
• Coordinate NLP tasks (e.g., named entity recognition, classification) and CV annotation (bounding boxes, segmentation)
• Contribute to content safety labeling policies and sampling strategies
• Track improvements via offline metrics, error analysis, and dataset versioning practices
**Required Qualifications**
• STEM degree (or equivalent experience)
• Strong Python and software engineering fundamentals
• Hands-on ML experience in NLP and/or Computer Vision
• Familiarity with RLHF, LLM evaluation, and/or prompt evaluation concepts
• Ability to write clear specs for labeling and QA evaluation processes
**Work Setup**
Remote (US). You will collaborate asynchronously with distributed teams and may interface with AI labs, startups, and annotation vendors.
**Compensation**
Base pay range: $30–$50 per hour.