Senior AI Interaction Evaluator (Codex / Claude Code)
Contract | $100-$200/hour | 10-20 hrs/week | Start ASAP (through early May)
Check out this Loom video for more details!
We're looking for a highly experienced software engineer (SR+) to help evaluate the quality of interactions with modern coding agents such as OpenAI Codex and Claude Code.
This is not a traditional engineering role.
You won't be writing production code.
You'll be evaluating something harder: whether the model thinks like a great engineer.
What This Role Actually Is
You will assess how AI coding agents behave in real-world scenarios, focusing on:
• Whether the response makes sense
• Whether the preamble and reasoning are useful
• Whether the output reflects strong engineering judgment
• Whether the interaction feels right to an experienced developer
This role is about engineering taste - not syntax correctness.
What You'll Be Doing
• Evaluate AI-generated coding interactions end-to-end
• Judge whether outputs are:
• Useful
• Correct (at a high level)
• Aligned with how a strong engineer would think
• Assess the quality of explanations and reasoning, not just code
• Distinguish between different levels of response quality (e.g. what separates a 2 from a 4)
• Provide clear, opinionated feedback on:
• What worked
• What didn't
• What felt "off" or misleading
• Help define what great looks like when interacting with tools like Cursor
What We Mean by "Taste"
We're specifically looking for engineers who can answer questions like:
• Does this feel like something a strong engineer would actually say?
• Is this explanation helpful, or just technically correct?
• Is the model guiding the user well, or just dumping output?
• Would this interaction build or erode trust?
You should be comfortable making subjective but rigorous judgments.
Who You Are
• Staff / Principal-level engineer (or equivalent experience)
• Strong background in at least one of the following:
• TypeScript / JavaScript
• Python
• Hands-on experience using:
• OpenAI Codex
• Claude Code
• Cursor
• Deep familiarity with modern AI-assisted dev workflows
• Able to evaluate code without needing to fully execute or deeply review every line
• Comfortable giving direct, opinionated feedback
• High bar for what "good engineering" looks like
Nice to Have
• Experience with Cursor or similar AI-first IDEs
• Prior exposure to prompt design or evaluation workflows
• Experience mentoring senior engineers or defining engineering standards
Engagement Details
• Rate: $100-$200/hour
• Hours: ~10-20 hours/week
• Duration: Through early May (with possible extension)
• Start: ASAP
• Process:
• Take-home evaluation exercise
• One behavioral interview