Gig | Remote OK | $50–$150/hour, depending on domain expertise (specialist rates higher)
AI Evaluations Specialist — Contract
Surge AI
Surge AI hires domain experts as contract evaluators for frontier-lab
training pipelines — your job is to design, write, or judge outputs
against rubrics in your specialty (law, medicine, finance, academic
research, software engineering, creative writing, math, etc.). The
labs use the resulting data for RLHF, evals, and red-team curricula.
How the engagement works: paid hourly, project-based, flexible hours
(typically 10–30 hours/week while a project runs), remote; projects last
weeks to months. Specialist rates apply if you have credentialed
expertise (e.g. licensed attorney, MD, senior-level working software
engineer). Surge has an established rate card you'll see during
onboarding.
Honest fit signals:
— Deep domain expertise that the labs can't crowd-source. Generalist
applicants get routed to lower-rate tasks and may not be matched at
all in tight-domain projects.
— You can write clearly and follow detailed rubrics. The work IS
judgment calls, but they're judgment calls within tight definitions.
— You're comfortable with NDA-bound contract work (you won't be allowed
to talk publicly about specific projects or labs you work with).
What's not a fit: anyone looking for full-time work (Surge is genuinely
contract-only), anyone without specialist credentials hoping to make
top rates, or anyone uncomfortable with their judgments being used to
train commercial AI systems.