Simpler Is Better for Autograders: Toward Cost-Effective LLM Evaluations for Open-Ended Tasks
A simple rubric-based autograder outperformed more complex large language model (LLM) grading methods across benchmarks, often matching or beating nonexpert human graders while reducing evaluation time and cost.