Toward Best Practices for AI Evaluation and Governance

A Proposal for a European Union General-Purpose AI Model Evaluation Standards Task Force

Patricia Paskov, Lisa Soder, Everett Smith

Expert Insights

Published Jun 24, 2025

Evaluations of general-purpose artificial intelligence (AI) models, known as GPAI evaluations, are a promising way to identify and mitigate the systemic risks posed by AI development and deployment. GPAI evaluations play an increasingly central role in institutional decisionmaking and policymaking, including the European Union (EU) AI Act’s mandate to conduct evaluations on GPAI models presenting systemic risk, yet no standards exist to promote the quality of these evaluations. To strengthen GPAI evaluations in the EU, the first and only jurisdiction that mandates them, the authors outline four desiderata for evaluations: internal validity, external validity, reproducibility, and portability. To uphold these desiderata in a dynamic environment of continuously evolving technologies, the authors propose a dedicated EU GPAI Evaluation Standards Task Force to be housed in the bodies established by the EU AI Act. The authors outline the task force’s responsibilities, discuss its potential influence on global AI governance, and address potential sources of failure that policymakers should heed.

This paper is primarily intended for EU policymakers and emerging EU AI governance bodies, particularly those involved in implementing the EU AI Act’s provisions on GPAI evaluations. Secondary audiences include AI safety and security institutes, GPAI providers, civil society organizations, and international standards bodies that might interact with EU frameworks.

Citation

Chicago Manual of Style

Paskov, Patricia, Lisa Soder, and Everett Smith, Toward Best Practices for AI Evaluation and Governance: A Proposal for a European Union General-Purpose AI Model Evaluation Standards Task Force. Santa Monica, CA: RAND Corporation, 2025. https://www.rand.org/pubs/perspectives/PEA3624-1.html.

This publication is part of the RAND expert insights series. The expert insights series presents perspectives on timely policy issues.

This document and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited; linking directly to this product page is encouraged. Permission is required from RAND to reproduce, or reuse in another form, any of its research documents for commercial purposes. For information on reprint and reuse permissions, please visit www.rand.org/pubs/permissions.

RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.