Data and AI-Enabled Biological Design

Risks Related to Biological Training Data and Opportunities for Governance

Allison Berke, Forrest W. Crawford, Toby Webster, James Smith, Sana Zakaria, Sella Nevo

Expert Insights. Published Jun 30, 2025

Artificial intelligence models trained on large volumes of biological data (AI-bio models) have demonstrated growing abilities to support basic scientific research goals. But some AI-bio models may be dual use, providing both beneficial capabilities and potentially dangerous ones. A nefarious actor with access to a frontier AI-bio model might be able to use it to design a pathogen with harmful phenotypic characteristics, such as enhanced transmissibility. Model capabilities are closely linked to the data used to train them, yet much less attention has been devoted to the relationship between dangerous capabilities and biological training data. The data that are included in (or excluded from) model training heavily influence the models' capabilities and limitations. Governance of data used to train AI-bio models could be a useful way to allow beneficial scientific research while safeguarding against potentially dangerous capabilities.

The authors of this paper assess current knowledge about the link between biological data and AI-bio model capabilities, describe the anticipated impacts of new biological data sources, and outline potentially dangerous capabilities that could come from broad availability of certain types of biological data. They then recommend strategies to limit the potentially dangerous capabilities arising from biological data, including options for governance of experiments and data creation, governance of curation and aggregations of data, controls on access to collections of data, and governance of the use of data for model training.

Citation

Chicago Manual of Style

Berke, Allison, Forrest W. Crawford, Toby Webster, James Smith, Sana Zakaria, and Sella Nevo, Data and AI-Enabled Biological Design: Risks Related to Biological Training Data and Opportunities for Governance. Santa Monica, CA: RAND Corporation, 2025. https://www.rand.org/pubs/perspectives/PEA3886-1.html.

This publication is part of the RAND expert insights series. The expert insights series presents perspectives on timely policy issues.

This document and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited; linking directly to this product page is encouraged. Permission is required from RAND to reproduce, or reuse in another form, any of its research documents for commercial purposes. For information on reprint and reuse permissions, please visit www.rand.org/pubs/permissions.

RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.