Can Humans Devise Practical Safeguards That Are Reliable Against an Artificial Superintelligent Agent?
Expert Insights | Published Oct 13, 2025
Dramatic advances in frontier artificial intelligence (AI) raise the question of whether humans can design practical safeguards to protect our critical and digital infrastructure against attacks from a future artificial superintelligence. Some argue that a superintelligent agent would have such mastery over physical and cyber space that it could thwart any human effort to prevent it from undermining the confidentiality, integrity, and availability of our information. Others hold that we might design security rooted in fundamental limits that even a superintelligence could not overcome.
In this paper, the authors present the hypothesis that such safeguards are both feasible and sensible, outline an approach rooted in existing security practice, and call for future work that enlists security engineers and AI developers in the common pursuit of these safeguards. Through an example protocol and threat model, the authors explore how existing and adapted security tools can meaningfully restrict the actions of advanced AI, while acknowledging the limitations and assumptions inherent in any security protocol. They conclude by outlining directions for future research, emphasizing the need for rigorous formalization and ongoing evaluation as both AI capabilities and the technical environment evolve.
This work was independently initiated and conducted within the Technology and Security Policy Center of RAND Global and Emerging Risks using income from operations and gifts from philanthropic supporters. A complete list of donors and funders is available at www.rand.org/TASP.
This publication is part of the RAND expert insights series. The expert insights series presents perspectives on timely policy issues.
This document and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited; linking directly to this product page is encouraged. Permission is required from RAND to reproduce, or reuse in another form, any of its research documents for commercial purposes. For information on reprint and reuse permissions, please visit www.rand.org/pubs/permissions.
RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.