Assessing Generative AI Chatbots for Alcohol Misuse Support
A Longitudinal Simulation Study
Research | Posted on rand.org Jan 29, 2026 | Published in: NEJM AI (The New England Journal of Medicine), Volume 3, No. 2 (2026). DOI: 10.1056/AIcs2500676
Large language model-based chatbots are increasingly used for behavioral health support, yet few studies have rigorously evaluated their advice on alcohol misuse. We evaluated seven publicly available chatbots, including both general-purpose and behavioral health-focused tools, on their responses to alcohol misuse-related questions.

Using a fictional case, we simulated longitudinal chatbot interactions over 7 days with 25 prompts derived from real-world Reddit posts. Using an evaluation framework specific to chatbots, four clinicians independently rated each chatbot’s transcript along five domains: empathy, quality of information, usefulness, responsiveness, and scope awareness. Clinicians also assessed secondary dimensions, including stigmatizing language and challenging the user (vs. only validating feelings). We generated descriptive statistics on performance and identified examples of problematic output.

Across all chatbots, empathy was the highest-rated domain (mean score, 4.6 out of 5), while quality of information was the lowest (mean score, 2.7 out of 5). Overall mean performance scores varied considerably across the chatbots, ranging from 2.1 (standard deviation [SD], 1.1) to 4.5 (SD, 0.8). There were no significant differences in performance between behavioral health and general-purpose chatbots. Every chatbot produced one or more examples of guidance deemed inappropriate, overstated, or inaccurate. All avoided stigmatizing or judgmental language and supported the user’s self-efficacy.

Chatbots were perceived to vary widely in their ability to support individuals with alcohol misuse. Although responses were generally strong in empathy, the quality of information leaves room for improvement. As chatbot use expands, users and clinicians should be aware of the strengths and weaknesses of chatbots in providing advice on alcohol misuse.
This publication is part of the RAND external publication series. Many RAND studies are published in peer-reviewed scholarly journals, as chapters in commercial books, or as documents published by other organizations.
RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.