Generative AI chatbot ‘promising’ for mental health treatment, but supervision needed

April 10, 2025

Key takeaways:

  • Therabot users experienced reductions in symptoms of major depressive disorder, generalized anxiety disorder and feeding and eating disorders.
  • Participants rated the Therabot app similarly to a human therapist.

Adults showed a significant reduction in their clinical-level mental health symptoms with unrestricted access to a generative AI-powered chatbot designed for mental health treatment, according to results of a randomized trial.

Further, the chatbot was well-utilized and the participants rated their alliance with it as comparable to that of human therapists. The findings were published in NEJM AI, a journal from The New England Journal of Medicine.

Data were derived from Heinz MV, et al. NEJM AI. 2025;doi:10.1056/AIoa2400802.

“While these results are very promising, no generative AI agent is ready to operate fully autonomously in mental health where there is a very wide range of high-risk scenarios it might encounter,” Michael V. Heinz, MD, postdoctoral fellow at the AI and Mental Health Lab at the Dartmouth College Center for Technology and Behavioral Health, assistant professor of psychiatry in the Geisel School of Medicine at Dartmouth College and attending psychiatrist at the Dartmouth-Hitchcock Medical Center and Hanover Psychiatry, said in a related press release.

“We still need to better understand and quantify the risks associated with generative AI used in mental health contexts,” he added.

Digital therapeutics offer a solution to the inadequacies of the current mental health infrastructure, including limited scalability and accessibility, according to the researchers. Building upon this, generative AI (Gen-AI) chatbots may improve patient engagement with digital therapeutics, offering greater personalization and more potential for therapeutic alliance than automated technologies, but research has been limited.

This inspired Heinz and colleagues to conduct the first-ever national randomized controlled trial of Therabot, a Gen-AI chatbot of their own creation. Dartmouth researchers began developing Therabot, a text-based multithread chat application for iOS and Android, in 2019 and trained it with professionally written therapist-patient dialogues based on third-wave cognitive behavioral therapy.

They evaluated Therabot’s ability to treat symptoms of major depressive disorder (MDD), generalized anxiety disorder (GAD) or clinically high risk for feeding and eating disorders (CHR-FED) among 210 adults aged 18 years or older (mean age, 33.86 years; standard deviation [SD], 10.97 years; 59.5% women; 53.3% non-Hispanic white) recruited via a Meta Ads campaign. Based on self-reported questionnaire responses, researchers stratified the participants into the MDD (n = 142), GAD (n = 116) or CHR-FED (n = 89) group and then randomly assigned them to a 4-week Therabot intervention (n = 106) or waitlist control (n = 104).

Participants in the intervention group were prompted to engage with Therabot during the first 4 weeks, and then they could engage as desired for the 4 following weeks. Participants in the waitlist control group did not have access to Therabot until after the study concluded.

The study’s primary outcomes included symptom changes from baseline to postintervention (4 weeks) and follow-up (8 weeks), measured using the Patient Health Questionnaire 9 (PHQ-9), the Generalized Anxiety Disorder Questionnaire (GAD-Q-IV) or the Weight Concerns Scale (WCS). Secondary outcomes included user engagement, acceptability and therapeutic alliance, which describes the level of trust and collaboration between patient and caregiver.

Results showed that Therabot users experienced greater reductions in their respective symptoms compared with the control group.

Specifically, compared with the waitlist control group, Therabot users in the MDD group had greater mean reductions in PHQ-9 score at postintervention (6.13 vs. 2.63 points) and follow-up (7.93 vs. 4.22 points). A similar trend was observed for the GAD-Q-IV score in the GAD group (postintervention: 2.32 vs. 0.13; follow-up: 3.18 vs. 1.11) and for the WCS score in the CHR-FED group (postintervention: 9.83 vs. 1.66; follow-up: 10.23 vs. 3.7).
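
As a rough reading aid, the gap between the two arms can be worked out directly from the mean changes reported above. The Python sketch below is an illustration, not part of the study's analysis: it treats every reported figure as the size of a reduction in points (an assumption about sign), and the dictionary layout and the function name `between_group_gap` are hypothetical.

```python
# Mean symptom-score changes as reported in the article, in scale points.
# Assumption: each figure is treated as the magnitude of a reduction.
# Each value is (Therabot arm, waitlist control arm).
reported = {
    ("MDD", "PHQ-9", "postintervention"): (6.13, 2.63),
    ("MDD", "PHQ-9", "follow-up"): (7.93, 4.22),
    ("GAD", "GAD-Q-IV", "postintervention"): (2.32, 0.13),
    ("GAD", "GAD-Q-IV", "follow-up"): (3.18, 1.11),
    ("CHR-FED", "WCS", "postintervention"): (9.83, 1.66),
    ("CHR-FED", "WCS", "follow-up"): (10.23, 3.7),
}


def between_group_gap(therabot, control):
    """Return how many more points the Therabot arm improved than control."""
    return round(therabot - control, 2)


# Hypothetical helper output: one gap per group, scale and time point.
gaps = {key: between_group_gap(*pair) for key, pair in reported.items()}

for (group, scale, timepoint), gap in gaps.items():
    print(f"{group} {scale} {timepoint}: {gap:.2f} points more improvement")
```

These are plain differences between mean changes on each scale. They are not standardized effect sizes, and because the three questionnaires use different scoring ranges, the gaps should not be compared across groups.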

Further, the intervention group reported a therapeutic alliance with Therabot comparable to what patients report with in-person providers.

Also, 96 participants in the treatment group rated their experience with Therabot on a seven-point Likert scale, with seven being the highest. Overall satisfaction was 5.3 (SD, 1.89) and users generally rated Therabot as easy to learn to use (6.42; SD, 1.18) and intuitive (5.58; SD, 1.58). Most people found the Therabot sessions helpful (5.44; SD, 1.82) and said they would use it on their own (5.12; SD, 2.02). Notably, they rated Therabot as similar to a real therapist (4.9; SD, 2.21).

Finally, the researchers found that 95% of participants assigned to the intervention group interacted with Therabot. Participants used Therabot for an average of 6.18 hours over the course of the study. Staff intervention was required on 28 occasions, either for safety concerns (n = 15), such as suicidal ideation, or to correct inappropriate responses from Therabot (n = 13).

The researchers noted several limitations to this study, including potential selection bias toward younger people more open to AI.

“Therabot [was] available around the clock for challenges that arose in daily life and could walk users through strategies to handle them in real time,” Heinz said.

The development of Gen-AI systems for mental health such as Therabot requires rigorous benchmarks for safety, efficacy and tone of engagement and should include the involvement and supervision of mental health experts, he said.

“The feature that allows AI to be so effective is also what confers its risk,” he said. “Patients can say anything to it, and it can say anything back.”
