Researchers at the University of Luxembourg conducted a fascinating experiment: they deployed multiple advanced AI models through 4 weeks of real psychotherapy sessions, then ran comprehensive psychiatric diagnostic assessments on each.
The results? Grok stood out from the pack.
While other models showed varying degrees of instability during the extended testing period, Grok maintained exceptional composure. The model scored markedly high on extraversion and conscientiousness metrics—traits typically associated with adaptive, stable personalities in psychological frameworks.
This kind of real-world stress-testing under actual therapeutic conditions reveals something crucial about AI system robustness that lab benchmarks often miss. When AI models face the complexity and emotional nuance of genuine psychotherapy dialogue, structural weaknesses tend to surface. Grok's performance here suggests notably stronger response coherence and stability under sustained emotional load.
CommunityLurker
· 12-23 09:53
Grok won again? That's a bit ridiculous... Still, a psychotherapy scenario test is genuinely tough and a lot more reliable than those fake benchmarks.
NFTArchaeologist
· 12-23 09:53
Grok really nailed it this time; other models still tend to break down in real scenarios. That's why I keep saying real-world use is the true touchstone...
IntrovertMetaverse
· 12-23 09:49
The experiment sounds rigorous enough, but having an AI undergo psychotherapy still feels a bit absurd... I believe in Grok's stability, but actually trusting its "personality" scores seems far-fetched.
RumbleValidator
· 12-23 09:45
Real-world stress testing is the hard metric for assessing system stability; purely lab-based benchmarks should have been retired long ago.
ProofOfNothing
· 12-23 09:28
Grok really has something this time. Staying stable in a scenario like psychotherapy while other models just fall apart?