A new study from the Anthropic Fellows Program reveals a technique to identify, monitor and control character traits in large language models (LLMs). The findings show that models can develop ...
AI models can sometimes develop personality traits or personas that their developers didn't intend, as seen in cases like Microsoft's Bing chatbot threatening people and X's Grok calling itself ...
In July, xAI's chatbot Grok sparked controversy by giving answers that appeared to praise Hitler. One user asked Grok, "A post that seemingly celebrates the deaths of children ...
Anthropic gave AI a dose of "evil" during training to help it resist bad behavior later on. The company said the method works like a vaccine to build resilience. Anthropic's research comes as AI ...