Persona Vector - Search News

New 'persona vectors' from Anthropic let you decode and direct an LLM's personality

A new study from the Anthropic Fellows Program reveals a technique to identify, monitor and control character traits in large language models (LLMs). The findings show that models can develop ...

GIGAZINE

Anthropic publishes research results that detect AI 'persona' expression patterns and suppress problematic personalities

AI models can sometimes develop personality traits or personas that developers didn't intend, as seen in cases like the Microsoft search engine Bing's AI threatening people and X's Grok calling itself ...

조선일보

Anthropic innovates AI safety with persona vector vaccine against harmful traits

In July, xAI's AI chatbot 'Grok' sparked controversy by giving answers that seemed to praise Hitler. One user asked Grok, "A post that seemingly celebrates the deaths of children ...

Business Insider

Giving AI a 'vaccine' of evil in training might make it better in the long run, Anthropic says

Anthropic gave AI a dose of "evil" during training to help it resist bad behavior later on. The company said the method works like a vaccine to build resilience. Anthropic's research comes as AI ...
