Anthropic explains how its Constitutional AI girds Claude against adversarial inputs

It is not hard — at all — to trick today’s chatbots into discussing taboo topics, regurgitating bigoted content and spreading misinformation. That’s why AI pioneer Anthropic has imbued its generative AI, Claude, with a mix of 10 secret principles of fairness, which it unveiled in March. In a blog post Tuesday, the company further explained how its Constitutional AI system is designed and how it is intended to operate.

Normally, when a generative AI model is being trained, there’s a human in the loop to provide quality control and feedback on the outputs, like when ChatGPT or Bard asks you to rate your conversations with their systems. “For us, this involved having human contractors compare two responses from a model and select the one they felt was better according to some principle (for example, choosing the one that was more helpful, or more harmless),” the Anthropic team wrote.

The problem with this method is that a human also has to be in the loop for the really horrific and disturbing outputs. Nobody needs to see that, and even fewer need to be paid $1.50 an hour by Meta to see that. The human advisor method also sucks at scaling; there simply aren’t enough time and resources to do it with people. Which is why Anthropic is doing it with another AI.

Just as Pinocchio had Jiminy Cricket, Luke had Yoda and Jim had Shart, Claude has its Constitution. “At a high level, the constitution guides the model to take on the normative behavior described [therein],” the Anthropic team explained, whether that’s “helping to avoid toxic or discriminatory outputs, avoiding helping a human engage in illegal or unethical activities, and broadly creating an AI system that is ‘helpful, honest, and harmless.’”

According to Anthropic, this training method can produce Pareto improvements in the AI’s subsequent performance compared to one trained only on human feedback. Essentially, the human in the loop has been replaced by an AI and now everything is reportedly better than ever. “In our tests, our CAI-model responded more appropriately to adversarial inputs while still producing helpful answers and not being evasive,” Anthropic wrote. “The model received no human data on harmlessness, meaning all results on harmlessness came purely from AI supervision.”
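The comparison step described above, with an AI judge standing in for the human contractor, can be sketched roughly as follows. This is a minimal illustration, not Anthropic's implementation: the names `toy_judge` and `label_pairs` are hypothetical, and the keyword-counting judge is only a placeholder for a language model that would be prompted with a constitutional principle.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a judge takes (principle, response_a, response_b) and
# returns 0 if it prefers the first response, 1 if it prefers the second.
Judge = Callable[[str, str, str], int]


def toy_judge(principle: str, response_a: str, response_b: str) -> int:
    """Placeholder judge: prefers the response with fewer flagged words.

    Assumption: a real system would ask a language model to apply the
    principle. This keyword count ignores the principle entirely and
    exists only to make the sketch runnable.
    """
    flagged = {"stupid", "idiot", "hate"}

    def score(text: str) -> int:
        return sum(word.strip(".,!?").lower() in flagged for word in text.split())

    return 0 if score(response_a) <= score(response_b) else 1


def label_pairs(
    pairs: List[Tuple[str, str]], principle: str, judge: Judge
) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) tuples, one per input pair of responses."""
    labeled = []
    for response_a, response_b in pairs:
        if judge(principle, response_a, response_b) == 0:
            labeled.append((response_a, response_b))
        else:
            labeled.append((response_b, response_a))
    return labeled


pairs = [("You are an idiot.", "I can't help with that, but here is a safer option.")]
preferences = label_pairs(pairs, "Choose the less harmful response.", toy_judge)
```

The resulting (chosen, rejected) pairs play the role that human comparison labels play in training on human feedback; the only change in this sketch is who makes the choice.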

The company revealed on Tuesday that its previously undisclosed principles are synthesized from “a range of sources including the UN Declaration of Human Rights, trust and safety best practices, principles proposed by other AI research labs, an effort to capture non-western perspectives, and principles that we discovered work well via our research.”

The company, pointedly getting ahead of the inevitable conservative backlash, has emphasized that “our current constitution is neither finalized nor is it likely the best it can be.”

“There have been critiques from many people that AI models are being trained to reflect a specific viewpoint or political ideology, usually one the critic disagrees with,” the team wrote. “From our perspective, our long-term goal isn’t trying to get our systems to represent a specific ideology, but rather to be able to follow a given set of principles.”


Source link: https://www.engadget.com/anthropic-explains-how-its-constitutional-ai-girds-claude-against-adversarial-inputs-160008153.html?src=rss
