AIs can trick each other into doing things they aren’t supposed to

By Admin, November 25, 2023

AI models can trick each other into disobeying their creators and providing banned instructions for making methamphetamine, building a bomb or laundering money, suggesting that the problem of preventing such AI “jailbreaks” is more difficult than it seems.

Many publicly available large language models (LLMs), such as ChatGPT, have hard-coded rules that aim to prevent them from exhibiting racist or sexist bias, or from giving illegal or problematic answers to questions – things they have learned to do from humans via training…


