Subscribe now

Columnist and Technology

Why curbing chatbots' worst exploits is a game of whack-a-mole

AI companies are trying to impose safety measures on their chatbots, while researchers are finding ways around them all the time. Where will this end, asks Alex Wilkins

By Alex Wilkins

24 April 2024

2R6BX6F Illustration of symbolic representations of good and evil AI morality.

Robert Hyrons/Alamy Stock Photo

It has become common for artificial intelligence companies to claim that the worst things their chatbots can be used for can be mitigated by adding “safety guardrails”. These can range from seemingly simple solutions, like warning the chatbots to look out for certain requests, to more complex software fixes – but none is foolproof. And almost on a weekly basis, researchers find new ways to get around these measures, called jailbreaks.

You might be wondering why this is an issue – what’s the worst that could happen? One bleak scenario might be an AI being used to fabricate a lethal bioweapon,…

Sign up to our weekly newsletter

Receive a weekly dose of discovery in your inbox! We'll also keep you up to date with New Scientist events and special offers.

Sign up

To continue reading, subscribe today with our introductory offers

View introductory offers

No commitment, cancel anytime*

Offer ends 2nd of July 2024.

*Cancel anytime within 14 days of payment to receive a refund on unserved issues.

Inclusive of applicable taxes (VAT)

or

Existing subscribers

Sign in to your account