Columnist and Technology

Why curbing chatbots' worst exploits is a game of whack-a-mole

AI companies are trying to impose safety measures on their chatbots, while researchers are finding ways around them all the time. Where will this end, asks Alex Wilkins

By Alex Wilkins

24 April 2024

2R6BX6F Illustration of symbolic representations of good and evil AI morality. — Robert Hyrons/Alamy Stock Photo

It has become common for artificial intelligence companies to claim that the worst things their chatbots can be used for can be mitigated by adding “safety guardrails”. These can range from seemingly simple solutions, like warning the chatbots to look out for certain requests, to more complex software fixes – but none is foolproof. And almost on a weekly basis, researchers find new ways to get around these measures, called jailbreaks.

You might be wondering why this is an issue – what’s the worst that could happen? One bleak scenario might be an AI being used to fabricate a lethal bioweapon,…

Sign up to our weekly newsletter

Receive a weekly dose of discovery in your inbox! We'll also keep you up to date with New Scientist events and special offers.

View introductory offers

No commitment, cancel anytime*

Offer ends 2nd of July 2024.

*Cancel anytime within 14 days of payment to receive a refund on unserved issues.

Inclusive of applicable taxes (VAT)

Existing subscribers

Columnist and Technology

Why curbing chatbots' worst exploits is a game of whack-a-mole

Sign up to our weekly newsletter

More from New Scientist

Technology

AI noise-cancelling headphones let you focus on just one voice

Technology

Surgeons can use AI chatbot to tell robots to help with suturing

Technology

OpenAI overtakes Google in race to build the future, but who wants it?

Technology

DeepMind AI can predict how drugs interact with proteins

Popular articles

1

2

3

4

5

6

7

8

9

10