Anthropic researchers have disclosed a new jailbreaking technique that can coax an LLM into answering harmful questions, such as how to build a bomb. Anthropic calls the approach “many-shot jailbreaking”; the company has published a research paper on it and has briefed its peers in the AI community.
The technique exploits the LLM ‘context window’, which has grown dramatically over the past year. Some models now have context windows of 1,000,000 tokens or more, roughly a hundred times larger than what was typical before.
“The ability to input increasingly-large amounts of information has obvious advantages for LLM users, but it also comes with risks: vulnerabilities to jailbreaks that exploit the longer context window,” the Anthropic researchers wrote.
In an era where deepfake image generation is already causing security threats, the Anthropic researchers found that an LLM with a large context window can be pushed into answering harmful questions that it would refuse if asked in a single prompt. Put simply, if we ask the model for harmful information using many-shot jailbreaking, it can end up providing that information to the user.
Suppose we ask the model, either right away or after only a few questions, how to build a bomb: it will refuse to answer such harmful content. But the Anthropic researchers found that with the many-shot jailbreaking technique, if 99 questions are asked first and the model is then asked how to build a bomb, it can generate the answer.
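As a rough illustration of the prompt structure described above, here is a minimal sketch that assembles one long prompt from many question-and-answer exchanges followed by a final target question. The faux_dialogues list, the placeholder strings, and the build_many_shot_prompt helper are all hypothetical; this is only a sketch of the general idea, not Anthropic’s actual experimental setup.

```python
# Minimal, hypothetical sketch of how a many-shot prompt could be assembled.
# All dialogue contents are placeholders; Anthropic's paper describes the
# real technique and its evaluation in detail.

def build_many_shot_prompt(faux_dialogues, target_question):
    """Concatenate many Q&A exchanges, then append the final question."""
    shots = []
    for question, answer in faux_dialogues:
        shots.append(f"User: {question}\nAssistant: {answer}")
    shots.append(f"User: {target_question}\nAssistant:")
    return "\n\n".join(shots)

# 99 placeholder exchanges stand in for the many in-context examples
# that fill the model's long context window.
faux_dialogues = [(f"[question {i}]", f"[answer {i}]") for i in range(1, 100)]
prompt = build_many_shot_prompt(faux_dialogues, "[target question]")
# The assembled prompt would then be sent to the model as a single request.
```

The point of the sketch is simply that the whole sequence fits inside one very long prompt, which is why the growth of context windows is what makes the attack possible.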
The researchers have also shared a figure, both with the community and in their paper, showing the results of the many-shot jailbreaking technique.
Anthropic continues to produce major research and innovation in the AI industry; recently, Amazon doubled down on the company and invested a further $2.75 billion in it. But this latest research may raise a question: why are researchers deliberately making an LLM generate harmful content that could put users at risk? The Anthropic researchers have addressed that point.
The researchers said, “We want to help fix the jailbreak as soon as possible. We’ve found that many-shot jailbreaking is not trivial to deal with; we hope making other AI researchers aware of the problem will accelerate progress toward a mitigation strategy. As described below, we have already put in place some mitigations and are actively working on others.”
They also said, “Current state-of-the-art LLMs are powerful, but we do not think they yet pose truly catastrophic risks. Future models might. This means that now is the time to work to mitigate potential LLM jailbreaks before they can be used on models that could cause serious harm.”
In today’s AI world, where the trend is toward generating the best AI music, voice cloning apps, and more, a model that starts generating harmful content could be dangerous for people.