Anthropic launches $15K jailbreak bounty program for its unreleased next-gen AI



Synthetic intelligence agency Anthropic introduced the launch of an expanded bug bounty program on Aug.8, with rewards as excessive as $15,000 for individuals who can “jailbreak” the corporate’s unreleased, “subsequent era” AI mannequin. 

Anthropic’s flagship AI mannequin, Claude-3, is a generative AI system just like OpenAI’s ChatGPT and Google’s Gemini. As a part of the corporate’s efforts to make sure that Claude and its different fashions are able to working safely, it conducts what’s referred to as “pink teaming.”

Crimson teaming

Crimson teaming is mainly simply making an attempt to interrupt one thing on function. In Claude’s case, the purpose of pink teaming is to attempt to work out the entire ways in which it could possibly be prompted, compelled, or in any other case perturbed into producing undesirable outputs.

Throughout pink teaming efforts, engineers would possibly rephrase questions or reframe a question as a way to trick the AI into outputting info it’s been programmed to keep away from.

For instance, an AI system educated on knowledge gathered from the web is prone to comprise personally identifiable info on quite a few folks. As a part of its security coverage, Anthropic has put guardrails in place to stop Claude and its different fashions from outputting that info.

As AI fashions turn out to be extra strong and able to imitating human communication, the duty of making an attempt to determine each attainable undesirable output turns into exponentially difficult.

Bug bounty

Anthropic has carried out a number of novel security interventions in its fashions, together with its “Constitutional AI” paradigm, however it’s at all times good to get recent eyes on a long-standing concern.

In accordance with an organization weblog submit, it’s newest initiative will expand on present bug bounty packages to concentrate on common jailbreak assaults:

“These are exploits that would enable constant bypassing of AI security guardrails throughout a variety of areas. By concentrating on common jailbreaks, we goal to handle a few of the most vital vulnerabilities in essential, high-risk domains akin to CBRN (chemical, organic, radiological, and nuclear) and cybersecurity.”

The corporate is simply accepting a restricted variety of individuals and encourages AI researchers with expertise and those that “have demonstrated experience in figuring out jailbreaks in language fashions” to use by Friday, Aug. 16.

Not everybody who applies might be chosen, however the firm plans to “broaden this initiative extra broadly sooner or later.”

Those that are chosen will obtain early entry to an unreleased “subsequent era” AI mannequin for red-teaming functions.

Associated: Tech firms pen letter to EU requesting more time to comply with AI Act