At Google, we preserve a Vulnerability Reward Program to honor cutting-edge exterior contributions addressing points in Google-owned and Alphabet-subsidiary Internet properties. To maintain up with speedy advances in AI applied sciences and guarantee we’re ready to handle the safety challenges in a accountable manner, we lately expanded our present Bug Hunters program to foster third-party discovery and reporting of points and vulnerabilities particular to our AI techniques. This growth is a part of our effort to implement the voluntary AI commitments that we made on the White Home in July.
To assist the safety group higher perceive these developments, we have included extra data on reward program parts.
What’s in Scope for Rewards
In our current AI pink crew report, which relies on Google’s AI Pink Group workouts, we recognized frequent ways, methods, and procedures (TTPs) that we take into account most related and practical for real-world adversaries to make use of in opposition to AI techniques. The next desk incorporates what we discovered to assist the analysis group perceive our standards for AI bug experiences and what’s in scope for our reward program. It’s essential to notice that reward quantities are depending on severity of the assault state of affairs and the kind of goal affected (go to this system guidelines web page for extra data on our reward desk).
Immediate Assaults: Crafting adversarial prompts that enable an adversary to affect the conduct of the mannequin and, therefore, the output, in ways in which weren’t supposed by the appliance. |
Immediate injections which can be invisible to victims and alter the state of the sufferer’s account or any of their belongings. |
|
Immediate injections into any instruments through which the response is used to make selections that straight have an effect on sufferer customers. |
||
Immediate or preamble extraction through which a person is ready to extract the preliminary immediate used to prime the mannequin solely when delicate data is current within the extracted preamble. |
||
Utilizing a product to generate violative, deceptive, or factually incorrect content material in your personal session: e.g, “jailbreaks.” This contains “hallucinations” and factually inaccurate responses. Google’s generative AI merchandise have already got a devoted reporting channel for some of these content material points. |
||
Coaching Knowledge Extraction: Assaults which can be capable of efficiently reconstruct verbatim coaching examples that comprise delicate data. Additionally referred to as membership inference. |
Coaching information extraction that reconstructs gadgets used within the coaching information set that leak delicate, private data. |
|
Extraction that reconstructs non-sensitive/public data. |
||
Manipulating Fashions: An attacker capable of covertly change the conduct of a mannequin such that they’ll set off pre-defined adversarial behaviors. |
Adversarial output or conduct that an attacker can reliably set off by way of particular enter in a mannequin owned and operated by Google (“backdoors”). Solely in scope when a mannequin’s output is used to vary the state of a sufferer’s account or information. |
|
Assaults through which an attacker manipulates the coaching information of the mannequin to affect the mannequin’s output in a sufferer’s session in accordance with the attacker’s desire. Solely in scope when a mannequin’s output is used to vary the state of a sufferer’s account or information. |
||
Adversarial Perturbation: Inputs which can be offered to a mannequin that leads to a deterministic, however extremely surprising output from the mannequin. |
Contexts through which an adversary can reliably set off a misclassification in a safety management that may be abused for malicious use or adversarial acquire. |
|
Contexts through which a mannequin’s incorrect output or classification doesn’t pose a compelling assault state of affairs or possible path to Google or person hurt. |
||
Mannequin Theft/Exfiltration: AI fashions typically embody delicate mental property, so we place a excessive precedence on defending these belongings. Exfiltration assaults enable attackers to steal particulars a few mannequin equivalent to its structure or weights. |
Assaults through which the precise structure or weights of a confidential/proprietary mannequin are extracted. |
|
Assaults through which the structure and weights usually are not extracted exactly, or once they’re extracted from a non-confidential mannequin. |
||
In case you discover a flaw in an AI-powered device apart from what’s listed above, you possibly can nonetheless submit, offered that it meets the {qualifications} listed on our program web page. |
A bug or conduct that clearly meets our {qualifications} for a sound safety or abuse subject. |
|
Utilizing an AI product to do one thing doubtlessly dangerous that’s already attainable with different instruments. For instance, discovering a vulnerability in open supply software program (already attainable utilizing publicly accessible static evaluation instruments) and producing the reply to a dangerous query when the reply is already accessible on-line. |
||
As in step with our program, points that we already learn about usually are not eligible for reward. |
||
Potential copyright points — findings through which merchandise return content material showing to be copyright protected. Google’s generative AI merchandise have already got a devoted reporting channel for some of these content material points. |
We consider that increasing our bug bounty program to our AI techniques will help accountable AI innovation, and look ahead to persevering with our work with the analysis group to find and repair safety and abuse points in our AI-powered options. In case you discover a qualifying subject, please go to our Bug Hunters web site to ship us your bug report and — if the difficulty is discovered to be legitimate — be rewarded for serving to us preserve our customers protected.