Google’s reward criteria for reporting bugs in AI products

November 2, 2023

Category

Attack Scenario

Guidance

Prompt Attacks: Crafting adversarial prompts that allow an adversary to influence the behavior of the model, and hence the output, in ways that were not intended by the application.

Prompt injections that are invisible to victims and change the state of the victim’s account or any of their assets.

In Scope

Prompt injections into any tools in which the response is used to make decisions that directly affect victim users.

In Scope

Prompt or preamble extraction in which a user is able to extract the initial prompt used to prime the model, only when sensitive information is present in the extracted preamble.

In Scope

Using a product to generate violative, misleading, or factually incorrect content in your own session: e.g. ‘jailbreaks’. This includes ‘hallucinations’ and factually inaccurate responses. Google’s generative AI products already have a dedicated reporting channel for these types of content issues.

Out of Scope

Training Data Extraction: Attacks that are able to successfully reconstruct verbatim training examples that contain sensitive information. Also called membership inference.

Training data extraction that reconstructs items used in the training data set that leak sensitive, non-public information.

In Scope

Extraction that reconstructs nonsensitive/public information.

Out of Scope

Manipulating Models: An attacker able to covertly change the behavior of a model such that they can trigger pre-defined adversarial behaviors.

Adversarial output or behavior that an attacker can reliably trigger via specific input in a model owned and operated by Google (“backdoors”). Only in-scope when a model’s output is used to change the state of a victim’s account or data.

In Scope

Attacks in which an attacker manipulates the training data of the model to influence the model’s output in a victim’s session according to the attacker’s preference. Only in-scope when a model’s output is used to change the state of a victim’s account or data.

In Scope

Adversarial Perturbation: Inputs that are provided to a model that result in a deterministic, but highly unexpected output from the model.

Contexts in which an adversary can reliably trigger a misclassification in a security control that can be abused for malicious use or adversarial gain.

In Scope

Contexts in which a model’s incorrect output or classification does not pose a compelling attack scenario or feasible path to Google or user harm.

Out of Scope

Model Theft / Exfiltration: AI models often include sensitive intellectual property, so we place a high priority on protecting these assets. Exfiltration attacks allow attackers to steal details about a model such as its architecture or weights.

Attacks in which the exact architecture or weights of a confidential/proprietary model are extracted.

In Scope

Attacks in which the architecture and weights are not extracted precisely, or when they are extracted from a non-confidential model.

Out of Scope

If you find a flaw in an AI-powered tool other than what is listed above, you can still submit, provided that it meets the qualifications listed on our program page.

A bug or behavior that clearly meets our qualifications for a valid security or abuse issue.

In Scope

Using an AI product to do something potentially harmful that is already possible with other tools. For example, finding a vulnerability in open source software (already possible using publicly available static analysis tools) or producing the answer to a harmful question when the answer is already available online.

Out of Scope

As is consistent with our program, issues that we already know about are not eligible for reward.

Out of Scope

Potential copyright issues: findings in which products return content appearing to be copyright-protected. Google’s generative AI products already have a dedicated reporting channel for these types of content issues.

Out of Scope
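
For readers who want to triage a finding against the table above, the rows can be encoded as a small lookup. The following is only an illustrative sketch in Python: the `Rule` type, the `scenarios` helper, the abbreviated scenario wording, and the "Other" category label are all assumptions made for this example and are not part of Google's program.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    category: str
    scenario: str  # abbreviated paraphrase of a table row, not official wording
    in_scope: bool


IN, OUT = True, False

# One entry per row of the table above. "Other" is a hypothetical label
# for the rows that follow the five named categories.
RULES = [
    Rule("Prompt Attacks", "invisible injection that changes a victim's account or assets", IN),
    Rule("Prompt Attacks", "injection into tools whose response drives decisions affecting victims", IN),
    Rule("Prompt Attacks", "preamble extraction that exposes sensitive information", IN),
    Rule("Prompt Attacks", "jailbreaks, hallucinations, or inaccurate content in your own session", OUT),
    Rule("Training Data Extraction", "reconstructs training items that leak sensitive information", IN),
    Rule("Training Data Extraction", "reconstructs nonsensitive/public information", OUT),
    Rule("Manipulating Models", "reliably triggered backdoor that changes a victim's account or data", IN),
    Rule("Manipulating Models", "training data manipulation that changes a victim's account or data", IN),
    Rule("Adversarial Perturbation", "reliable misclassification in a security control", IN),
    Rule("Adversarial Perturbation", "incorrect output with no compelling attack scenario", OUT),
    Rule("Model Theft / Exfiltration", "exact architecture or weights of a confidential model", IN),
    Rule("Model Theft / Exfiltration", "imprecise extraction, or a non-confidential model", OUT),
    Rule("Other", "valid security or abuse issue per the program qualifications", IN),
    Rule("Other", "harm already possible with other tools", OUT),
    Rule("Other", "issue already known to Google", OUT),
    Rule("Other", "potential copyright issue", OUT),
]


def scenarios(category: str, in_scope: bool) -> list[str]:
    """Return the abbreviated scenarios for a category with the given guidance."""
    return [r.scenario for r in RULES if r.category == category and r.in_scope == in_scope]
```

Calling `scenarios("Prompt Attacks", False)`, for instance, returns the single out-of-scope row for that category. The lookup only mirrors the table; the full wording of each row and the program page remain the authority on eligibility.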
