Posted by Jaclyn Konzelmann and Megan Li – Google Labs
Grab an API key in Google AI Studio, and get started with the Gemini API Cookbook
Less than two months ago, we made our next-generation Gemini 1.5 Pro model available in Google AI Studio for developers to try out. We’ve been amazed by what the community has been able to debug, create and learn using our groundbreaking 1 million token context window.
Today, we’re making Gemini 1.5 Pro available in 180+ countries via the Gemini API in public preview, with a first-ever native audio (speech) understanding capability and a new File API to make it easy to handle files. We’re also launching new features like system instructions and JSON mode to give developers more control over the model’s output. Lastly, we’re releasing our next generation text embedding model that outperforms comparable models. Go to Google AI Studio to create or access your API key, and start building.
Unlock new use cases with audio and video modalities
We’re expanding the input modalities for Gemini 1.5 Pro to include audio (speech) understanding in both the Gemini API and Google AI Studio. Additionally, Gemini 1.5 Pro is now able to reason across both image (frames) and audio (speech) for videos uploaded in Google AI Studio, and we look forward to adding API support for this soon.
You can upload a recording of a lecture, like this 117,000+ token lecture from Jeff Dean, and Gemini 1.5 Pro can turn it into a quiz with an answer key. [Video sped up for demo purposes]
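As a rough illustration of the lecture-to-quiz flow, the sketch below builds a request body that pairs a text prompt with a reference to an audio file that has already been uploaded through the File API. The field names (`contents`, `parts`, `file_data`, `mime_type`, `file_uri`) reflect our understanding of the REST request shape and should be checked against the current API reference. The file identifier and the prompt wording are hypothetical placeholders.

```python
import json


def build_audio_quiz_request(file_uri: str, mime_type: str = "audio/mp3") -> dict:
    """Build a request body that asks the model to turn an uploaded lecture into a quiz.

    Assumption: `file_uri` is the identifier returned after uploading the
    recording through the File API. The value used below is a placeholder.
    """
    prompt = "Create a quiz with an answer key based on this lecture."
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    # Assumed shape for pointing at a previously uploaded file.
                    {"file_data": {"mime_type": mime_type, "file_uri": file_uri}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical file reference, for illustration only.
    body = build_audio_quiz_request("files/EXAMPLE_LECTURE")
    print(json.dumps(body, indent=2))
```

The function only assembles the payload. Sending it requires an API key and an HTTP client, which are left out here so the sketch stays self-contained.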
Gemini API Improvements
Today, we’re addressing a number of top developer requests:
1. System instructions: Guide the model’s responses with system instructions, now available in Google AI Studio and the Gemini API. Define roles, formats, goals, and rules to steer the model’s behavior for your specific use case.
2. JSON mode: Instruct the model to only output JSON objects. This mode enables structured data extraction from text or images. You can get started with cURL, and Python SDK support is coming soon.
3. Improvements to function calling: You can now select modes to limit the model’s outputs, improving reliability. Choose text, function call, or just the function itself.
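The three features above can be pictured as optional fields on a single request body. The sketch below is a minimal, hedged example rather than the official client: the field names (`system_instruction`, `generationConfig.response_mime_type`, `tool_config.function_calling_config`) and the mode spellings `AUTO`, `ANY`, and `NONE` are assumptions about the REST request shape, and `build_request` is a hypothetical helper introduced only for illustration.

```python
import json
from typing import Optional, Sequence

# Assumed spellings of the function calling modes.
FUNCTION_MODES = {"AUTO", "ANY", "NONE"}


def build_request(
    prompt: str,
    system_instruction: Optional[str] = None,
    json_mode: bool = False,
    function_mode: Optional[str] = None,
    allowed_functions: Optional[Sequence[str]] = None,
) -> dict:
    """Assemble a request body with optional system instruction, JSON mode,
    and function calling mode. Field names are assumptions, not confirmed."""
    body: dict = {"contents": [{"parts": [{"text": prompt}]}]}

    if system_instruction:
        # Roles, formats, goals, and rules that steer the model's behavior.
        body["system_instruction"] = {"parts": [{"text": system_instruction}]}

    if json_mode:
        # Ask the model to emit only JSON.
        body["generationConfig"] = {"response_mime_type": "application/json"}

    if function_mode is not None:
        if function_mode not in FUNCTION_MODES:
            raise ValueError(f"unknown function calling mode: {function_mode!r}")
        config: dict = {"mode": function_mode}
        if allowed_functions:
            # Optionally restrict the model to specific functions.
            config["allowed_function_names"] = list(allowed_functions)
        body["tool_config"] = {"function_calling_config": config}

    return body


if __name__ == "__main__":
    example = build_request(
        "List three cookie recipes.",
        system_instruction="You are a concise cooking assistant.",
        json_mode=True,
    )
    print(json.dumps(example, indent=2))
```

Keeping each feature behind its own argument means a plain prompt still produces the smallest possible body, and each option can be tried independently.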
A new embedding model with improved performance
Starting today, developers will be able to access our next generation text embedding model via the Gemini API. The new model, text-embedding-004 (text-embedding-preview-0409 in Vertex AI), achieves stronger retrieval performance and outperforms existing models with comparable dimensions on the MTEB benchmarks.
‘text-embedding-004’ (aka Gecko) using 256-dimension output outperforms all larger 768-dimension output models on MTEB benchmarks
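To make the retrieval use case concrete, here is a small sketch of how a 256-dimension embedding request might be assembled and how two returned vectors could be compared. The request field `outputDimensionality` and the `models/` prefix are assumptions about the REST shape, and both helper functions are hypothetical. Cosine similarity is a common choice for comparing embeddings, not something the announcement prescribes.

```python
import math
from typing import Sequence


def build_embed_request(text: str, dims: int = 256) -> dict:
    """Build an embedding request body for a single piece of text.

    Assumption: the output size is requested through `outputDimensionality`.
    """
    if dims <= 0:
        raise ValueError("dims must be positive")
    return {
        "model": "models/text-embedding-004",
        "content": {"parts": [{"text": text}]},
        "outputDimensionality": dims,
    }


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(build_embed_request("How do I get an API key?"))
    # Toy vectors standing in for real embeddings.
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
```

In a retrieval setting, the query and each document would be embedded at the same dimensionality, and documents would be ranked by their similarity to the query.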
These are just the first of many improvements coming to the Gemini API and Google AI Studio in the next few weeks. We’re continuing to work on making Google AI Studio and the Gemini API the easiest way to build with Gemini. Get started today in Google AI Studio with Gemini 1.5 Pro, explore code examples and quickstarts in our new Gemini API Cookbook, and join our community channel on Discord.