We’re excited to announce that Meta AI’s Llama 2 foundation chat models are available in the Databricks Marketplace for you to fine-tune and deploy on private model serving endpoints. The Databricks Marketplace is an open marketplace that enables you to share and exchange data assets, including datasets and notebooks, across clouds, regions, and platforms. Adding to the data assets already offered on Marketplace, this new listing provides instant access to Llama 2’s chat-oriented large language models (LLMs), from 7 to 70 billion parameters, as well as centralized governance and lineage tracking in Unity Catalog. Each model is wrapped in MLflow, making it easy for you to use the MLflow Evaluation API in Databricks notebooks and to deploy with a single click on our LLM-optimized GPU model serving endpoints.
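Because the models are MLflow-wrapped and registered in Unity Catalog, loading one in a notebook is a standard MLflow call. The sketch below is illustrative: the catalog, schema, and model names are placeholders for whatever the Marketplace listing installs into your workspace, and the actual load only works inside a Databricks environment.

```python
def uc_model_uri(catalog: str, schema: str, model: str, version: int) -> str:
    """Build a Unity Catalog model URI of the form models:/catalog.schema.model/version."""
    return f"models:/{catalog}.{schema}.{model}/{version}"

def load_llama_chat(uri: str):
    """Load an MLflow-wrapped Llama 2 chat model (requires a Databricks workspace)."""
    import mlflow  # imported lazily so the URI helper above works anywhere

    # Use Unity Catalog as the MLflow model registry.
    mlflow.set_registry_uri("databricks-uc")
    return mlflow.pyfunc.load_model(uri)

# Placeholder names -- substitute the catalog/schema/model from your listing.
uri = uc_model_uri("marketplace_llama", "models", "llama_2_7b_chat", 1)
# model = load_llama_chat(uri)              # runs only inside a Databricks workspace
# model.predict({"prompt": "What is MLflow?"})
```

The same `pyfunc` model object can be passed to the MLflow Evaluation API or deployed to a serving endpoint without any extra packaging work.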
What is Llama 2?
Llama 2 is Meta AI’s family of generative text models optimized for chat use cases. The models have outperformed other open models and represent a breakthrough: fine-tuned open models can now compete with OpenAI’s GPT-3.5-turbo.
Llama 2 on Lakehouse AI
You can now get a secure, end-to-end experience with Llama 2 on the Databricks Lakehouse AI platform:
- Access on Databricks Marketplace. You can preview the notebook and get instant access to the chat family of Llama 2 models from the Databricks Marketplace. The Marketplace makes it easy to discover and evaluate state-of-the-art foundation models that you can manage in Unity Catalog.
- Centralized Governance in Unity Catalog. Because the models are now in a catalog, you automatically get all of the centralized governance, auditing, and lineage tracking that comes with Unity Catalog for your Llama 2 models.
- Deploy in One Click with Optimized GPU Model Serving. We packaged the Llama 2 models in MLflow so you get single-click deployment to privately host your model with Databricks Model Serving. GPU Model Serving, now in Public Preview, is optimized for large language models to deliver low latency and support high throughput. This is a great option for use cases that involve sensitive data, or where you cannot send customer data to third parties.
- Saturate Private Endpoints with AI Gateway. Privately hosting models on GPUs can be expensive, which is why we recommend using the MLflow AI Gateway to create and distribute routes for each use case in your organization so you can saturate the endpoint. The AI Gateway now supports rate limiting for cost control, in addition to secure credential management for Databricks Model Serving endpoints and externally hosted SaaS LLMs.
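Once a model is deployed, applications call the private serving endpoint over REST. The sketch below is a minimal example of building and sending such a request; the endpoint name, workspace URL, and request shape are illustrative, since the exact input schema is defined by the model’s MLflow signature.

```python
import json
from urllib.request import Request, urlopen

def chat_payload(prompt: str, max_tokens: int = 128, temperature: float = 0.7) -> dict:
    """Build a request body for an MLflow-wrapped chat model endpoint.

    This dataframe_records/params shape is illustrative; check the model's
    MLflow signature for the schema your deployment actually expects.
    """
    return {
        "dataframe_records": [{"prompt": prompt}],
        "params": {"max_tokens": max_tokens, "temperature": temperature},
    }

def query_endpoint(host: str, endpoint: str, token: str, payload: dict) -> dict:
    """POST to a Databricks Model Serving endpoint (requires workspace credentials)."""
    req = Request(
        f"{host}/serving-endpoints/{endpoint}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())

payload = chat_payload("Summarize the benefits of Unity Catalog.")
# Placeholder host, endpoint name, and token:
# result = query_endpoint("https://<workspace-url>", "llama-2-7b-chat", "<token>", payload)
```

Routing this call through an AI Gateway route instead of hitting the endpoint directly lets multiple use cases share one GPU endpoint under per-route rate limits.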
Get Started with Generative AI on Databricks
The Databricks Lakehouse AI platform enables developers to rapidly build and deploy generative AI applications with confidence.
Stay tuned for more exciting announcements soon!