
Implement Hugging Face Models Using Langchain - Analytics Vidhya


Introduction

Large Language Models have been the backbone of progress in AI. With the release of various open-source LLMs, the demand for ChatBot-specific use cases has grown. HuggingFace is the primary provider of open-source LLMs, where the model parameters are available to the public and anyone can use them for inference. Langchain, on the other hand, is a powerful large language model framework that helps integrate AI seamlessly into your application. By combining Langchain and HuggingFace, one can easily build domain-specific ChatBots.

Learning Objectives

  • Understand the need for open-source large language models and how HuggingFace is one of the most important providers.
  • Explore three approaches to implementing Large Language Models with the Langchain framework and HuggingFace open-source models.
  • Learn how to implement the HuggingFace task pipeline with Langchain on a free T4 GPU.
  • Learn how to implement models from HuggingFace Hub using the Inference API on the CPU, without downloading the model parameters.
  • Implement LlamaCPP using large language models in the gguf format.

This article was published as a part of the Data Science Blogathon.

HuggingFace and Open Source Large Language Models

HuggingFace is the cornerstone for developing AI and deep learning models. The extensive collection of open-source models in HuggingFace's Transformers repository makes it a go-to choice for many practitioners. Open-source large language models, such as LLaMA, Falcon, and Mistral, have publicly accessible learned parameters. In contrast, closed-source large language models have private parameters; using them typically means interacting with API endpoints, as with GPT-4 and GPT-3.5, for example.

This is where HuggingFace comes in handy. HuggingFace provides HuggingFace Hub, a platform with over 120k models, 20k datasets, and 50k Spaces (demo AI applications).

What is Langchain?

With the advancement of Large Language Models in AI, informative ChatBots are in high demand. Say you founded a new gaming company with many user manuals and shortcut documentation, and you need to integrate a ChatBot like ChatGPT over this company's data. How do we achieve this?

This is where Langchain comes in. Langchain is a powerful Large Language Model framework that integrates various components such as embeddings, vector databases, LLMs, and more. Using these components, we can provide external documents to the large language models and build AI applications seamlessly.
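
To make this concrete, here is a minimal sketch of how these components compose: a PromptTemplate is combined with an LLM inside an LLMChain. The FakeListLLM stand-in below is only an assumption used so the snippet runs without downloading a real model; any of the HuggingFace integrations covered later can take its place.

# Minimal sketch of Langchain component composition (stand-in LLM, no real model needed)
from langchain.llms.fake import FakeListLLM
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# A reusable prompt component that injects the user question into a fixed format
doc_prompt = PromptTemplate(
    input_variables=["question"],
    template=(
        "Answer the question using the company documentation.\n\n"
        "Question: {question}\nAnswer:"
    ),
)

# Stand-in LLM: returns canned responses; swap in any real Langchain LLM component
stub_llm = FakeListLLM(responses=["Press Ctrl+S to quick-save the game."])

chain = LLMChain(llm=stub_llm, prompt=doc_prompt)
print(chain.run(question="How do I quick-save?"))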

Installation

We need to install the required libraries to get started with the different ways of using HuggingFace with Langchain.

To use Langchain components, we can install Langchain directly with the following command:

!pip install langchain

To use HuggingFace models and embeddings, we need to install transformers and sentence-transformers. In the latest update of Google Colab, you don't need to install transformers.

!pip install transformers
!pip install sentence-transformers
!pip install bitsandbytes accelerate

To run GenAI applications on edge devices, Georgi Gerganov developed LlamaCPP. LlamaCPP implements Meta's LLaMA architecture in efficient C/C++.

!pip install llama-cpp-python

Approach 1: HuggingFace Pipeline

Pipelines are a great and easy way to use models for inference. HuggingFace provides a pipeline wrapper class that integrates tasks like text generation and summarization in just one line of code. This single line instantiates the pipeline with the model, the tokenizer, and the task name.

We must load the Large Language Model and its associated tokenizer to implement this. Since not everyone can access A100 or V100 GPUs, we will proceed with the free T4 GPU. To run the large language model for inference using the pipeline, we will use the orca-mini 3 billion parameter LLM with a quantization configuration to reduce the model size.

from langchain.llms.huggingface_pipeline import HuggingFacePipeline
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from transformers import BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_quant_type="nf4",
   bnb_4bit_use_double_quant=True,
   bnb_4bit_compute_dtype=torch.bfloat16
)

In the code snippet below, we use AutoModelForCausalLM to load the model and AutoTokenizer to load the tokenizer. Once the model and tokenizer are loaded, assign them to the pipeline and set the task to text generation. The pipeline also allows adjusting the output sequence length by modifying max_new_tokens.

model_id = "pankajmathur/orca_mini_3b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
mannequin = AutoModelForCausalLM.from_pretrained(
                 model_id,
                 quantization_config=nf4_config
                 )
pipe = pipeline("text-generation", 
               mannequin=mannequin, 
               tokenizer=tokenizer, 
               max_new_tokens=512
               )

Good job on running the pipeline successfully. The HuggingFacePipeline wrapper class helps integrate the Transformers model with Langchain. The code snippet below defines the prompt template for the orca model.

hf = HuggingFacePipeline(pipeline=pipe)

query = "Who is Shah Rukh Khan?"

prompt = f"""
### System:
You are an AI assistant that follows instruction extremely well.
Help as much as you can. Please be truthful and give direct answers

### User:
{query}

### Response:
"""

response = hf.predict(prompt)
print(response)
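
Optionally, the same HuggingFacePipeline LLM can be plugged into Langchain's PromptTemplate and LLMChain so the orca-style prompt does not have to be rebuilt by hand for every query. The snippet below is a minimal sketch of this, assuming the hf object defined above is still in scope.

# Minimal sketch: reuse the `hf` LLM from above inside an LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Template text mirrors the orca-mini prompt format used above
orca_template = """
### System:
You are an AI assistant that follows instruction extremely well.
Help as much as you can. Please be truthful and give direct answers

### User:
{query}

### Response:
"""

orca_prompt = PromptTemplate(input_variables=["query"], template=orca_template)
chain = LLMChain(llm=hf, prompt=orca_prompt)

print(chain.run(query="Who is Shah Rukh Khan?"))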

Approach 2: HuggingFace Hub Using the Inference API

In approach one, you may have noticed that the pipeline downloads and loads the model and tokenizer weights. This can be time-consuming if the model is large. That is where the HuggingFace Hub Inference API comes in handy. To integrate HuggingFace Hub with Langchain, you need a HuggingFace Access Token.

Steps to Get a HuggingFace Access Token

  • Log in to HuggingFace.co.
  • Click on your profile icon in the top-right corner, then choose "Settings."
  • In the left sidebar, navigate to "Access Tokens."
  • Generate a new access token, assigning it the "write" role.

from langchain.llms import HuggingFaceHub
import os
from getpass import getpass

os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass("HF Token:")

Once you have your Access Token, use HuggingFaceHub to integrate the Transformers model with Langchain. In this case, we use Zephyr, a fine-tuned model based on Mistral 7B.

llm = HuggingFaceHub(
    repo_id="HuggingFaceH4/zephyr-7b-alpha",
    model_kwargs={"temperature": 0.5, "max_length": 64, "max_new_tokens": 512}
)

question = "What's capital of India and UAE?"

immediate = f"""
 <|system|>
You're an AI assistant that follows instruction extraordinarily properly.
Please be truthful and provides direct solutions
</s>
 <|consumer|>
 {question}
 </s>
 <|assistant|>
"""

response = llm.predict(immediate)
print(response)

Since we are using the free Inference API, there are several limitations on using larger language models with 13B, 34B, and 70B parameters.

Approach 3: LlamaCPP

LlamaCPP allows the use of models packaged in the .gguf file format, which runs efficiently in CPU-only and mixed CPU/GPU environments using the llama.cpp backend.

To use LlamaCPP, we specifically need models whose model_path ends with .gguf. You can download the model from here: zephyr-7b-beta.Q4_K_M.gguf. Once this model is downloaded, you can upload it directly to your Drive or any other local storage.
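
Alternatively, the gguf file can be fetched programmatically with the huggingface_hub library (installed alongside transformers). The snippet below is a minimal sketch; the repo id TheBloke/zephyr-7B-beta-GGUF is an assumption about where the quantized file is hosted, so verify the repo id and filename on the Hub before running.

# Hedged sketch: download the quantized gguf file directly from the Hub
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="TheBloke/zephyr-7B-beta-GGUF",   # assumed host repo for the gguf file
    filename="zephyr-7b-beta.Q4_K_M.gguf",    # matches the model_path used below
)
print(gguf_path)  # pass this local path as model_path to LlamaCpp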

from langchain.llms import LlamaCpp

from google.colab import drive
drive.mount('/content/drive')

llm_cpp = LlamaCpp(
    streaming=True,
    model_path="/content/drive/MyDrive/LLM_Model/zephyr-7b-beta.Q4_K_M.gguf",
    n_gpu_layers=2,
    n_batch=512,
    temperature=0.75,
    top_p=1,
    verbose=True,
    n_ctx=4096
)

The prompt template remains the same since we are still using a Zephyr model.

question = "Who's Elon Musk?"

immediate = f"""
 <|system|>
You're an AI assistant that follows instruction extraordinarily properly.
Please be truthful and provides direct solutions
</s>
 <|consumer|>
 {question}
 </s>
 <|assistant|>
"""

response = llm_cpp.predict(immediate)
print(response)

Conclusion

To conclude, we successfully implemented HuggingFace open-source models with Langchain. Using these approaches, one can easily avoid paying for OpenAI API credits. This guide mainly focused on using open-source LLMs, one major component of the RAG pipeline.

Key Takeaways

  • Using HuggingFace's Transformers pipeline, one can easily pick any top-performing Large Language Model such as Llama 2 70B, Falcon 180B, or Mistral 7B. The inference script is fewer than five lines of code.
  • As not everyone can afford A100 or V100 GPUs, HuggingFace provides the free Inference API (with an Access Token) to run several models from HuggingFace Hub. The most preferred model size in this case is 7B.
  • LlamaCPP is used when you need to run Large Language Models on the CPU. Currently, LlamaCPP only supports gguf model files.
  • It is recommended to follow the model's prompt template when running the predict() method on the user query.


Frequently Asked Questions

Q1. How do you use Hugging Face models with LangChain?

A. There are several approaches to leveraging open-source Transformers models within Langchain. Firstly, you can use the Transformers pipeline with HuggingFacePipeline. Additionally, you can use HuggingFaceHub with the free Inference API, or LlamaCPP. One optional approach is HuggingFaceInferenceEndpoint, which is not free.

Q2. Is Hugging Face LLM free?

A. Yes, the Large Language Models available on HuggingFace are open-source and freely accessible. They can be used with the Transformers framework. However, if you need to host your LLMs on the HuggingFace cloud, you must pay per hour based on the Inference Endpoint you choose.

Q3. What models are compatible with LangChain?

A. LangChain is a powerful LLM framework widely used for Retrieval Augmented Generation. LangChain is compatible with various large language models, such as GPT-4, Transformers open-source models (Llama 2, Zephyr, Mistral, Falcon), PaLM, Anyscale, and Cohere.

Q4. What is the difference between LangChain and Hugging Face?

A. LangChain is a Large Language Model framework that supports various components, with LLMs being one of them. However, it does not store or host any LLMs, whereas Hugging Face, through Transformers and the Hub, hosts models and provides Spaces to build demo applications.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.


