
Building an AI Storyteller Application


Introduction

With recent AI developments such as LangChain, the ChatGPT builder, and the prominence of Hugging Face, creating AI and LLM apps has become more accessible. However, many are unsure how to leverage these tools effectively.

In this article, I'll guide you through building an AI storyteller application that generates stories from random images. Using open-source LLM models and custom prompts with an industry-standard approach, we'll explore the process step by step.

Before we begin, let's set expectations for this journey.

Learning Objectives

  • Create your own OpenAI and Hugging Face accounts and generate API keys.
  • Leverage the power of open-source LLM models using APIs.
  • Safeguard your project secrets.
  • Decompose complex projects into manageable tasks and create a project workflow.
  • Give custom instructions to LLMs using the LangChain module.
  • Create a simple web interface for demonstration purposes.
  • Appreciate the level of detail that goes into the development of LLM projects in the industry.

Prerequisites

Before moving ahead, here are a few prerequisites that must be fulfilled:

  • Python – Install Python >= 3.8; with an older version you may face issues in a few steps.
  • Miniconda – Optional; pick it only if you prefer to work in an isolated environment.
  • VS Code – A lightweight IDE with support for multiple languages.

So, assuming you have met all the prerequisites, let's get started by understanding the project workflow of our AI Storyteller application.

This article was published as a part of the Data Science Blogathon.

AI Storyteller Application Workflow

Like any software company, let's start by developing a project outline.

Here is a table of the things we need to do, along with the approach and provider:

Component                  | Approach                    | Provider
Image upload               | Image-upload web interface  | Python library
Convert image to text      | LLM model (img2text)        | Hugging Face
Generate a story from text | ChatGPT                     | OpenAI
Convert the story to audio | LLM model (text2speech)     | Hugging Face
User listens to the audio  | Audio interface             | Python library
Demonstration              | Web interface               | Python library

If you're still unclear, here is a high-level user-flow picture 👇

Having outlined the workflow, let's start by organizing the project files.

Set Up the Workspace

Go to the command prompt in your working directory and enter these commands one by one:

mkdir ai-project
cd ai-project
code .

Once you run the last command, it will open VS Code and create a workspace for you. We will be working in this workspace.

Alternatively, you can create the ai-project folder and open it inside VS Code. The choice is yours 😅.
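By the end of the setup, the project folder will roughly look like this (a suggested layout; main.py, img.jpeg, and audio.flac are introduced in later sections):

ai-project/
├── .env         # API keys (never share or commit this file)
├── main.py      # backend functions + Streamlit frontend
├── img.jpeg     # sample image used to test the backend
└── audio.flac   # generated by the text-to-speech step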

Now create a .env file inside the project and add two constant variables:

HUGGINGFACEHUB_API_TOKEN = YOUR HUGGINGFACE API KEY
OPENAI_API_KEY = YOUR OPENAI API KEY

Now let's fill in the values.

GET OpenAI API Key

OpenAI allows developers to use API keys to interact with its products, so let's grab one for ourselves.

  • Go to the official OpenAI website and click Login / Signup.
  • Next, fill in your credentials and log in / sign up. If you just signed up, repeat this step to log in.
  • Once you are logged in, you will be greeted with two options – ChatGPT or API; select API.
  • On the next page, navigate to the lock 🔒 symbol in the sidebar (it might differ at the time of reading) and click it (refer to open-ai.png).
  • A new page will appear on the right-hand side. Now click on Create new secret key.
  • Name your key and hit Create secret key.
  • Important! – Note down this value and keep it safe. Once the popup closes you won't be able to see it again.
  • Now go to the .env file and paste it beside OPENAI_API_KEY. Don't put any quotes ("").

Now let's sort out the other one!

GET Hugging Face API Key

Hugging Face is an AI community that provides open-source models, datasets, tasks, and even compute spaces for a developer's use case. The only catch is that you need to use their API token to access the models. Here is how to get one (refer to ref.png for reference):

  • Head over to the Hugging Face website and create an account / log in.
  • Now click on your avatar (🧑‍🦲) in the top corner and click Settings in the dropdown.
  • Inside the Settings page, click on Access Tokens and then New token.
  • Fill in the token details such as the name and permission. Keep the name descriptive and the permission set to read.
  • Click on Generate a token and voilà, you have it. Make sure to copy it.
  • Open the .env file and paste the copied token beside HUGGINGFACEHUB_API_TOKEN, following the same guidelines as above.

So why do we require this? Because, as developers, it is easy to accidentally expose secret data in our code. If someone else gets hold of this data it can be disastrous, so it is standard practice to keep secrets in a separate .env file and access them later from another script.
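If the project lives in a git repository, it is also common to keep the secrets file out of version control. A suggested .gitignore entry (assuming you use git):

# .gitignore
.env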

For now, we are done with the workspace setup, but there is one optional step left.

Create an Environment

This step is optional, so you may skip it, but it is preferred not to!

Generally, one needs to isolate the development space so it contains only the modules and files needed for the project. This is done by creating a virtual environment.

You can use Miniconda to create the virtual environment due to its ease of use. Open a command prompt and type the following commands one after the other:

conda create -n ai-storyteller
conda activate ai-storyteller

The first command creates a new virtual environment, while the second activates it. This approach also helps later at the project deployment stage. Now let's head to the main project development.
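Before writing any code, install the libraries used throughout this article inside the activated environment. A suggested install line (package names assumed from the imports used in the next sections; torch and Pillow are needed by the transformers image pipeline):

pip install python-dotenv transformers torch pillow langchain openai requests streamlit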

AI Storyteller Application – Backend

As mentioned previously, we'll build each component individually and then merge them all.

Dependencies & Requirements

In VS Code (or your current working directory), create a new Python file main.py. This will serve as the entry point for the project. Now let's import all the required libraries:

from dotenv import find_dotenv, load_dotenv
from transformers import pipeline
from langchain import PromptTemplate, LLMChain, OpenAI
import requests
import os
import streamlit as st

Don't get into the library details yet; we will learn about them as we go along.

load_dotenv(find_dotenv())
HUGGINGFACE_API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")

Right here:

  • In line 1, we first find the .env file and then load its contents. This is how the OpenAI key gets loaded without ever appearing in the code. Call it good practice 😅
  • In line 2, we load the Hugging Face Hub API token stored in the .env file using os.getenv(), for use later on.
  • NOTE: Both variables are constants, so we keep them in capitals.
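As an optional sanity check (a small sketch, not part of the original code), you can fail fast if either key is missing before calling any model:

if not HUGGINGFACE_API_TOKEN or not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("Missing API keys, check your .env file")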

Having loaded all the requirements and dependencies, let's move on to building the first component: the image-to-text generator.

Image-to-Text Generator Model

#img-to-text

def img2text(path):
    img_to_text = pipeline(
        "image-to-text", model="Salesforce/blip-image-captioning-base")
    text = img_to_text(path)[0]['generated_text']
    return text

Now let's dissect the code:

  • In line 3 we define the img2text function, which takes the image path.
  • In line 4 we instantiate the model object img_to_text using the pipeline constructor from Hugging Face, which takes the task ("image-to-text") and the model name.
  • In line 6 the image path is passed to the model (downloaded from the Hub and run locally), which returns the generated text; it gets stored in the text variable.
  • Finally, we return the text.

So simple, right?

Next, let's pass the text on to the story generator.

Text-to-Story Generator Model

For text-to-story generation, we are going to use ChatGPT, but you are free to use any other model you like.

Additionally, we'll use LangChain to supply a custom prompt template to the model so that it is safe for every age group. This can be achieved as follows:

def story_generator(scenario):
    template = """
    You are an expert kids story teller;
    You can generate short stories based on a simple narrative
    Your story should be more than 50 words.

    CONTEXT: {scenario}
    STORY:
    """
    prompt = PromptTemplate(template=template, input_variables=["scenario"])
    story_llm = LLMChain(llm=OpenAI(
        model_name="gpt-3.5-turbo", temperature=1), prompt=prompt, verbose=True)

    story = story_llm.predict(scenario=scenario)
    return story

Code Explanation

Let's understand the code:

  • In line 1 we define the story_generator function, which takes the scenario as an argument. Note that the scenario here refers to the text generated by the previous model.
  • From lines 2 to 9 we define our custom instructions under the template variable, with the scenario as the context. This is the custom instruction mentioned earlier in the section.
  • Next, in line 10 we build a prompt using the LangChain PromptTemplate class. It takes the template (the whole text) and the custom context variable (here, scenario).
  • In line 11 we create an instance of the gpt-3.5-turbo model using the LLMChain wrapper from LangChain. The chain requires a model name, temperature (randomness of the response), prompt (our custom prompt), and verbose (to display logs).
  • Now we call the model using the predict method and pass in the scenario in line 14. This returns a story based on the context, stored in the story variable.
  • In the end, we return the story to pass it on to the last model.

For those who are curious about the LangChain classes used:

  • PromptTemplate is used to create a prompt based on the template and the context provided. In this case, it specifies that there is extra context, the scenario (see the small sketch after this list).
  • LLMChain is used to represent a chain of LLM models. In our case, it wraps the OpenAI language model with the GPT-3.5 Turbo model. In simple terms, you can chain multiple LLMs together.
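To make the PromptTemplate behaviour concrete, here is a minimal standalone sketch (illustrative only; it reuses the same variable name as above and the example text is made up):

from langchain import PromptTemplate

prompt = PromptTemplate(
    template="CONTEXT: {scenario}\nSTORY:",
    input_variables=["scenario"],
)
# format() substitutes the variable into the template string
print(prompt.format(scenario="two kids building a sandcastle on the beach"))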

To learn more about LangChain and its features, refer here.

Now we need to convert the generated output to audio. Let's take a look.

Text-to-Audio Model

This time, rather than loading the model locally, we'll use the Hugging Face Inference API to fetch the result. This saves storage and compute costs. Here is the code:

#text-to-speech (Hugging Face)
def text2speech(msg):
    API_URL = "https://api-inference.huggingface.co/models/espnet/kan-bayashi_ljspeech_vits"
    headers = {"Authorization": f"Bearer {HUGGINGFACE_API_TOKEN}"}
    payloads = {
         "inputs" : msg
    }
    response = requests.post(API_URL, headers=headers, json=payloads)

    with open('audio.flac','wb') as f:
        f.write(response.content)

Code Explanation

Here is the explanation of the above code:

  • In line 1 we define a function text2speech whose job is to take in the msg (the story generated by the previous model) and produce the audio file.
  • Line 2 contains API_URL, which holds the API endpoint to call.
  • Next, we provide the authorization bearer token in the headers. This is sent as a header (authorization data) when we call the model.
  • In line 5 we define a payload dictionary (JSON format) that contains the message (msg) we need to convert.
  • In the next line a POST request is sent to the model along with the headers and JSON data. The returned response is saved in the response variable.

Note: The inference format can vary from model to model, so please refer to the end of this section.

  • Finally, we save the audio content (response.content) to the local system by writing it to audio.flac, so the generated audio persists for later playback.
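In practice, the Inference API occasionally returns an error response (for example, a 503 status while the model is still loading), so it helps to check the status before writing the bytes to disk. A small defensive variant of the last lines of text2speech, not part of the original code:

    response = requests.post(API_URL, headers=headers, json=payloads)
    if response.status_code != 200:  # e.g. 503 while the model is loading
        raise RuntimeError(f"Inference API error {response.status_code}: {response.text}")

    with open('audio.flac', 'wb') as f:
        f.write(response.content)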

Optional

In case you plan to choose a different text-to-audio model, you can get the inference details by visiting the model's page, clicking the drop-down arrow beside Deploy, and selecting the Inference API option.


Congrats, the backend part is now complete. Let's test that it works!

Test the Backend

Now is a good time to test the models. For this, we'll pass in an image and call all of the model functions. Copy and paste the code below:

scenario = img2text("img.jpeg") # image to text
story = story_generator(scenario) # create a story
text2speech(story) # convert generated text to audio

Here img.jpeg is the image file, present in the same directory as main.py.

Now go to your terminal and run main.py:

python main.py

If everything goes well, you will see an audio.flac file appear in the same directory.


If you don't see the audio.flac file, please ensure you have added your API keys, have sufficient tokens, and have all the necessary libraries installed, including FFmpeg.
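A quick way to confirm the output actually contains audio data (an optional sketch using only the Python standard library):

import os

assert os.path.exists("audio.flac"), "audio.flac was not created"
# a very small file usually means the API returned an error message instead of audio
print(f"audio.flac size: {os.path.getsize('audio.flac')} bytes")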

Now that we have a working backend, it's time to create the frontend website. Let's move on.

AI Storyteller Application – Frontend

To build our frontend we'll use the Streamlit library, which provides easy-to-use, reusable components for building webpages from Python scripts, a dedicated CLI, and hosting. Everything needed to host a small project.

To get started, go to Streamlit and create an account – it's free!

Now go to your terminal and install the Streamlit CLI using:

pip set up streamlit

Once that's done, you are good to go.

Now copy and paste the following code:

def main():
    st.set_page_config(page_title="AI story Teller", page_icon="🤖")

    st.header("We turn images to story!")
    upload_file = st.file_uploader("Choose an image...", type="jpg")  # uploads image

    if upload_file is not None:
        print(upload_file)
        binary_data = upload_file.getvalue()

        # save image
        with open(upload_file.name, 'wb') as f:
            f.write(binary_data)
        st.image(upload_file, caption="Image Uploaded", use_column_width=True)  # display image

        scenario = img2text(upload_file.name)  # image to text
        story = story_generator(scenario)  # create a story
        text2speech(story)  # convert generated text to audio

        # display scenario and story
        with st.expander("scenario"):
            st.write(scenario)
        with st.expander("story"):
            st.write(story)

        # display the audio - people can listen
        st.audio("audio.flac")

# the main
if __name__ == "__main__":
    main()

Code Clarification

  • st.set_page_config: Sets the page configuration. Here we set the title and icon.
  • st.header: Sets the page header section.
  • st.file_uploader: Adds an upload component to the webpage along with the provided text. Here it is used to take images from the user.
  • st.image: Displays the image; as you guessed, it shows the user-uploaded image.
  • st.expander: Adds an expander (expand-to-see) component to the webpage. Here we use it to hold the scenario (image caption) and the story (caption turned into a story). Once users click on the expander, they can see the generated text. It also provides a nice UI experience.
  • st.write: Used for multiple purposes; here, to write the expander texts.
  • st.audio: Adds an audio component to the webpage – the user can use this to listen to the generated audio.

Here's what our function does in a nutshell:

Our main function creates a webpage that lets the user upload an image, passes it to the models, converts the image to a caption, generates a story based on it, and converts that story to audio the user can listen to. Apart from that, one can also view the generated caption and story, and the audio file is saved on the local / hosted system.

Now, to run your application, head over to the terminal and run:

streamlit run main.py

If everything is successful, Streamlit will start a local server and print a Local URL and a Network URL in the terminal.


Now head over to the Local URL and you can test the app.

Here is a video that showcases how to use the app:

Congrats on building your LLM application powered by Hugging Face, OpenAI, and LangChain. Now let's summarize what you have learned in this article.

Conclusion

That's all! We have learned how to build the frontend and backend of an AI Storyteller application.

We started by laying down the foundation of the project, then leveraged the power of Hugging Face to use open-source LLM models for the task at hand, combined OpenAI with LangChain to provide custom context, and later wrapped the entire application into an interactive web app using Streamlit. We also applied security principles along the way.

Key Takeaways

  • Secure user data using a .env file and load it with the Python dotenv package.
  • Break projects down into workable components and set up the environment accordingly.
  • Combine multiple models in a single script to get your work done.
  • Use LangChain to provide custom instructions to the model, reducing hallucination and safeguarding the response with PromptTemplate.
  • Use the LangChain LLMChain class to combine multiple models.
  • Run inference on Hugging Face models and store the result using the Inference API.
  • Build webpages using Streamlit's declarative syntax.

I hope you enjoyed building this AI storyteller application. Now put it into practice; I can't wait to see what you all come up with. Thanks for sticking around to the end. Here are a few resources to get you started.

Resources

The media shown in this article is not owned by Analytics Vidhya and is used at the author's discretion.


