
Building an AI Storyteller Application


Introduction

With recent AI developments such as LangChain, the ChatGPT builder, and the prominence of Hugging Face, creating AI and LLM apps has become more accessible. However, many are unsure how to leverage these tools effectively.

In this article, I'll guide you through building an AI storyteller application that generates stories from random images. Using open-source LLM models and custom prompts with an industry-standard approach, we'll explore the process step by step.

Before we begin, let's set expectations for this journey.

Learning Objectives

  • Create your own OpenAI and Hugging Face accounts and generate API keys.
  • Leverage the power of open-source LLM models using APIs.
  • Safeguard your project secrets.
  • Decompose complex projects into manageable tasks and create a project workflow.
  • Give custom instructions to LLMs using the LangChain module.
  • Create a simple web interface for demonstration purposes.
  • Appreciate the level of detail that goes into the development of LLM projects in the industry.

Prerequisites

Before moving ahead, here are a few prerequisites that must be fulfilled:

  • Python – Install Python >= 3.8; with an older version you may face issues in a few steps.
  • Miniconda – Optional; pick it only if you prefer to work in an isolated environment.
  • VS Code – A lightweight IDE with support for multiple languages.

So, assuming you have met all the prerequisites, let's get started by understanding the project workflow of our AI Storyteller application.

This article was published as a part of the Data Science Blogathon.

AI Storyteller Application Workflow

Like any software company, let's start by developing a project outline.

Here is a table of the things we need to do, along with the approach and provider:

Component                  | Approach                    | Provider
Image upload               | Image-upload web interface  | Python library
Convert image to text      | LLM model (img2text)        | Hugging Face
Generate a story from text | ChatGPT                     | OpenAI
Convert the story to audio | LLM model (text2speech)     | Hugging Face
User listens to the audio  | Audio interface             | Python library
Demonstration              | Web interface               | Python library

If you're still unclear, here is a high-level user-flow picture 👇

Having outlined the workflow, let's start by organizing the project files.

Set Up the Workspace

Go to the command prompt in your working directory and enter these commands one by one:

mkdir ai-project
cd ai-project
code .

Once you run the last command, it will open VS Code and create a workspace for you. We will be working in this workspace.

Alternatively, you can create the ai-project folder and open it inside VS Code. The choice is yours 😅.
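By the end of the setup, the project folder will roughly look like this (a suggested layout; main.py, img.jpeg, and audio.flac are introduced in later sections):

ai-project/
├── .env         # API keys (never share or commit this file)
├── main.py      # backend functions + Streamlit frontend
├── img.jpeg     # sample image used to test the backend
└── audio.flac   # generated by the text-to-speech step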

Now create a .env file inside the project and add two constant variables:

HUGGINGFACEHUB_API_TOKEN = YOUR HUGGINGFACE API KEY
OPENAI_API_KEY = YOUR OPENAI API KEY

Now let's fill in the values.

GET OpenAI API Key

OpenAI allows developers to use API keys to interact with its products, so let's grab one for ourselves.

  • Go to the official OpenAI website and click Login / Signup.
  • Next, fill in your credentials and log in / sign up. If you just signed up, repeat this step to log in.
  • Once you are logged in, you will be greeted with two options – ChatGPT or API; select API.
  • On the next page, navigate to the lock 🔒 symbol in the sidebar (it might differ at the time of reading) and click it (refer to open-ai.png).
  • A new page will appear on the right-hand side. Now click on Create new secret key.
  • Name your key and hit Create secret key.
  • Important! – Note down this value and keep it safe. Once the popup closes you won't be able to see it again.
  • Now go to the .env file and paste it beside OPENAI_API_KEY. Don't put any quotes ("").

Now let's sort out the other one!

GET Hugging Face API Key

Hugging Face is an AI community that provides open-source models, datasets, tasks, and even compute spaces for a developer's use case. The only catch is that you need to use their API token to access the models. Here is how to get one (refer to ref.png for reference):

  • Head over to the Hugging Face website and create an account / log in.
  • Now click on your avatar (🧑‍🦲) in the top corner and click Settings in the dropdown.
  • Inside the Settings page, click on Access Tokens and then New token.
  • Fill in the token details such as the name and permission. Keep the name descriptive and the permission set to read.
  • Click on Generate a token and voilà, you have it. Make sure to copy it.
  • Open the .env file and paste the copied token beside HUGGINGFACEHUB_API_TOKEN, following the same guidelines as above.

So why do we require this? Because, as developers, it is easy to accidentally expose secret data in our code. If someone else gets hold of this data it can be disastrous, so it is standard practice to keep secrets in a separate .env file and access them later from another script.
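If the project lives in a git repository, it is also common to keep the secrets file out of version control. A suggested .gitignore entry (assuming you use git):

# .gitignore
.env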

For now, we are done with the workspace setup, but there is one optional step left.

Create an Environment

This step is optional, so you may skip it, but it is preferred not to!

Generally, one needs to isolate the development space so it contains only the modules and files needed for the project. This is done by creating a virtual environment.

You can use Miniconda to create the virtual environment due to its ease of use. Open a command prompt and type the following commands one after the other:

conda create -n ai-storyteller
conda activate ai-storyteller

The first command creates a new virtual environment, while the second activates it. This approach also helps later at the project deployment stage. Now let's head to the main project development.
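Before writing any code, install the libraries used throughout this article inside the activated environment. A suggested install line (package names assumed from the imports used in the next sections; torch and Pillow are needed by the transformers image pipeline):

pip install python-dotenv transformers torch pillow langchain openai requests streamlit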

AI Storyteller Application – Backend

As mentioned previously, we'll build each component individually and then merge them all.

Dependencies & Requirements

In VS Code (or your current working directory), create a new Python file main.py. This will serve as the entry point for the project. Now let's import all the required libraries:

from dotenv import find_dotenv, load_dotenv
from transformers import pipeline
from langchain import PromptTemplate, LLMChain, OpenAI
import requests
import os
import streamlit as st

Don't get into the library details yet; we will learn about them as we go along.

load_dotenv(find_dotenv())
HUGGINGFACE_API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")

Right here:

  • In line 1, we first find the .env file and then load its contents. This is how the OpenAI key gets loaded without ever appearing in the code. Call it good practice 😅
  • In line 2, we load the Hugging Face Hub API token stored in the .env file using os.getenv(), for use later on.
  • NOTE: Both variables are constants, so we keep them in capitals.
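As an optional sanity check (a small sketch, not part of the original code), you can fail fast if either key is missing before calling any model:

if not HUGGINGFACE_API_TOKEN or not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("Missing API keys, check your .env file")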

Having loaded all the requirements and dependencies, let's move on to building the first component: the image-to-text generator.

Image-to-Text Generator Model

#img-to-text

def img2text(path):
    img_to_text = pipeline(
        "image-to-text", model="Salesforce/blip-image-captioning-base")
    text = img_to_text(path)[0]['generated_text']
    return text

Now let's dissect the code:

  • In line 3 we define the img2text function, which takes the image path.
  • In line 4 we instantiate the model object img_to_text using the pipeline constructor from Hugging Face, which takes the task ("image-to-text") and the model name.
  • In line 6 the image path is passed to the model (downloaded from the Hub and run locally), which returns the generated text; it gets stored in the text variable.
  • Finally, we return the text.

So simple, right?

Next, let's pass the text on to the story generator.

Text-to-Story Generator Model

For text-to-story generation, we are going to use ChatGPT, but you are free to use any other model you like.

Additionally, we'll use LangChain to supply a custom prompt template to the model so that it is safe for every age group. This can be achieved as follows:

def story_generator(scenario):
    template = """
    You are an expert kids story teller;
    You can generate short stories based on a simple narrative
    Your story should be more than 50 words.

    CONTEXT: {scenario}
    STORY:
    """
    prompt = PromptTemplate(template=template, input_variables=["scenario"])
    story_llm = LLMChain(llm=OpenAI(
        model_name="gpt-3.5-turbo", temperature=1), prompt=prompt, verbose=True)

    story = story_llm.predict(scenario=scenario)
    return story

Code Explanation

Let's understand the code:

  • In line 1 we define the story_generator function, which takes the scenario as an argument. Note that the scenario here refers to the text generated by the previous model.
  • From lines 2 to 9 we define our custom instructions under the template variable, with the scenario as the context. This is the custom instruction mentioned earlier in the section.
  • Next, in line 10 we build a prompt using the LangChain PromptTemplate class. It takes the template (the whole text) and the custom context variable (here, scenario).
  • In line 11 we create an instance of the gpt-3.5-turbo model using the LLMChain wrapper from LangChain. The chain requires a model name, temperature (randomness of the response), prompt (our custom prompt), and verbose (to display logs).
  • Now we call the model using the predict method and pass in the scenario in line 14. This returns a story based on the context, stored in the story variable.
  • In the end, we return the story to pass it on to the last model.

For those who are curious about the LangChain classes used:

  • PromptTemplate is used to create a prompt based on the template and the context provided. In this case, it specifies that there is extra context, the scenario (see the small sketch after this list).
  • LLMChain is used to represent a chain of LLM models. In our case, it wraps the OpenAI language model with the GPT-3.5 Turbo model. In simple terms, you can chain multiple LLMs together.
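To make the PromptTemplate behaviour concrete, here is a minimal standalone sketch (illustrative only; it reuses the same variable name as above and the example text is made up):

from langchain import PromptTemplate

prompt = PromptTemplate(
    template="CONTEXT: {scenario}\nSTORY:",
    input_variables=["scenario"],
)
# format() substitutes the variable into the template string
print(prompt.format(scenario="two kids building a sandcastle on the beach"))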

To learn more about LangChain and its features, refer here.

Now we need to convert the generated output to audio. Let's take a look.

Text-to-Audio Model

This time, rather than loading the model locally, we'll use the Hugging Face Inference API to fetch the result. This saves storage and compute costs. Here is the code:

#text-to-speech (Hugging Face)
def text2speech(msg):
    API_URL = "https://api-inference.huggingface.co/models/espnet/kan-bayashi_ljspeech_vits"
    headers = {"Authorization": f"Bearer {HUGGINGFACE_API_TOKEN}"}
    payloads = {
         "inputs" : msg
    }
    response = requests.post(API_URL, headers=headers, json=payloads)

    with open('audio.flac','wb') as f:
        f.write(response.content)

Code Explanation

Here is the explanation of the above code:

  • In line 1 we define a function text2speech whose job is to take in the msg (the story generated by the previous model) and produce the audio file.
  • Line 2 contains API_URL, which holds the API endpoint to call.
  • Next, we provide the authorization bearer token in the headers. This is sent as a header (authorization data) when we call the model.
  • In line 5 we define a payload dictionary (JSON format) that contains the message (msg) we need to convert.
  • In the next line a POST request is sent to the model along with the headers and JSON data. The returned response is saved in the response variable.

Note: The inference format can vary from model to model, so please refer to the end of this section.

  • Finally, we save the audio content (response.content) to the local system by writing it to audio.flac, so the generated audio persists for later playback.
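In practice, the Inference API occasionally returns an error response (for example, a 503 status while the model is still loading), so it helps to check the status before writing the bytes to disk. A small defensive variant of the last lines of text2speech, not part of the original code:

    response = requests.post(API_URL, headers=headers, json=payloads)
    if response.status_code != 200:  # e.g. 503 while the model is loading
        raise RuntimeError(f"Inference API error {response.status_code}: {response.text}")

    with open('audio.flac', 'wb') as f:
        f.write(response.content)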

Optional

In case you plan to choose a different text-to-audio model, you can get the inference details by visiting the model's page, clicking the drop-down arrow beside Deploy, and selecting the Inference API option.


Congrats, the backend part is now complete. Let's test that it works!

Test the Backend

Now is a good time to test the models. For this, we'll pass in an image and call all of the model functions. Copy and paste the code below:

scenario = img2text("img.jpeg") # image to text
story = story_generator(scenario) # create a story
text2speech(story) # convert generated text to audio

Here img.jpeg is the image file, present in the same directory as main.py.

Now go to your terminal and run main.py:

python main.py

If everything goes well, you will see an audio.flac file appear in the same directory.


If you don't see the audio.flac file, please ensure you have added your API keys, have sufficient tokens, and have all the necessary libraries installed, including FFmpeg.
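A quick way to confirm the output actually contains audio data (an optional sketch using only the Python standard library):

import os

assert os.path.exists("audio.flac"), "audio.flac was not created"
# a very small file usually means the API returned an error message instead of audio
print(f"audio.flac size: {os.path.getsize('audio.flac')} bytes")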

Now that we have a working backend, it's time to create the frontend website. Let's move on.

AI Storyteller Application – Frontend

To build our frontend we'll use the Streamlit library, which provides easy-to-use, reusable components for building webpages from Python scripts, a dedicated CLI, and hosting. Everything needed to host a small project.

To get started, go to Streamlit and create an account – it's free!

Now go to your terminal and install the Streamlit CLI using:

pip set up streamlit

Once that's done, you are good to go.

Now copy and paste the following code:

def main():
    st.set_page_config(page_title="AI story Teller", page_icon="🤖")

    st.header("We turn images to story!")
    upload_file = st.file_uploader("Choose an image...", type="jpg")  # uploads image

    if upload_file is not None:
        print(upload_file)
        binary_data = upload_file.getvalue()

        # save image
        with open(upload_file.name, 'wb') as f:
            f.write(binary_data)
        st.image(upload_file, caption="Image Uploaded", use_column_width=True)  # display image

        scenario = img2text(upload_file.name)  # image to text
        story = story_generator(scenario)  # create a story
        text2speech(story)  # convert generated text to audio

        # display scenario and story
        with st.expander("scenario"):
            st.write(scenario)
        with st.expander("story"):
            st.write(story)

        # display the audio - people can listen
        st.audio("audio.flac")

# the main
if __name__ == "__main__":
    main()

Code Clarification

  • st.set_page_config: Sets the page configuration. Here we set the title and icon.
  • st.header: Sets the page header section.
  • st.file_uploader: Adds an upload component to the webpage along with the provided text. Here it is used to take images from the user.
  • st.image: Displays the image; as you guessed, it shows the user-uploaded image.
  • st.expander: Adds an expander (expand-to-see) component to the webpage. Here we use it to hold the scenario (image caption) and the story (caption turned into a story). Once users click on the expander, they can see the generated text. It also provides a nice UI experience.
  • st.write: Used for multiple purposes; here, to write the expander texts.
  • st.audio: Adds an audio component to the webpage – the user can use this to listen to the generated audio.

Here's what our function does in a nutshell:

Our main function creates a webpage that lets the user upload an image, passes it to the models, converts the image to a caption, generates a story based on it, and converts that story to audio the user can listen to. Apart from that, one can also view the generated caption and story, and the audio file is saved on the local / hosted system.

Now, to run your application, head over to the terminal and run:

streamlit run main.py

If everything is successful, Streamlit will start a local server and print a Local URL and a Network URL in the terminal.


Now head over to the Local URL and you can test the app.

Here is a video that showcases how to use the app:

Congrats on building your LLM application powered by Hugging Face, OpenAI, and LangChain. Now let's summarize what you have learned in this article.

Conclusion

That's all! We have learned how to build the frontend and backend of an AI Storyteller application.

We started by laying down the foundation of the project, then leveraged the power of Hugging Face to use open-source LLM models for the task at hand, combined OpenAI with LangChain to provide custom context, and later wrapped the entire application into an interactive web app using Streamlit. We also applied security principles along the way.

Key Takeaways

  • Secure user data using a .env file and load it with the Python dotenv package.
  • Break projects down into workable components and set up the environment accordingly.
  • Combine multiple models in a single script to get your work done.
  • Use LangChain to provide custom instructions to the model, reducing hallucination and safeguarding the response with PromptTemplate.
  • Use the LangChain LLMChain class to combine multiple models.
  • Run inference on Hugging Face models and store the result using the Inference API.
  • Build webpages using Streamlit's declarative syntax.

I hope you enjoyed building this AI storyteller application. Now put it into practice; I can't wait to see what you all come up with. Thanks for sticking around to the end. Here are a few resources to get you started.

Resources

The media shown in this article is not owned by Analytics Vidhya and is used at the author's discretion.


