
Strategic Listening: A Guide to Python Social Media Analysis


With a global penetration rate of 58.4%, social media provides a wealth of opinions, ideas, and discussions shared daily. This data offers rich insights into the most important and popular conversation topics among users.

In marketing, social media analysis can help companies understand and leverage consumer behavior. Two common, practical methods are:

  • Topic modeling, which answers the question, “What conversation topics do users talk about?”
  • Sentiment analysis, which answers the question, “How positively or negatively are users speaking about a topic?”

In this article, we use Python for social media data analysis and demonstrate how to gather vital market information, extract actionable feedback, and identify the product features that matter most to consumers.

To demonstrate the utility of social media analysis, let’s perform a product analysis of various smartwatches using Reddit data and Python. Python is a strong choice for data science projects, and it offers many libraries that facilitate the implementation of the machine learning (ML) and natural language processing (NLP) models that we will use.

This analysis uses Reddit data (as opposed to data from Twitter, Facebook, or Instagram) because Reddit is the second most trusted social media platform for news and information, according to the American Press Institute. In addition, Reddit’s subforum organization produces “subreddits” where users recommend and criticize specific products; its structure is ideal for product-centered data analysis.

First we use sentiment analysis to compare user opinions of popular smartwatch brands and discover which products are viewed most positively. Then, we use topic modeling to narrow in on specific smartwatch attributes that users frequently discuss. Though our example is specific, you can apply the same analysis to any other product or service.

Preparing Sample Reddit Data

The data set for this example contains the title of the post, the text of the post, and the text of all comments for the latest 100 posts made in the r/smartwatch subreddit. In other words, our dataset contains the 100 most recent full discussions of the product, including users’ experiences, recommendations about products, and their pros and cons.

To collect this information from Reddit, we will use PRAW, the Python Reddit API Wrapper. First, create a client ID and secret token on Reddit by following the OAuth2 guide. Next, follow the official PRAW tutorials on downloading post comments and getting post URLs.
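
As a rough illustration of that collection step, here is a minimal PRAW sketch; the credentials are placeholders, and the dictionary keys simply mirror the columns of the dataset described above:

import praw

# Placeholder credentials; create your own app at reddit.com/prefs/apps
reddit = praw.Reddit(client_id="YOUR_CLIENT_ID",
                     client_secret="YOUR_CLIENT_SECRET",
                     user_agent="smartwatch-analysis")

rows = []
for submission in reddit.subreddit("smartwatch").new(limit=100):
    submission.comments.replace_more(limit=0)  # flatten "load more comments" stubs
    for comment in submission.comments.list():
        rows.append({"Title": submission.title,
                     "Text": submission.selftext,
                     "Comment_text": comment.body})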

Sentiment Analysis: Identifying Leading Products

To identify leading products, we can examine the positive and negative comments that users make about certain brands by applying sentiment analysis to our text corpus. Sentiment analysis models are NLP tools that categorize texts as positive or negative based on their words and phrases. There is a wide variety of possible models, ranging from simple counters of positive and negative words to deep neural networks.

We will use VADER for our example because it is designed to optimize results for short texts from social networks, using lexicons and rule-based algorithms. In other words, VADER performs well on data sets like the one we are analyzing.

Use the Python ML notebook of your choice (for example, Jupyter) to analyze this data set. We install VADER using pip:

pip install vaderSentiment

First, we add three new columns to our data set: the compound sentiment values for the post title, post text, and comment text. To do this, iterate over each text and apply VADER’s polarity_scores method, which takes a string as input and returns a dictionary with four scores: positivity, negativity, neutrality, and compound.

For our purposes, we’ll use only the compound score: the overall sentiment based on the first three scores, rated on a normalized scale from -1 to 1 inclusive, where -1 is the most negative and 1 is the most positive. This lets us characterize the sentiment of a text with a single numerical value:

# Import VADER and pandas
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import pandas as pd

analyzer = SentimentIntensityAnalyzer()

# Load the collected Reddit data
data = pd.read_json("./sample_data/data.json", lines=True)

# Initialize lists to store the sentiment values
title_compound = []
text_compound = []
comment_text_compound = []

# Score each post title, post text, and comment text with VADER
for title, text, comment_text in zip(data.Title, data.Text, data.Comment_text):
    title_compound.append(analyzer.polarity_scores(title)["compound"])
    text_compound.append(analyzer.polarity_scores(text)["compound"])
    comment_text_compound.append(analyzer.polarity_scores(comment_text)["compound"])

# Add the new columns with the sentiment scores
data["title_compound"] = title_compound
data["text_compound"] = text_compound
data["comment_text_compound"] = comment_text_compound

Next, we want to catalog the texts by product and brand; this allows us to determine the sentiment scores associated with specific smartwatches. To do this, we designate a list of the product lines we want to analyze, then we check which products are mentioned in each text:

list_of_products = ["samsung", "apple", "xiaomi", "huawei", "amazfit", "oneplus"]

# For each text column, flag whether each product is mentioned
for column in ["Title", "Text", "Comment_text"]:
    for product in list_of_products:
        mentions = []
        for text in data[column]:
            mentions.append(product in text.lower())
        data["{}_{}".format(column, product)] = mentions

Some texts may mention multiple products (for example, a single comment might compare two smartwatches). We can proceed in one of two ways:

  • We can discard those texts.
  • We can split those texts using NLP techniques. (In this case, we would assign a part of the text to each product.)

For the sake of code clarity and simplicity, our analysis discards those texts, as sketched below.
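
Here is a minimal sketch of that discarding step, assuming the boolean flag columns created above; the exact rule (at most one product per text column) is one reasonable choice, not the only one:

# Keep only rows that mention at most one product in each text column
for column in ["Title", "Text", "Comment_text"]:
    flag_columns = ["{}_{}".format(column, product) for product in list_of_products]
    data = data[data[flag_columns].sum(axis=1) <= 1]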

Sentiment Analysis Results

Now we are able to examine our data and determine the average sentiment associated with the various smartwatch brands, as expressed by users:

for product in list_of_products:
    mean_score = pd.concat([data[data["Title_{}".format(product)]].title_compound,
                            data[data["Text_{}".format(product)]].text_compound,
                            data[data["Comment_text_{}".format(product)]].comment_text_compound]).mean()
    print("{}: {}".format(product, mean_score))

We observe the following results:

Smartwatch    Sentiment Compound Score (Avg.)
Samsung       0.4939
Apple         0.5349
Xiaomi        0.6462
Huawei        0.4304
Amazfit       0.3978
OnePlus       0.8413

Our analysis reveals useful market information. For example, users from our data set have a more positive sentiment toward the OnePlus smartwatch than toward the other smartwatches.

Beyond considering average sentiment, businesses should also consider the factors affecting these scores: What do users love or hate about each brand? We can use topic modeling to dive deeper into our existing analysis and produce actionable feedback on products and services.

Topic Modeling: Discovering Important Product Attributes

Topic modeling is the branch of NLP that uses ML models to mathematically describe what a text is about. We will limit the scope of our discussion to classical NLP topic modeling approaches, though there are recent advances using transformers, such as BERTopic.

There are many topic modeling algorithms, including non-negative matrix factorization (NMF), sparse principal components analysis (sparse PCA), and latent Dirichlet allocation (LDA). These ML models take a matrix as input, then reduce the dimensionality of the data. The input matrix is structured such that (a toy example follows this list):

  • Each column represents a word.
  • Each row represents a text.
  • Each cell represents the frequency of each word in each text.
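
To make that structure concrete, here is a toy illustration (separate from our main pipeline) built with scikit-learn’s CountVectorizer:

from sklearn.feature_extraction.text import CountVectorizer

toy_texts = ["great battery life", "battery drains fast"]
vectorizer = CountVectorizer()
toy_matrix = vectorizer.fit_transform(toy_texts)

print(vectorizer.get_feature_names_out())  # ['battery' 'drains' 'fast' 'great' 'life']
print(toy_matrix.toarray())                # [[1 0 0 1 1], [1 1 1 0 0]]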

These are all unsupervised models that can be used for topic decomposition. The NMF model is commonly used for social media analysis, and it is the one we will use in our example because it yields easily interpretable results. It produces an output matrix such that:

  • Each column represents a topic.
  • Each row represents a text.
  • Each cell represents the degree to which a text discusses a specific topic.

Our workflow follows this process:

[Flowchart: “Start topic modeling analysis” → “Identify and import dependencies” → “Create corpus of texts” → “Apply NMF model” → “Analyze results” (branching into “General analysis” and “Detailed (sentiment-based) analysis”) → “Integrate results into marketing”]
The Topic Modeling Process

First, we’ll apply our NMF model to analyze general topics of interest, and then we’ll narrow in on positive and negative topics.

Analyzing General Topics of Interest

We’ll look at topics for the OnePlus smartwatch, since it had the highest compound sentiment score. Let’s import the required packages, which provide NMF functionality and common stop words to filter from our text:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.decomposition import NMF

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords

Now, let’s create a list with the corpus of texts we will use. We use the scikit-learn ML library’s CountVectorizer and TfidfTransformer functions to generate our input matrix:

product = "oneplus"
corpus = pd.concat([data[data["Title_{}".format(product)]].Title,
                      knowledge[data["Text_{}".format(product)]].Textual content,
                      knowledge[data["Comment_text_{}".format(product)]].Comment_text]).tolist()

count_vect = CountVectorizer(stop_words=stopwords.phrases('english'), lowercase=True)
x_counts = count_vect.fit_transform(corpus)

feature_names = count_vect.get_feature_names_out()
tfidf_transformer = TfidfTransformer()
x_tfidf = tfidf_transformer.fit_transform(x_counts)

(Note that details about handling n-grams, i.e., alternative spellings and usages such as “one plus”, can be found in my previous article on topic modeling.)

We’re ready to apply the NMF model and find the latent topics in our data. Like other dimensionality reduction methods, NMF needs the total number of topics to be set as a parameter (dimension). Here, we choose 10 topics for simplicity, but you can test different values to see what number of topics yields the best unsupervised learning result; try choosing the dimension that maximizes a metric such as the silhouette coefficient, or apply the elbow method (a brief sketch follows the code block below). We also set a random state for reproducibility:

import numpy as np

dimension = 10
nmf = NMF(n_components=dimension, random_state=42)
nmf_array = nmf.fit_transform(x_tfidf)

components = [nmf.components_[i] for i in range(len(nmf.components_))]
features = count_vect.get_feature_names_out()
important_words = [sorted(features, key=lambda x: components[j][np.where(features == x)], reverse=True) for j in range(len(components))]
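
As for tuning the topic count, a brief elbow-style sketch might look like the following; it uses NMF’s built-in reconstruction error rather than the silhouette coefficient, which is a simplification on my part:

# Fit NMF for several topic counts and watch for the "elbow" where
# additional topics stop reducing the reconstruction error much
for k in [5, 10, 15, 20]:
    candidate = NMF(n_components=k, random_state=42)
    candidate.fit(x_tfidf)
    print(k, candidate.reconstruction_err_)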

important_words contains lists of words, where each list represents one topic and the words are ordered within a topic by importance. It includes a mixture of meaningful and “garbage” topics; this is a common result in topic modeling because it is difficult for the algorithm to successfully cluster all texts into just a few topics.

Examining the important_words output, we find meaningful topics around words like “budget” or “charge”, which points to features that matter to users when discussing OnePlus smartwatches:

['charge', 'battery', 'watch', 'best', 'range', 'days', 'life', 'android', 'bet', 'connectivity']
['budget', 'price', 'euros', 'buying', 'purchase', 'quality', 'tag', 'worth', 'smartwatch', '100']

Since our sentiment analysis produced a high compound score for OnePlus, we might assume that this means it has a lower price or better battery life compared to other brands. However, at this point we don’t know whether users view these factors positively or negatively, so let’s conduct an in-depth analysis to get tangible answers.

Analyzing Positive and Negative Topics

Our more detailed analysis uses the same concepts as our general analysis, applied separately to positive and negative texts. We will uncover which factors users point to when speaking positively or negatively about a product.

Let’s do this for the Samsung smartwatch. We will use the same pipeline, but with a different corpus:

  • We create a list of positive texts that have a compound score greater than 0.8.
  • We create a list of negative texts that have a compound score less than 0.

These thresholds were chosen to select the top 20% of positive text scores (>0.8) and the top 20% of negative text scores (<0), and they produce the strongest results for our smartwatch sentiment analysis:

# First the negative texts.
product = "samsung"
corpus_negative = pd.concat([data[(data["Title_{}".format(product)]) & (data.title_compound < 0)].Title,
                             data[(data["Text_{}".format(product)]) & (data.text_compound < 0)].Text,
                             data[(data["Comment_text_{}".format(product)]) & (data.comment_text_compound < 0)].Comment_text]).tolist()


# Now the positive texts.
corpus_positive = pd.concat([data[(data["Title_{}".format(product)]) & (data.title_compound > 0.8)].Title,
                             data[(data["Text_{}".format(product)]) & (data.text_compound > 0.8)].Text,
                             data[(data["Comment_text_{}".format(product)]) & (data.comment_text_compound > 0.8)].Comment_text]).tolist()

print(corpus_negative)
print(corpus_positive)
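
To rerun the topic modeling pipeline on each of these corpora, one option is to wrap the earlier vectorization and NMF steps in a helper; this is a hypothetical convenience function, not code from the original analysis:

# Hypothetical helper bundling the earlier steps for reuse on any corpus
def top_topic_words(corpus, dimension=10):
    count_vect = CountVectorizer(stop_words=stopwords.words('english'), lowercase=True)
    x_counts = count_vect.fit_transform(corpus)
    x_tfidf = TfidfTransformer().fit_transform(x_counts)
    nmf = NMF(n_components=dimension, random_state=42)
    nmf.fit(x_tfidf)
    features = count_vect.get_feature_names_out()
    # For each topic, order all words from most to least important
    return [[features[i] for i in topic.argsort()[::-1]] for topic in nmf.components_]

negative_topics = top_topic_words(corpus_negative)
positive_topics = top_topic_words(corpus_positive)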

Repeating the same topic modeling process that we used for general topics of interest reveals the positive and negative topics. Our results now provide much more specific marketing information: For example, our model’s negative corpus output includes a topic about the accuracy of calories burned, while the positive output is about navigation/GPS and health indicators like pulse rate and blood oxygen levels. Finally, we have actionable feedback on aspects of the smartwatch that users love and areas where the product has room for improvement.

[Word cloud with various words, from largest to smallest: health, pulse, screen, sensor, fitness, exercise, miles, feature, heart, active.]
Word Cloud of a Samsung Positive Topic, Created With the wordcloud Library

To amplify your data findings, I would recommend creating a word cloud or another similar visualization of the important topics identified in our tutorial.
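
For example, here is a minimal sketch using the wordcloud library with matplotlib; it assumes a list of topic words such as the hypothetical positive_topics above:

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Build a cloud from the top 30 words of one positive topic
cloud = WordCloud(background_color="white").generate(" ".join(positive_topics[0][:30]))

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()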

Through our analysis, we understand what users think of a target product and its competitors, what users love about top brands, and what may be improved for better product design. Public social media data analysis allows you to make informed decisions regarding business priorities and enhance overall user satisfaction. Incorporate social media analysis into your next product cycle for improved marketing campaigns and product design, because listening is everything.


Further Reading on the Toptal Engineering Blog:

  1. Data Mining for Predictive Social Network Analysis
  2. Ensemble Methods: Elegant Techniques to Produce Improved Machine Learning Results
  3. Getting Started With TensorFlow: A Machine Learning Tutorial
  4. Adversarial Machine Learning: How to Attack and Defend ML Models

The editorial team of the Toptal Engineering Blog extends its gratitude to Daniel Rubio for reviewing the code samples and other technical content presented in this article.


