jughenao
Regular Visitor

Bigram WordClouds in PowerBI

Hello everyone,

 

I'm currently using the WordCloud 1.6.0 module in PowerBI Desktop and I'm trying to do a bigram wordcloud.

 

What I mean by bigram wordcloud is a wordcloud that doesn't show the words by themselves, but instead shows the most frequent adjacent pairs of words. For example, instead of the wordcloud showing the words: "red", "wine", "the", "lord" and "amazing", I'd like the wordcloud to show the bigrams: "red wine" and "the lord" (assuming they have enough frequency).

 

I've tried to transform the sentences to bigrams with a separator, like the example below:

 

Original phrase: I like the red wine of the lord. It is amazing.

Bigrammed phrase: I-like like-the the-red red-wine wine-of of-the the-lord lord-. .-It It-is is-amazing.

 

And then try to use the wordcloud on the bigrammed phrases, but it didn't work. The wordcloud seems to identify the dash as a word separator and plots each word individually.
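For reference, the transformation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used to produce the example; the function name `to_bigram_phrase` and the simple regex tokenizer are assumptions.

```python
import re

def to_bigram_phrase(text, sep="-"):
    """Join each adjacent pair of tokens with `sep`; pairs are space-separated."""
    # Words and punctuation marks are treated as separate tokens,
    # as in the bigrammed phrase above (e.g. "lord-." and ".-It").
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return " ".join(sep.join(pair) for pair in zip(tokens, tokens[1:]))

phrase = "I like the red wine of the lord. It is amazing."
print(to_bigram_phrase(phrase))
```

Passing a different `sep` (for example an underscore) is an easy way to test which characters the visual treats as word separators.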

 

Does anyone have any ideas or know a way to do this?

 

Thanks in advance,

 

-JM

1 ACCEPTED SOLUTION
v-huizhn-msft
Employee

Hi @jughenao,

This requirement cannot currently be achieved with the WordCloud visual. Please review the following ideas and vote for them.

Improve the WordCloud visual
WordCloud -conditional formatting words

Thanks,
Angelia


2 REPLIES
douglbar
New Member

Hi @jughenao @v-huizhn-msft 

 

I have created a workaround for using n-grams in the Power BI WordCloud visual with the help of a Python script.

 

Proceed as follows:

 

1. With your report open, go to Power Query.

2. Select the table with the column that holds the text you want to turn into bigrams.

3. Click Transform > Run Python script (the last button on the right).

4. Paste the code below:

 

***

# 'dataset' holds the input data for this script
from nltk.corpus import stopwords
from nltk import bigrams

# a set makes the membership test below faster
stop_words = set(stopwords.words('english'))

# filter stopwords out ('yourcolumnname' is a placeholder for your text column)
dataset['words_filtered'] = dataset['yourcolumnname'].astype(str).apply(
    lambda x: ' '.join(word for word in x.split() if word.lower() not in stop_words))

# remove punctuation (regex=True is needed in pandas 2.0+, where the default changed)
dataset['words_filtered'] = dataset['words_filtered'].str.replace(r'[^\w\s]', '', regex=True)

# generate bigrams, joining each adjacent pair of words with a dash
dataset['bigrams'] = dataset['words_filtered'].apply(
    lambda row: ' '.join('-'.join(item) for item in bigrams(row.split())))

# merge unigrams and bigrams
dataset['words_final'] = dataset[['words_filtered', 'bigrams']].agg(' '.join, axis=1)

# drop auxiliary columns
dataset.drop(['words_filtered', 'bigrams'], axis=1, inplace=True)

 

***

 

5. When you click OK, it will show a table with the columns "Name" and "Value".

6. Click the expand button at the right of the "Value" column header, select all columns, uncheck "Use original column name as prefix" and click OK.

7. It will regenerate your table with the new column "words_final" at the end.

8. click Close and Apply

9. use "words_final" as Category in the wordcloud Graph

10. you also have to go to Visualizations > Format Visual > Visual > General and turn ON "Special Characters"

 

You should be able to see both the unigrams (i.e. the original words) and the bigrams. If you want only the bigrams, leave out the last two parts of the script (the merge and the drop) and use the "bigrams" column of the dataframe as the Category instead.
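To try the per-row logic outside Power BI, here is a minimal sketch in plain Python with no nltk or pandas dependency. The function name `row_to_terms`, the `bigrams_only` flag and the tiny stopword set are all hypothetical and only for illustration; the script above uses nltk's full English stopword list.

```python
import re

# Hypothetical, tiny stopword list for illustration only.
STOP_WORDS = {"i", "the", "of", "it", "is"}

def row_to_terms(text, bigrams_only=False):
    """Return the string the word cloud would receive for one row of text."""
    # Drop stopwords first, then strip punctuation, mirroring the script's order.
    kept = [w for w in text.split() if w.lower() not in STOP_WORDS]
    words = re.sub(r"[^\w\s]", "", " ".join(kept)).split()
    # Pair each remaining word with its right-hand neighbour.
    pairs = ["-".join(p) for p in zip(words, words[1:])]
    return " ".join(pairs if bigrams_only else words + pairs)

print(row_to_terms("I like the red wine of the lord.", bigrams_only=True))
# like-red red-wine wine-lord
```

Because stopwords are removed before pairing, some bigrams (such as "wine-lord" here) join words that were not adjacent in the original sentence. If that is not what you want, build the bigrams before filtering.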

 

Important: you need to have Python, nltk and the nltk stopwords corpus installed.

- To install Python (if you do not have it), I recommend installing Anaconda.

- To install nltk, open a Jupyter notebook and run the command: !pip install nltk

- To download the stopwords, run the commands: import nltk, then nltk.download('stopwords')
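A quick way to confirm the packages are visible is a check like the one below. It is a sketch using only the standard library; the helper name `missing_modules` is made up for this example. Run it with the same Python installation that Power BI is configured to use, otherwise the result may not reflect what the Power Query step sees.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# pandas is needed for the 'dataset' dataframe; nltk for stopwords and bigrams.
print(missing_modules(["pandas", "nltk"]))
```

An empty list means both packages were found. Note that this does not check whether the stopwords corpus itself has been downloaded.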

 

Hope this helps.

 

Regards 
