Galaxy999
Regular Visitor

WordCloud limitations

Hello everybody,

I have a dashboard, on Power BI Desktop, containing a WordCloud 2.0.0 visual.

The WordCloud is bound to a text column of a table.

The column contains free text such as: "word", "word1 word2", "word1 word2 word3", etc., already stripped of punctuation. Many of these strings are very long (hundreds or thousands of characters).

 

The WordCloud is apparently working, displaying many smaller and bigger words and, when I hover over a word, a tooltip with the number of occurrences of the word in all the cells of the text column appears.

 

But:

1) the WordCloud displays a "Too Many Values..." warning

2) the number of occurrences shown when I hover over a word in the WordCloud is clearly wrong and far smaller than it should be (at most a few thousand occurrences, whereas I know for sure that the word occurs at least tens or hundreds of thousands of times)

 

Consider that:

- The table containing the text column contains ONE MILLION ROWS, more or less

- I ran some experiments: if I cut the table from one million rows down to 2 thousand rows, then the WordCloud no longer displays the "Too many values..." warning and the occurrences shown when hovering over words become correct (at 4 thousand rows the WordCloud starts showing the "Too many values..." warning again)

 

My understanding is that the WordCloud is taking into consideration just a tiny fraction of the original one million rows in order to keep rendering fast.
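
To illustrate that suspicion, here is a minimal Python sketch (not the WordCloud's actual code) of how per-word counts shrink when only part of the rows is counted. The sample rows, the helper name `count_words`, and the cap `ROW_CAP = 2000` are assumptions made up for the example.

```python
from collections import Counter

def count_words(texts):
    """Count whitespace-separated words across all strings, case-insensitively."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

# Hypothetical stand-in for the text column: 10,000 rows, each containing "error" once.
rows = [f"error code{i % 7} reported" for i in range(10_000)]

ROW_CAP = 2000  # assumed cap, for illustration only

full = count_words(rows)                 # counts over every row
truncated = count_words(rows[:ROW_CAP])  # counts over the capped input only

print(full["error"], truncated["error"])
```

If the visual only ever receives a capped subset, the tooltip can only report the second kind of count, which would explain numbers that look far too small.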

 

Is there any way of increasing the number of rows that the WordCloud considers? I have tried to play with the Maximum Number of Words parameter of the WordCloud, but nothing changes in terms of the number of occurrences displayed by the WordCloud.

 

If there is not a way to increase the number of rows that the WordCloud considers, then how does the WordCloud choose the rows to keep and the ones to discard? Does it keep the first n rows? How many? Does it take a random sample?

 

I tried to dig into the source code of the WordCloud, which is hosted on GitHub, but it is not a simple effort for me...

 

Many thanks for your help

 

 

1 ACCEPTED SOLUTION
amitchandak
Super User

@Galaxy999 , You can change the limit

 

 

But it may not exceed these limits:

https://learn.microsoft.com/en-us/power-bi/visuals/power-bi-data-points

 

You can reduce the number of data points by using Rank as a visual-level filter, or use an R/Python visual.

Hi amitchandak,

thank you for your response.

 

Regarding the maximum number of records that the Word Cloud can take as input, I found that it is 2500.

I found it in the source code of the Word Cloud, which is hosted on GitHub: https://github.com/microsoft/PowerBI-visuals-WordCloud/blob/main/capabilities.json.

[Screenshot: Galaxy999_0-1678111927645.png]

... I think this means that, when more than 2500 records are given to the Word Cloud, then only a sample (*) of 2500 of the given records are actually passed to the Word Cloud: the rest are discarded.

This parameter is not configurable without recompiling/rebuilding the Word Cloud visual from source code.
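
For readers who want to check such a limit themselves, here is a small Python sketch that reads a data reduction setting out of a capabilities-style JSON document. The fragment below is hypothetical: it follows the documented shape of `dataViewMappings` for Power BI custom visuals and uses the count of 2500 mentioned above, but it is not copied from the WordCloud repository, and the role name `Category` is an assumption.

```python
import json

# Hypothetical fragment shaped like a custom visual's capabilities.json;
# it is NOT copied from the WordCloud repository.
capabilities_fragment = """
{
  "dataViewMappings": [
    {
      "categorical": {
        "categories": {
          "for": { "in": "Category" },
          "dataReductionAlgorithm": { "top": { "count": 2500 } }
        }
      }
    }
  ]
}
"""

def reduction_limits(capabilities):
    """Return (algorithm, count) pairs found under categorical.categories."""
    found = []
    for mapping in capabilities.get("dataViewMappings", []):
        categories = mapping.get("categorical", {}).get("categories", {})
        reduction = categories.get("dataReductionAlgorithm", {})
        for algorithm, settings in reduction.items():
            found.append((algorithm, settings.get("count")))
    return found

limits = reduction_limits(json.loads(capabilities_fragment))
print(limits)
```

Running the same function on the real file (after downloading it) would show which algorithm and count the visual declares, assuming the file uses this layout.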

 

If I understood correctly, you also suggested that I can reduce the number of records using Rank as a visual-level filter.

I tried setting a filter on the Word Cloud visual (Filter type: Top N - Show items: Top <n of records> - By value: <record id>).

As soon as I set <n of records> greater than 2500, the "Too many values" warning appears on the Word Cloud.

However, what I actually need is to increase that limit (the number of records that the Word Cloud is able to take as input), not to decrease it.

 

I still have to try the Python/R visual road...

 

Regards

 

(*) Strictly speaking, this should not be a "sample" but the "top" 2500, as stated in the capabilities.json.

But, after some experiments, I found evidence that Power BI passes a sample of 30000 records to the Word Cloud, out of the total number of records currently filtered.

Then, the Word Cloud supposedly considers the top 2500 of these 30000 sampled records.

So, the final effect is that a sample of 2500 of the total records that should be represented by the Word Cloud is actually processed and displayed by the Word Cloud itself.
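
The two stages described above can be sketched in Python as follows. This is only a simulation of my interpretation, not Power BI's actual algorithm: the function name `two_stage_reduction`, the use of uniform random sampling, and reading "top" as "the first 2500 of the sampled records" are all assumptions.

```python
import random

def two_stage_reduction(row_ids, sample_size=30_000, top_count=2_500, seed=0):
    """Simulate: sample up to sample_size rows, then keep the first top_count."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    if len(row_ids) > sample_size:
        sampled = rng.sample(row_ids, sample_size)  # stage 1: assumed uniform sample
    else:
        sampled = list(row_ids)
    return sampled[:top_count]  # stage 2: assumed "top" = first N of the sample

total_rows = 1_000_000
kept = two_stage_reduction(range(total_rows))

# Share of the original rows that would reach the visual under these assumptions.
print(f"{len(kept)} of {total_rows} rows ({len(kept) / total_rows:.2%})")
```

Under these assumptions only about one row in 400 reaches the visual, so a word spread evenly across the table would show roughly 1/400 of its real count in the tooltip.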

 

 

Hello,

I tried the Python/R pathway and could, partially, overcome the limitations of the "standard" Word Cloud visual.

 

First of all, I tried to use a Python visual with the wordcloud library, but I had to give up after many hours of struggling with libraries that would not install due to various and quite obscure errors...

 

Then I tried an R visual and, despite being a newbie in the R world, I found that I could quite easily get some encouraging results.

I borrowed the R code presented here: https://community.powerbi.com/t5/R-Script-Showcase/Wordcloud-with-r/td-p/60314.

So I now have a new R word cloud visual that is less nice, perhaps renders a bit slower, and works only on the desktop version of Power BI, but is not constrained by the 2500 record limit of the standard Power BI Word Cloud.

Note that, as far as I could tell from my experiments, Power BI still limits the number of records that it passes to any R visual to 150 thousand.

 

As an added feature, I also inserted into my report another R visual that displays a visual leaderboard of the most frequent words contained in the currently selected records (same information displayed by part of the word cloud, but represented in a different way).

I borrowed the code from here: https://www.codementor.io/@alexander-k/r-word-frequency-in-dataframe-165jgfxxqe.
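
For anyone who prefers Python, the idea behind such a leaderboard can be sketched with the standard library alone. This is not the R code from the linked article; the helper names `top_words` and `render_leaderboard` and the sample strings are made up for illustration. In a Power BI Python visual the selected rows would arrive as a pandas DataFrame named `dataset`, so the text column of that frame would take the place of `sample` here.

```python
from collections import Counter

def top_words(texts, n=10):
    """Return the n most frequent whitespace-separated words as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts.most_common(n)

def render_leaderboard(ranked, width=30):
    """Render (word, count) pairs as a plain-text bar chart, longest bar first."""
    if not ranked:
        return ""
    longest = max(len(word) for word, _ in ranked)
    best = ranked[0][1]  # highest count, used to scale the bars
    lines = []
    for word, count in ranked:
        bar = "#" * max(1, round(width * count / best))
        lines.append(f"{word.ljust(longest)}  {bar} {count}")
    return "\n".join(lines)

# Hypothetical stand-in for the currently selected records.
sample = ["word1 word2", "word1 word2 word3", "word1"]
ranked = top_words(sample, n=3)
print(render_leaderboard(ranked))
```

A plotting library could replace the text bars, but the counting step stays the same.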

 
