I'm using the violin plot and I like it a lot.
https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA104381947?tab=Overview
However, I see that the density plot extends well beyond the actual range of the data points.
That is not something I've experienced with other violin plots, but perhaps I do not understand the purpose of this?
Thanks!
Hi there (and thanks for liking the visual!),
I got an email from someone with a similar question around the same time as this post and we've discussed offline. I assume you're this person, but I'll fill this out for anyone else who might come across the question and wonder the same thing.
Firstly, there's a good post on Stack Overflow that explains the issue. Whilst this question is for the Seaborn library, the concepts still apply.
The run-off is due to the Kernel Density Estimation (KDE) used to smooth your distribution. If we simply stopped at the min/max, we would run the risk of miscommunicating the modality of your data, so the KDE is projected outwards, based on the trajectory of your data, to a convergence point. Sometimes the KDE doesn't fully resolve to this point due to floating-point issues in JavaScript, so we choose a sensible cut-off point at which to stop. This can produce a straighter line than intended in the tail-off, but it still lets the halves converge (I'm continually looking into this).
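To make the run-off a bit more concrete, here is a minimal Python sketch of a Gaussian KDE. It is not the visual's actual code (the visual is written for JavaScript); the kernel choice, the bandwidth of 3.0 and the sample values are all assumptions made up for illustration. It shows that the estimated density is still greater than zero below the smallest data point, which is why the drawn shape carries on past the min/max.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x using a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Every sample contributes a small bell curve centred on itself.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up illustrative values (not the tooth-growth dataset).
samples = [4.2, 5.8, 7.3, 10.0, 11.2, 14.5, 16.5, 17.3, 21.5, 23.6]
density = gaussian_kde(samples, bandwidth=3.0)

lowest = min(samples)
density_at_min = density(lowest)           # density at the smallest data point
density_below_min = density(lowest - 3.0)  # one bandwidth below the data range

print(f"density at min:         {density_at_min:.4f}")
print(f"density 3.0 below min:  {density_below_min:.4f}")
```

The second value is smaller than the first but not zero, so a plot of this curve keeps some width beyond the lowest observation and only tapers off gradually.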
Some other things to consider (bearing in mind that everyone's data is going to be specific to their individual use cases):
For example, here's the tooth-growth dataset with the default bandwidth across all categories (this gives a bandwidth of 7.9):
If I apply this by category, this will calculate bandwidths of 4.8, 5.69 and 4.11 respectively, e.g.:
You can see this looks a little better for this particular use case, but I'd still consider what this might do for a different set of data if I'm splitting into categories.
If I really want to tighten-up the chart, I can reduce the bandwidth for all categories to 1, e.g.:
So, my plots converge a little closer to the ends, but it's harder (but not impossible) to discern the modality of each category. For visuals with more data points (these only have 20 or so in them for each category), the plot can get a bit busy and may not serve the story you're trying to tell.
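The link between bandwidth and how far the tails run past the data can be sketched roughly as below. This is only an illustration under assumptions: a Gaussian kernel, made-up sample values, and a hypothetical `overshoot` helper that measures how far past the maximum the density stays above 1% of its peak. None of this is taken from the visual itself.

```python
import math


def kde(samples, bandwidth, x):
    """Gaussian kernel density estimate at a single point x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


def overshoot(samples, bandwidth, fraction=0.01, step=0.05):
    """Distance past max(samples) before the density falls below
    `fraction` of its peak (peak approximated on a grid over the data range)."""
    lo, hi = min(samples), max(samples)
    steps = int((hi - lo) / step) + 1
    peak = max(kde(samples, bandwidth, lo + i * step) for i in range(steps))
    x = hi
    while kde(samples, bandwidth, x) >= fraction * peak:
        x += step
    return x - hi


# Made-up illustrative values (not the tooth-growth dataset).
samples = [4.2, 5.8, 7.3, 10.0, 11.2, 14.5, 16.5, 17.3, 21.5, 23.6]

wide = overshoot(samples, bandwidth=4.0)
narrow = overshoot(samples, bandwidth=1.0)

print(f"tail past max with bandwidth 4.0: {wide:.2f}")
print(f"tail past max with bandwidth 1.0: {narrow:.2f}")
```

The smaller bandwidth gives a shorter tail, which matches the tighter convergence described above, at the cost of a bumpier curve inside the data range.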
I have considered a 'clamping' option but have chosen not to implement it at this time. I've also had this issue raised today, which I assume has sprung from this post/email discussion. I'll take a look at it and consider it for a future version as well.
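For anyone wondering what clamping could look like, here is a hypothetical Python sketch, assuming it simply means evaluating the density only between the minimum and maximum data points. The `clamped_curve` function, the Gaussian kernel and the sample values are all invented for illustration and are not part of the visual.

```python
import math


def kde(samples, bandwidth, x):
    """Gaussian kernel density estimate at a single point x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


def clamped_curve(samples, bandwidth, points=50):
    """Evaluate the density on an evenly spaced grid limited to [min, max]."""
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / (points - 1)
    return [
        (lo + i * step, kde(samples, bandwidth, lo + i * step))
        for i in range(points)
    ]


# Made-up illustrative values (not the tooth-growth dataset).
samples = [4.2, 5.8, 7.3, 10.0, 11.2, 14.5, 16.5, 17.3, 21.5, 23.6]
curve = clamped_curve(samples, bandwidth=3.0)

first_x, first_density = curve[0]
last_x, last_density = curve[-1]
print(f"curve starts at x={first_x:.1f} with density {first_density:.4f}")
print(f"curve ends at   x={last_x:.1f} with density {last_density:.4f}")
```

The density at both ends is still well above zero, so a clamped violin would stop with a flat edge instead of tapering to a point. That is the trade-off: the shape stays within the data range, but the two halves no longer converge.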
Anyway, I hope that this clarifies things a bit and possibly offers some additional options for anyone using the visual.
Proud to be a Super User!
My course: Introduction to Developing Power BI Visuals
On how to ask a technical question, if you really want an answer (courtesy of SQLBI)