BastiaanBrak
Helper IV

High density sampling (error?)

 

Hello, I'm confused about the High density sampling (HDS) algorithm. I've got a Line Chart with 166 different series. The documentation about High Density (Line) Sampling indicates that the maximum number of series that can be displayed is 60. It then describes how 60 representative series are selected if, as in my case, the actual total is higher.
https://docs.microsoft.com/en-us/power-bi/desktop-high-density-sampling

 

Specifically:


The algorithm creates as many bins as possible to create the greatest granularity for the visual. Within each bin, the algorithm finds the minimum and maximum data value, to ensure that important and significant values (for example, outliers) are captured and displayed in the visual.
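
For anyone trying to picture the quoted passage, here is a minimal sketch of min/max binning for a single series, assuming equal-width bins along the x-axis. It is not Power BI's actual implementation; the function name `minmax_bin_sample` and the bin count are hypothetical and only there for illustration.

```python
def minmax_bin_sample(points, n_bins):
    """Keep the lowest and highest y value in each x-bin (illustrative only)."""
    if not points or n_bins < 1:
        return []
    pts = sorted(points)                       # order by x
    x_lo, x_hi = pts[0][0], pts[-1][0]
    width = (x_hi - x_lo) / n_bins or 1.0      # fall back to 1.0 if all x are equal
    bins = {}
    for x, y in pts:
        idx = min(int((x - x_lo) / width), n_bins - 1)
        bins.setdefault(idx, []).append((x, y))
    kept = set()
    for members in bins.values():
        kept.add(min(members, key=lambda p: p[1]))   # bin minimum
        kept.add(max(members, key=lambda p: p[1]))   # bin maximum
    return sorted(kept)


# Hypothetical data: a flat line with a single spike at x = 37.
points = [(x, 500 if x == 37 else 1) for x in range(100)]
sample = minmax_bin_sample(points, n_bins=10)
```

In this sketch the spike at x = 37 survives the sampling because it is the maximum of its bin. Note that this operates on the points within one series; it says nothing about which series are kept.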

 

Below are two screenshots, first one with HDS On, second with HDS Off:

 HD sampling ON.JPG

 HD sampling.JPG

 

Based on these screenshots, it appears as if HDS is being applied as indicated in the documentation. However, it turns out that, at least in my use case, outliers at the top end are not represented at all when HDS is On; they are left out altogether (I used targeted filtering to eliminate some series and leave the outliers in).

HD sampling ON-outliers.JPG

I've tried getting my head round the information in the 'Considerations and limitations' section to understand whether this is intended behaviour of HDS, but I am confused by the points below. They appear to suggest the outliers are excluded because they come after the 60th series alphabetically, but to me this would defeat the point of HDS altogether.

 

 

  • When the size of an overall data source is too big, the new algorithm eliminates series (legend elements) to accommodate the data import maximum constraint.
    • In this situation, the new algorithm orders legend series alphabetically, starts down the list of legend elements in alphabetical order until the data import maximum is reached, and does not import additional series.
  • When an underlying data set has more than 60 series (the maximum number of series, as described earlier), the new algorithm orders the series alphabetically, and eliminates series beyond the 60th alphabetically-ordered series.
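
To make the last bullet concrete, here is a minimal sketch of what "order alphabetically, keep the first 60" would do to a chart like mine, assuming plain string sorting (Power BI's real ordering rules may differ). The function name, series names and values are all hypothetical.

```python
MAX_SERIES = 60  # series limit quoted in the documentation


def keep_first_alphabetical(series_by_name, limit=MAX_SERIES):
    """Keep only the first `limit` series by name; values are never inspected."""
    kept_names = sorted(series_by_name)[:limit]
    return {name: series_by_name[name] for name in kept_names}


# 166 hypothetical series: 165 gently rising ones plus one steep series
# whose name happens to sort last.
series = {f"Series {i:03d}": [0, 1, 2] for i in range(165)}
series["Zeta (steepest)"] = [0, 50, 100]

kept = keep_first_alphabetical(series)
dropped = sorted(set(series) - set(kept))
```

Under these assumptions the steepest series ends up in `dropped` purely because of its name, which would match what I am seeing if this rule is applied before any min/max sampling.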

 

In any case it seems to me this is undesirable behaviour from HDS but can anyone explain why the outliers are not included by HDS?

 

Many thanks, Bastiaan

5 REPLIES
v-yulgu-msft
Employee

Hi @BastiaanBrak,

 


In any case it seems to me this is undesirable behaviour from HDS but can anyone explain why the outliers are not included by HDS? 

What is your desired output? Which outliers were you referring to?

 

Regards,

Yuliana Gu

 

 


hi Yuliana @v-yulgu-msft

 

Ideally, my desired output is for all 166 series to be visible. If that option is not available, I'd be content with what High Density Sampling is purported to do, i.e. "ensure that important and significant values (for example, outliers) are captured and displayed in the visual" but in my use case HDS does not work as described.

 

As you can see in the screenshot on the right (High Density Sampling = OFF), there are five series, three in the pink area and two in the blue area, that are not present when High Density Sampling = ON (screenshot on left). I would argue that at least three of them represent "important and significant values", since they rise faster than the series that HDS does include.

 

HD sampling ON.JPG HD sampling ON-outliers.JPG

 

Thanks and hope this helps, Bastiaan

Anyone??

Update: you can see the high density sampling error as described above in action in this web report:
https://ahdb.org.uk/bgmec

 

Specifically: the location in the south-west of England, which has been omitted from the graph by the high density sampling algorithm, represents the time series with the steepest slope (click the location on the map or select 'South West England' from the Region drop-down to verify), so it should NOT have been omitted.

Can you confirm this has now been raised as a glitch, @v-yulgu-msft?

PowerBI capture.PNG

BastiaanBrak
Helper IV

Hope it's OK to tag in some of the contributors listed on the HDS documentation article? @DavidIseminger @lcasey
