Reply
Frequent Visitor
Posts: 7
Registered: ‎07-10-2018

DistinctCount Approximation or Exact

I'm connected to a Kusto cluster using DirectQuery and noticed that when I used DistinctCount to create a measure and visualize said measure, the results are sometimes off by +/- 1-2%. When I use visual filters to manually look at top/bottom n values, the actual number of distinct entries match the query results I get from Kusto.

 

Just wondering if the results from the distinctcount function is exact or approximate and if it's normal to expect some slight margin of error when using aggregate function on larger data sets through directquery.

Super User
Posts: 9,273
Registered: ‎07-11-2015

Re: DistinctCount Approximation or Exact

Do you get the little yellow ! in your visual that indicates that data is being sampled?


Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!


Frequent Visitor
Posts: 7
Registered: ‎07-10-2018

Re: DistinctCount Approximation or Exact

Nope, I do not see the yellow exclamation mark.

 

I have 3 tables in a page, one that shows the measure, which is just DISTINCTCOUNT(column), and two other columns to slice the data. The other two have the unsummarized data. If I create visual filters on the two unsummarized one and sort them in reverse order to show the top 1 and bottom N, it does not match the value that DISTINCTCOUNT returns, but does match my Kusto query.

 

For example, the distinctcount function returns 1832, but if I create visual filters showing the top 1 and bottom 1832, the values do not match. However, if I create visual filters showing the top 1 asc and bottom 1853 desc (the Kusto query result), then the values match.

 

 

Super User
Posts: 9,273
Registered: ‎07-11-2015

Re: DistinctCount Approximation or Exact

Hmm, really tough to troubleshoot that one without data. But, if it is a large data set or has sensitive data, that can be difficult. Approximately how many rows are in your data set?


Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!


Frequent Visitor
Posts: 7
Registered: ‎07-10-2018

Re: DistinctCount Approximation or Exact

I've attached a screenshot visualizing the issue. The first row shows 1832 distinct values, but my Kusto query returns 1852. As well, If I do a top 1 desc and bottom 1832 desc, the values do not match, but I instead do a bottom 1852 desc, the values match.

ScreenCapture.png

 

Total dataset has ~1.3M rows. Count of the first row of data returns a bit under 9000 rows.

Super User
Posts: 9,273
Registered: ‎07-11-2015

Re: DistinctCount Approximation or Exact

OK, looks like it text. The documentation on DISTINCTCOUNT isn't very detailed. I wonder if it is possible that DISTINCTCOUNT is not counting on something distinct that your other process is. like "ff" and "FF" might be the same or possibly trailing characters, etc. Have you done a clean and trim on that when you import it? Or is this live? I can't remember.


Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!


Frequent Visitor
Posts: 7
Registered: ‎07-10-2018

Re: DistinctCount Approximation or Exact

Yea, I originally thought that too, but sometimes PowerBI shows more and other times it shows less. The data is cleaned and trimmed and I'm using DirectQuery instead of importing since the data source is updated on a regular basis and expected to grow quite large over time.

New Contributor
Posts: 444
Registered: ‎02-15-2018

Re: DistinctCount Approximation or Exact

Does not really address the issue why DISTINCTCOUNT is not functioning the way you want but you may want to look at using COUNTROWS(VALUES(table[field])) as an alternate way to get the nubmer of unique values.  I believe this is also more efficient in terms of performance espeically with large data sets. 

 

If this returns the same as distinctcount then the issue is probably with filter context somewhere that is filtering out some of the data points.  You also may want to check your query to ensrue there are no errors in any calculated columns dropping records.

Frequent Visitor
Posts: 7
Registered: ‎07-10-2018

Re: DistinctCount Approximation or Exact

I tried COUNTROWS(VALUES(Column)) and it gives the exact same number as DISTINCTCOUNT(Column).

 

I've removed all other fields with the exception of the date range slicer and using the same method as previously described, the DISTINCTCOUNT(Column) gives 9117 rows and the visual filters of the tables with the column has a total of 9085 rows (same as Kusto query result). 

 

There are no report level filters (only using visual level filters and a few page level filters on other pages).

New Contributor
Posts: 444
Registered: ‎02-15-2018

Re: DistinctCount Approximation or Exact

Do you have any joins to other tables? If there is a bridge or lookup table that does not contain all the options it probably is excluding those records that don’t match up.