I'm connected to a Kusto cluster using DirectQuery and noticed that when I used DistinctCount to create a measure and visualize said measure, the results are sometimes off by +/- 1-2%. When I use visual filters to manually look at top/bottom n values, the actual number of distinct entries match the query results I get from Kusto.
Just wondering if the results from the distinctcount function is exact or approximate and if it's normal to expect some slight margin of error when using aggregate function on larger data sets through directquery.
Nope, I do not see the yellow exclamation mark.
I have 3 tables in a page, one that shows the measure, which is just DISTINCTCOUNT(column), and two other columns to slice the data. The other two have the unsummarized data. If I create visual filters on the two unsummarized one and sort them in reverse order to show the top 1 and bottom N, it does not match the value that DISTINCTCOUNT returns, but does match my Kusto query.
For example, the distinctcount function returns 1832, but if I create visual filters showing the top 1 and bottom 1832, the values do not match. However, if I create visual filters showing the top 1 asc and bottom 1853 desc (the Kusto query result), then the values match.
Hmm, really tough to troubleshoot that one without data. But, if it is a large data set or has sensitive data, that can be difficult. Approximately how many rows are in your data set?
Proud to be a Datanaut!
I've attached a screenshot visualizing the issue. The first row shows 1832 distinct values, but my Kusto query returns 1852. As well, If I do a top 1 desc and bottom 1832 desc, the values do not match, but I instead do a bottom 1852 desc, the values match.
Total dataset has ~1.3M rows. Count of the first row of data returns a bit under 9000 rows.
OK, looks like it text. The documentation on DISTINCTCOUNT isn't very detailed. I wonder if it is possible that DISTINCTCOUNT is not counting on something distinct that your other process is. like "ff" and "FF" might be the same or possibly trailing characters, etc. Have you done a clean and trim on that when you import it? Or is this live? I can't remember.
Proud to be a Datanaut!
Yea, I originally thought that too, but sometimes PowerBI shows more and other times it shows less. The data is cleaned and trimmed and I'm using DirectQuery instead of importing since the data source is updated on a regular basis and expected to grow quite large over time.
Does not really address the issue why DISTINCTCOUNT is not functioning the way you want but you may want to look at using COUNTROWS(VALUES(table[field])) as an alternate way to get the nubmer of unique values. I believe this is also more efficient in terms of performance espeically with large data sets.
If this returns the same as distinctcount then the issue is probably with filter context somewhere that is filtering out some of the data points. You also may want to check your query to ensrue there are no errors in any calculated columns dropping records.
I tried COUNTROWS(VALUES(Column)) and it gives the exact same number as DISTINCTCOUNT(Column).
I've removed all other fields with the exception of the date range slicer and using the same method as previously described, the DISTINCTCOUNT(Column) gives 9117 rows and the visual filters of the tables with the column has a total of 9085 rows (same as Kusto query result).
There are no report level filters (only using visual level filters and a few page level filters on other pages).