Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
I am creating a number of "health reports" for my client, where I visualize the completeness and accuracy of their source system data. We're starting with the simplest question: what % of rows in each field is null? To show this quickly, I'm setting a series of stacked column or bar charts, where each bar shows the ration of "healthy":"missing" rows in a given field.
From an optimization standpoint, I'm curious which of the following methods will get the best performance out of my dashboard, both in terms of query load speed and in terms of responsiveness to user interaction on the Service.
Method 1 - Added Logical Columns in Power Query
For each field I'm auditing, I run a function that adds another column to the dataset, e.g.:
fieldNull = Table.AddColumn(prevStep, "fieldName", each fnNullBool([field2Check]), type logical)
So if I have six fields I want to check for nulls, that means six extra columns. Even if they're all type logical, this adds to the weight of the dataset and forces these calculations at the Power Query level.
For the stacked bar chart, I set the fieldNull as the Legend, and the Count of Rows as the Value.
Method 2 - Measure
I load the dataset I'm checking for nulls "as is". For each field I want to check for nulls, I create two Measures:
field_missing = COUNTBLANK(field2Check)
field_healthy = [allTableRows] - [field_missing]
For the stacked bar chart, I place both Measures in the Value field.
---------------------------------------------------------------------------
From a user perspective, Method 1 might be preferable because they can click on the stacked bar chart and it willl auto-filter all other charts on the visual, plus any "full-details" tables that let them see the dataset beyond just null counts. But I'm concerned this method will make the dashboard muuuuuuch slower.
Thoughts from the back-end experts? What's the performance trade-off here, if any?
Solved! Go to Solution.
Hi @Anonymous,
From my point of view, if you're not using direct query mode for your dataset, Method 1 could be a better choice.
Regards
Hi @Anonymous,
From my point of view, if you're not using direct query mode for your dataset, Method 1 could be a better choice.
Regards
Forgot to explain the fnNullBool function:
(x) =>
let
nullCheck = Logical.From(if x = null then true else false)
in
nullCheck
User | Count |
---|---|
124 | |
108 | |
99 | |
62 | |
62 |
User | Count |
---|---|
137 | |
115 | |
102 | |
71 | |
61 |