Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
I am trying to Optimize a query that I have that produces the correct results, however very slow.
What I want to do is to click on a visual for the 'Dimension' and then another visual shows the ZScores for the Values within that Dimension. (I have removed the part of my formula that makes this show no ZScores if not a single Dimension is selected just for demo purposes)
I have a table that contains multiple dimensions (circa 20+) that I want to get the ZScores for each one of those dimensions. My base table looks like this
Products Contacts STAFF REGION AFFLUENCE LIFESTAGE
5 10 Staff 1 Region1 Affluence1 Lifestage1
10 15 Staff 2 Region1 Affluence1 Lifestage2
6 50 Staff 2 Region2 Affluence2 Lifestage2
VAR AffluenceSimpleTable = ADDCOLUMNS( SUMMARIZE (biRecord, biRecord[Demographic Affluence]), "Encoded", TRIM(CONCATENATE("Affluence", CONCATENATE(" ", [Demographic Affluence]))), "Numerator" , [No Products], "Denominator", [No Contact] ) VAR AffluenceTable = ADDCOLUMNS(ADDCOLUMNS(AffluenceSimpleTable, "NumeratorTotal", CALCULATE([No Products], ALL(biRecord[Demographic Affluence])), "DenominatorTotal", CALCULATE([No Contact], ALL(biRecord[Demographic Affluence])) ), "IndexOfProportion", DIVIDE ( [Numerator], [NumeratorTotal]) - DIVIDE([Denominator], [DenominatorTotal]) ) VAR AffluenceTableZ = ADDCOLUMNS( ADDCOLUMNS(AffluenceTable, "Mean", AVERAGEX ( AffluenceTable, [IndexOfProportion] ), "StdDev", STDEVX.P ( AffluenceTable, [IndexOfProportion] ) ), "ZScore", DIVIDE (( [IndexOfProportion] - [Mean] ), [StdDev]) )
So each one of these tables calculate in approx 200ms (happy with that - however if someone sees a way to optimize the above further that would be useful) - I have this block of code for every single Dimension. The 'Encoded' value is what I am using to match this Dimension Detail in the below table.
I have another table that collates all the Dimension Details together that looks like this called 'DimensionTbl'...
DIMENSION DIMENSION DETAIL DIMENSIONENCODED
Staff Staff 1 Staff Staff 1
Staff Staff 2 Staff Staff 2
Region Region 1 Region Region 1
Region Region 2 Region Region 2
Lifestage Lifestage 1 Lifestage Lifestage 1
Lifestage Lifestage 2 Lifestage Lifestage 2
Affluence Affluence 1 Affluence Affluence 1
Affluence Affluence 2 Affluence Affluence 2
So far I can think of two approaches to get the relevant results against the 'Dimension Detail' table.
First version is this, which is to merge all of these dimension calculated variable tables together and then find the appropriate value within that - This means the more dimensions I add the slower it becomes, especially if a dimension has many values.
VAR megaTable = UNION(
AffluenceTableZ,
LifestageTableZ,
RegionTableZ,
StaffTableZ
)
RETURN SUMX('DimensionTbl', SUMX(megaTable, if([Encoded] = 'DimensionTbl'[DimensionEncoded], [ZScore], BLANK()))
The second version uses Switch statements to achieve the same thing....
RETURN
SWITCH(MAX('DimensionTbl'[Dimension]),
"Lifestage", MINX(LifestageTableZ, [ZScore]),
"Affluence", MINX(AffluenceTableZ, [ZScore]),
"Staff", MINX(StaffTableZ, [ZScore]),
"Region", MINX(RegionTableZ, [ZScore]),
)
Both of these approaches work to produce an accurate result, however both methods are slow.
A few questions...
1) Is there another approach that is more efficient
2) How do Switch statements process the calculation? If my Switch matches 'Staff' would that mean that all 3 tables (LifestageTableZ, AffluenceTableZ and StaffTableZ) are all calculated? Or does it just check if the value matches "Lifestage", "Affluence", "Staff", and then StaffTableZ? I have worked out that I should put the results I use most frequently in the Switch should be closer to the start of the Switch statement as it means if it finds a match it ignores all options below that in the Switch - (Hopefully that makes sense)
Thanks in advance, and let me know if you need anything clarified.
Solved! Go to Solution.
If anyone is interested if they come across a similar problem. This is a much more effecient version.
Make each Dimension as variables (This is One Dimension. I have duplicated this as variables for every dimension.
VAR SelectedDimension = [SelectedDimension] VAR OutcomeYearCategory = "Outcome Year" VAR OutcomeYearIndicator = IF(SelectedDimension = OutcomeYearCategory, 1, BLANK()) VAR OutcomeYearSimpleTable = ADDCOLUMNS(ADDCOLUMNS( VALUES('Dates Outcome'[Financial Year]), "Category", OutcomeYearCategory), "Encoded", TRIM(CONCATENATE([Category], CONCATENATE(" ", [Financial Year]))), "Numerator" , [No Products] * OutcomeYearIndicator , "Denominator", [No Contact] * OutcomeYearIndicator ) VAR OutcomeYearTable = ADDCOLUMNS(ADDCOLUMNS(OutcomeYearSimpleTable, "NumeratorTotal", CALCULATE([No Products] * OutcomeYearIndicator, ALL('Dates Outcome'[Financial Year])), "DenominatorTotal", CALCULATE([No Contact] * OutcomeYearIndicator, ALL('Dates Outcome'[Financial Year])) ), "IndexOfProportion", DIVIDE ( [Numerator], [NumeratorTotal]) - DIVIDE([Denominator], [DenominatorTotal]) ) VAR OutcomeYearTableZ = ADDCOLUMNS( ADDCOLUMNS(OutcomeYearTable, "Mean", AVERAGEX ( OutcomeYearTable, [IndexOfProportion] ), "StdDev", STDEVX.P ( OutcomeYearTable, [IndexOfProportion] ) ), "ZScore", DIVIDE (( [IndexOfProportion] - [Mean] ), [StdDev]) )
Then I pull it all together with a megaTable that holds every single dimension.
VAR megaTable = UNION(OutcomeYearTableZ, OutcomeQuarterTableZ) RETURN MINX(megaTable, if([Encoded] = MIN('Key Metrics Detail'[DimensionEncoded]), [ZScore], blank()))
By using the 'indicators' and multiplying things by blank values it means only the relevant table is being calculated and drops the processing time to 100ms.
This method should work across any multi dimension calculation, not just ZScores.
So I have progressed and have managed to get the above parts working well and responsive. Example code at end of post if anyone is interested.
However, my question around how the Switch function processes would be amazing.
I have two pickers 1) Numerator & 2) Denominator. These allow you to select what measure you want in the calculations for ZScores (or even could be a simple ratio).
The code I am using to find the appropriate measure to use takes the [Numerator Type] directly from the Numerator Picker. I currently have circa 10 measures so far, however I would prefer to scale that up more. As soon as I replace the [No Products] from the refactored original code the speed goes from 400ms to 5000+ms.
Key Metrics Numerator = IF(HASONEVALUE('Key Metrics Numerator'[Numerator Type]), SWITCH(MAX('Key Metrics Numerator'[Numerator Type]), "Contacts", [No Contact], "Products", [No Products], "PDDs", [No PDD], "Waste", [No Wasted], "Refunded", [No Refunded], "Income", [Income] ))
Hence this makes me question how the SWITCH function operates.
If "PDDs" is selected in the picker does the SWITCH statement follow the below steps?
Now I hope the above steps do not occur, I would hope that steps 3 & 5 doesn't need to occur. However my query get so much slower as soon as this switch statement is added. I do reference the [Key Metrics Numerator] a substaintial amount of times throughout my ZScore calculations due to the quantity of Dimensions used.
Is there another approach the works better than using SWITCH, or restructing the Switch statement above better?
Or a better way to reference the Numerator & Denominator fewer times in the below code.
REFACTORED CODE SOLUTION
Note that [No Contact] & [No Products] from the orginal post have been replaced with [Key Metrics Numerator] & [Key Metrics Denominator] to demonstrate the above problem.
This is my approach to building the base tables to calculate from (takes around 200 ms)
ZScore Refactored = VAR OutcomeYearSimpleTable = ADDCOLUMNS(ADDCOLUMNS( SUMMARIZE ('Dates Outcome', 'Dates Outcome'[Financial Year]), "Category", "Outcome Year"), "Encoded", TRIM(CONCATENATE([Category], CONCATENATE(" ", [Financial Year]))), "Numerator" , [Key Metrics Numerator], "Denominator", [Key Metrics Denominator] ) VAR OutcomeYearTable = ADDCOLUMNS(ADDCOLUMNS(OutcomeYearSimpleTable, "NumeratorTotal", CALCULATE([Key Metrics Numerator], ALL('Dates Outcome'[Financial Year])), "DenominatorTotal", CALCULATE([Key Metrics Denominator], ALL('Dates Outcome'[Financial Year])) ), "IndexOfProportion", DIVIDE ( [Numerator], [NumeratorTotal]) - DIVIDE([Denominator], [DenominatorTotal]) ) VAR OutcomeYearTableZ = ADDCOLUMNS( ADDCOLUMNS(OutcomeYearTable, "Mean", AVERAGEX ( OutcomeYearTable, [IndexOfProportion] ), "StdDev", STDEVX.P ( OutcomeYearTable, [IndexOfProportion] ) ), "ZScore", DIVIDE (( [IndexOfProportion] - [Mean] ), [StdDev]) )
Then I find this approach work well to pick which dimension to display (now from whatever column you want)
MINX(megaTable, if([Encoded] = MIN('Key Metrics Detail'[DimensionEncoded]), [ZScore], blank()))
Hi @Anonymous,
Thanks for sharing, more people will benefit from here. And I will post the update if I has better optimized solution.
Best Regards,
Angelia
Thanks @v-huizhn-msft
Trying to think about how to improve the SWITCH statements using these links.
http://joyfulcraftsmen.com/blog/ssas-tabular-another-dax-ifswitch-performance-story/
http://joyfulcraftsmen.com/blog/dax-if-only-i-could-avoid-if/
I have improved it a miniscule amount using Variables for the start of my SWITCH, however still a long way to go before I am satisfied.
One part that I am very confused about is when I run a 'Server Timings' experiment with my measure in DAX Studio with this evaulation I get it to complete in 975 ms. However when I add the same measure in a visual in Power BI with [Dimension Detail] as an Axis it takes circa 2 minutes to calculate. Why would this timing by out by a factor of 120?
EVALUATE SUMMARIZE('Key Metrics Detail', 'Key Metrics Detail'[Dimension Detail], "Example", [ZScore Measure])
Starting to get closer.... however how can I do it better?
Managed to shave 25% of the processing time from the Numerator switch measure
My version using the original SWITCH calculates in 204ms
Key Metrics Numerator = VAR typeValues = VALUES('Key Metrics Numerator'[Numerator Type]) VAR hasOne = HASONEVALUE('Key Metrics Numerator'[Numerator Type]) VAR comparision = typeValues RETURN IF(hasOne, SWITCH(comparision, "Loaded", [No Loaded], "Callable", [No Callable], "Contacts", [No Contact], "Products", [No Products], "PDDs", [No PDD], "Waste", [No Wasted], "Redundant", [No Redundant], "Refunded", [No Refunded], "Income", [Income], "Cost Data", [Cost Data], "Attrition Supplied", [Attrition Supplied], "Attrition Return", [Attrition Return], "Attrition Unpaid", [Attrition Unpaids], "Attrition 3mth", [Attrition 3 Months] ))
My newer approach is around 150ms.... anyway I can drop more of the IFs or other ways to improve this? (Does feel very counter-intuitive running with this approach)
DEFINE MEASURE 'Key Metrics Detail'[Key Metrics Numerator] = VAR typeValues = VALUES('Key Metrics Numerator'[Numerator Type]) VAR hasOne = HASONEVALUE('Key Metrics Numerator'[Numerator Type]) VAR comparision = typeValues VAR loadedIndicator = if(NOT(typeValues = "Loaded"), blank(), [No Loaded]) VAR callableIndicator = if(NOT(typeValues = "Callable"), blank(), [No Callable]) VAR contactIndicator = if(NOT(typeValues = "Contacts"), blank(), [No Contact]) VAR productIndicator = if(NOT(typeValues = "Products"), blank(), [No Products]) VAR pddIndicator = if(NOT(typeValues = "PDDs"), blank(), [No PDD]) VAR wasteIndicator = if(NOT(typeValues = "Waste"), blank(), [No Wasted]) VAR redundantIndicator = if(NOT(typeValues = "Redundant"), blank(), [No Redundant]) VAR refundedIndicator = if(NOT(typeValues = "Refunded"), blank(), [No Refunded]) VAR costDataIndicator = if(NOT(typeValues = "Cost Data"), blank(), [Cost Data]) VAR attritionSuppliedIndicator = if(NOT(typeValues = "Attrition Supplied"), blank(), [Attrition Supplied]) VAR attritionReturnIndicator = if(NOT(typeValues = "Attrition Return"), blank(), [Attrition Return]) VAR attritionUnpaidIndicator = if(NOT(typeValues = "Attrition Unpaid"), blank(), [Attrition Unpaids]) VAR attrition3mthIndicator = if(NOT(typeValues = "Attrition 3mth"), blank(), [Attrition 3 Months]) VAR incomeIndicator = if(NOT(typeValues = "Income"), blank(), [Income]) RETURN loadedIndicator + callableIndicator + contactIndicator + productIndicator + pddIndicator + wasteIndicator + redundantIndicator + refundedIndicator + costDataIndicator + attritionSuppliedIndicator + attritionReturnIndicator + attritionUnpaidIndicator + attrition3mthIndicator + incomeIndicator
If anyone is interested if they come across a similar problem. This is a much more effecient version.
Make each Dimension as variables (This is One Dimension. I have duplicated this as variables for every dimension.
VAR SelectedDimension = [SelectedDimension] VAR OutcomeYearCategory = "Outcome Year" VAR OutcomeYearIndicator = IF(SelectedDimension = OutcomeYearCategory, 1, BLANK()) VAR OutcomeYearSimpleTable = ADDCOLUMNS(ADDCOLUMNS( VALUES('Dates Outcome'[Financial Year]), "Category", OutcomeYearCategory), "Encoded", TRIM(CONCATENATE([Category], CONCATENATE(" ", [Financial Year]))), "Numerator" , [No Products] * OutcomeYearIndicator , "Denominator", [No Contact] * OutcomeYearIndicator ) VAR OutcomeYearTable = ADDCOLUMNS(ADDCOLUMNS(OutcomeYearSimpleTable, "NumeratorTotal", CALCULATE([No Products] * OutcomeYearIndicator, ALL('Dates Outcome'[Financial Year])), "DenominatorTotal", CALCULATE([No Contact] * OutcomeYearIndicator, ALL('Dates Outcome'[Financial Year])) ), "IndexOfProportion", DIVIDE ( [Numerator], [NumeratorTotal]) - DIVIDE([Denominator], [DenominatorTotal]) ) VAR OutcomeYearTableZ = ADDCOLUMNS( ADDCOLUMNS(OutcomeYearTable, "Mean", AVERAGEX ( OutcomeYearTable, [IndexOfProportion] ), "StdDev", STDEVX.P ( OutcomeYearTable, [IndexOfProportion] ) ), "ZScore", DIVIDE (( [IndexOfProportion] - [Mean] ), [StdDev]) )
Then I pull it all together with a megaTable that holds every single dimension.
VAR megaTable = UNION(OutcomeYearTableZ, OutcomeQuarterTableZ) RETURN MINX(megaTable, if([Encoded] = MIN('Key Metrics Detail'[DimensionEncoded]), [ZScore], blank()))
By using the 'indicators' and multiplying things by blank values it means only the relevant table is being calculated and drops the processing time to 100ms.
This method should work across any multi dimension calculation, not just ZScores.
User | Count |
---|---|
140 | |
113 | |
104 | |
77 | |
65 |
User | Count |
---|---|
136 | |
118 | |
101 | |
71 | |
61 |