Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Optimize Z Scores across multiple dimensions

I am trying to Optimize a query that I have that produces the correct results, however very slow.

 

What I want to do is to click on a visual for the 'Dimension' and then another visual shows the ZScores for the Values within that Dimension. (I have removed the part of my formula that makes this show no ZScores if not a single Dimension is selected just for demo purposes)

 

I have a table that contains multiple dimensions (circa 20+) that I want to get the ZScores for each one of those dimensions. My base table looks like this

Products                             Contacts                     STAFF       REGION    AFFLUENCE   LIFESTAGE

      5                                     10                            Staff 1       Region1    Affluence1     Lifestage1

      10                                   15                            Staff 2       Region1    Affluence1     Lifestage2

      6                                     50                            Staff 2       Region2    Affluence2     Lifestage2

 

 

 

VAR AffluenceSimpleTable =
	ADDCOLUMNS(
    SUMMARIZE (biRecord,
        biRecord[Demographic Affluence]), 
        "Encoded", TRIM(CONCATENATE("Affluence", CONCATENATE(" ", [Demographic Affluence]))), 
        "Numerator" , [No Products],
		"Denominator", [No Contact]
        )	
VAR AffluenceTable = 
	ADDCOLUMNS(ADDCOLUMNS(AffluenceSimpleTable,
		"NumeratorTotal", CALCULATE([No Products], ALL(biRecord[Demographic Affluence])),
	    "DenominatorTotal", CALCULATE([No Contact], ALL(biRecord[Demographic Affluence]))
	),
		"IndexOfProportion", DIVIDE ( [Numerator], [NumeratorTotal]) - DIVIDE([Denominator], [DenominatorTotal])
	)	
VAR AffluenceTableZ = 
ADDCOLUMNS(
	ADDCOLUMNS(AffluenceTable,
		"Mean", AVERAGEX ( AffluenceTable, [IndexOfProportion] ),
		"StdDev", STDEVX.P ( AffluenceTable, [IndexOfProportion] )
	
	),
		"ZScore", DIVIDE (( [IndexOfProportion] - [Mean] ), [StdDev]) 
	)

 

 

So each one of these tables calculate in approx 200ms (happy with that - however if someone sees a way to optimize the above further that would be useful) - I have this block of code for every single Dimension. The 'Encoded' value is what I am using to match this Dimension Detail in the below table.

 

I have another table that collates all the Dimension Details together that looks like this called 'DimensionTbl'...

DIMENSION        DIMENSION DETAIL     DIMENSIONENCODED

Staff                      Staff 1                         Staff Staff 1

Staff                      Staff 2                         Staff Staff 2

Region                  Region 1                     Region Region 1

Region                  Region 2                     Region Region 2

Lifestage               Lifestage 1                  Lifestage Lifestage 1

Lifestage               Lifestage 2                  Lifestage Lifestage 2

Affluence              Affluence 1                  Affluence Affluence 1

Affluence              Affluence 2                  Affluence Affluence 2

 

So far I can think of two approaches to get the relevant results against the 'Dimension Detail' table.

 

First version is this, which is to merge all of these dimension calculated variable tables together and then find the appropriate value within that - This means the more dimensions I add the slower it becomes, especially if a dimension has many values.

 

 VAR megaTable = UNION(
    AffluenceTableZ,
    LifestageTableZ,
    RegionTableZ,
StaffTableZ
)

RETURN SUMX('DimensionTbl', SUMX(megaTable, if([Encoded] = 'DimensionTbl'[DimensionEncoded], [ZScore], BLANK()))

 

 

The second version uses Switch statements to achieve the same thing....

RETURN

SWITCH(MAX('DimensionTbl'[Dimension]),
"Lifestage", MINX(LifestageTableZ, [ZScore]),
"Affluence", MINX(AffluenceTableZ, [ZScore]),
"Staff", MINX(StaffTableZ, [ZScore]),
"Region", MINX(RegionTableZ, [ZScore]),
)

Both of these approaches work to produce an accurate result, however both methods are slow.

 

A few questions...

1) Is there another approach that is more efficient

2) How do Switch statements process the calculation? If my Switch matches 'Staff' would that mean that all 3 tables (LifestageTableZ, AffluenceTableZ and StaffTableZ) are all calculated? Or does it just check if the value matches "Lifestage", "Affluence", "Staff", and then StaffTableZ? I have worked out that I should put the results I use most frequently in the Switch should be closer to the start of the Switch statement as it means if it finds a match it ignores all options below that in the Switch - (Hopefully that makes sense)

 

Thanks in advance, and let me know if you need anything clarified.

 

1 ACCEPTED SOLUTION
Anonymous
Not applicable

If anyone is interested if they come across a similar problem. This is a much more effecient version.

 

Make each Dimension as variables (This is One Dimension. I have duplicated this as variables for every dimension.

VAR SelectedDimension = [SelectedDimension]

VAR OutcomeYearCategory = "Outcome Year"
VAR OutcomeYearIndicator = IF(SelectedDimension = OutcomeYearCategory, 1, BLANK())
VAR OutcomeYearSimpleTable =
	ADDCOLUMNS(ADDCOLUMNS(
    VALUES('Dates Outcome'[Financial Year]), 
        "Category", OutcomeYearCategory),
        "Encoded", TRIM(CONCATENATE([Category], CONCATENATE(" ", [Financial Year]))), 
        "Numerator" ,  [No Products] * OutcomeYearIndicator ,
		"Denominator", [No Contact] * OutcomeYearIndicator
        )	
 
VAR OutcomeYearTable = 
	ADDCOLUMNS(ADDCOLUMNS(OutcomeYearSimpleTable,
		"NumeratorTotal", CALCULATE([No Products] * OutcomeYearIndicator, ALL('Dates Outcome'[Financial Year])),
	    "DenominatorTotal", CALCULATE([No Contact] * OutcomeYearIndicator, ALL('Dates Outcome'[Financial Year]))
	),
		"IndexOfProportion", DIVIDE ( [Numerator], [NumeratorTotal]) - DIVIDE([Denominator], [DenominatorTotal])
	)	
VAR OutcomeYearTableZ = 
ADDCOLUMNS(
	ADDCOLUMNS(OutcomeYearTable,
		"Mean", AVERAGEX ( OutcomeYearTable, [IndexOfProportion] ),
		"StdDev", STDEVX.P ( OutcomeYearTable, [IndexOfProportion] )
	
	),
		"ZScore", DIVIDE (( [IndexOfProportion] - [Mean] ), [StdDev]) 
	)

Then I  pull it all together with a megaTable that holds every single dimension.

 

VAR megaTable = UNION(OutcomeYearTableZ, OutcomeQuarterTableZ)


RETURN 

MINX(megaTable, if([Encoded] = MIN('Key Metrics Detail'[DimensionEncoded]), [ZScore], blank()))

By using the 'indicators' and multiplying things by blank values it means only the relevant table is being calculated and drops the processing time to 100ms.

 

This method should work across any multi dimension calculation, not just ZScores.

View solution in original post

5 REPLIES 5
Anonymous
Not applicable

So I have progressed and have managed to get the above parts working well and responsive. Example code at end of post if anyone is interested.

 

However, my question around how the Switch function processes would be amazing. 

I have two pickers 1) Numerator & 2) Denominator. These allow you to select what measure you want in the calculations for ZScores (or even could be a simple ratio). 

 

The code I am using to find the appropriate measure to use takes the [Numerator Type] directly from the Numerator Picker. I currently have circa 10 measures so far, however I would prefer to scale that up more. As soon as I replace the [No Products] from the refactored original code the speed goes from 400ms to 5000+ms.

 

Key Metrics Numerator = IF(HASONEVALUE('Key Metrics Numerator'[Numerator Type]),
 SWITCH(MAX('Key Metrics Numerator'[Numerator Type]), 

"Contacts", [No Contact],
"Products", [No Products],
"PDDs", [No PDD],
"Waste", [No Wasted],
"Refunded", [No Refunded],
"Income", [Income]

)) 

Hence this makes me question how the SWITCH function operates. 

If "PDDs" is selected in the picker does the SWITCH statement follow the below steps?

  1. Calculate the Switching value (MATCHER)
  2. Check if the MATCHER matches "Contacts"
  3. Calculate the value of [No Contacts]
  4. Check if the MATCHER matches "Products"
  5. Calculate the value of [No Products]
  6. Check if the MATCHER matches "PDDs"
  7. Calculate & Use the value of [No Products]

Now I hope the above steps do not occur, I would hope that steps 3 & 5 doesn't need to occur. However my query get so much slower as soon as this switch statement is added. I do reference the [Key Metrics Numerator] a substaintial amount of times throughout my ZScore calculations due to the quantity of Dimensions used.

 

Is there another approach the works better than using SWITCH, or restructing the Switch statement above better?

Or a better way to reference the Numerator & Denominator fewer times in the below code.

 

REFACTORED CODE SOLUTION

Note that [No Contact] & [No Products] from the orginal post have been replaced with [Key Metrics Numerator] & [Key Metrics Denominator] to demonstrate the above problem.

 

This is my approach to building the base tables to calculate from (takes around 200 ms)

 

ZScore Refactored = 
VAR OutcomeYearSimpleTable =
	ADDCOLUMNS(ADDCOLUMNS(
    SUMMARIZE ('Dates Outcome',
        'Dates Outcome'[Financial Year]), 
        "Category", "Outcome Year"),
        "Encoded", TRIM(CONCATENATE([Category], CONCATENATE(" ", [Financial Year]))), 
        "Numerator" , [Key Metrics Numerator],
		"Denominator", [Key Metrics Denominator]
        )	
VAR OutcomeYearTable = 
	ADDCOLUMNS(ADDCOLUMNS(OutcomeYearSimpleTable,
		"NumeratorTotal", CALCULATE([Key Metrics Numerator], ALL('Dates Outcome'[Financial Year])),
	    "DenominatorTotal", CALCULATE([Key Metrics Denominator], ALL('Dates Outcome'[Financial Year]))
	),
		"IndexOfProportion", DIVIDE ( [Numerator], [NumeratorTotal]) - DIVIDE([Denominator], [DenominatorTotal])
	)	
VAR OutcomeYearTableZ = 
ADDCOLUMNS(
	ADDCOLUMNS(OutcomeYearTable,
		"Mean", AVERAGEX ( OutcomeYearTable, [IndexOfProportion] ),
		"StdDev", STDEVX.P ( OutcomeYearTable, [IndexOfProportion] )
	
	),
		"ZScore", DIVIDE (( [IndexOfProportion] - [Mean] ), [StdDev]) 
	)

Then I find this approach work well to pick which dimension to display (now from whatever column you want)

MINX(megaTable, if([Encoded] = MIN('Key Metrics Detail'[DimensionEncoded]), [ZScore], blank()))

 

Hi @Anonymous,

Thanks for sharing, more people will benefit from here. And I will post the update if I has better optimized solution.

Best Regards,
Angelia

Anonymous
Not applicable

Thanks @v-huizhn-msft

 

Trying to think about how to improve the SWITCH statements using these links.

http://joyfulcraftsmen.com/blog/ssas-tabular-another-dax-ifswitch-performance-story/

http://joyfulcraftsmen.com/blog/dax-if-only-i-could-avoid-if/

 

I have improved it a miniscule amount using Variables for the start of my SWITCH, however still a long way to go before I am satisfied.

 

One part that I am very confused about is when I run a 'Server Timings' experiment with my measure in DAX Studio with this evaulation I get it to complete in 975 ms. However when I add the same measure in a visual in Power BI with [Dimension Detail] as an Axis it takes circa 2 minutes to calculate. Why would this timing by out by a factor of 120?

 

EVALUATE
SUMMARIZE('Key Metrics Detail', 'Key Metrics Detail'[Dimension Detail], "Example", [ZScore Measure])

 

Anonymous
Not applicable

Starting to get closer.... however how can I do it better?

 

Managed to shave 25% of the processing time from the Numerator switch measure

 

My version using the original SWITCH calculates in 204ms

Key Metrics Numerator = VAR typeValues = VALUES('Key Metrics Numerator'[Numerator Type])
VAR hasOne = HASONEVALUE('Key Metrics Numerator'[Numerator Type])
VAR comparision = typeValues

RETURN
IF(hasOne,
 SWITCH(comparision, 
"Loaded", [No Loaded], 
"Callable", [No Callable],
"Contacts", [No Contact],
"Products", [No Products],
"PDDs", [No PDD],
"Waste", [No Wasted],
"Redundant", [No Redundant],
"Refunded", [No Refunded],
"Income", [Income],
"Cost Data", [Cost Data],
"Attrition Supplied", [Attrition Supplied],
"Attrition Return", [Attrition Return],
"Attrition Unpaid", [Attrition Unpaids],
"Attrition 3mth", [Attrition 3 Months]

)) 

 

My newer approach is around 150ms.... anyway I can drop more of the IFs or other ways to improve this? (Does feel very counter-intuitive running with this approach)

DEFINE 
MEASURE 'Key Metrics Detail'[Key Metrics Numerator] = 

VAR typeValues = VALUES('Key Metrics Numerator'[Numerator Type])
VAR hasOne = HASONEVALUE('Key Metrics Numerator'[Numerator Type])
VAR comparision = typeValues

VAR loadedIndicator = if(NOT(typeValues = "Loaded"), blank(), [No Loaded])
VAR callableIndicator = if(NOT(typeValues = "Callable"), blank(), [No Callable])
VAR contactIndicator = if(NOT(typeValues = "Contacts"), blank(), [No Contact])
VAR productIndicator = if(NOT(typeValues = "Products"), blank(), [No Products])
VAR pddIndicator = if(NOT(typeValues = "PDDs"), blank(), [No PDD])
VAR wasteIndicator = if(NOT(typeValues = "Waste"), blank(), [No Wasted])
VAR redundantIndicator = if(NOT(typeValues = "Redundant"), blank(), [No Redundant])
VAR refundedIndicator = if(NOT(typeValues = "Refunded"), blank(), [No Refunded])
VAR costDataIndicator = if(NOT(typeValues = "Cost Data"), blank(), [Cost Data])
VAR attritionSuppliedIndicator = if(NOT(typeValues = "Attrition Supplied"), blank(), [Attrition Supplied])
VAR attritionReturnIndicator = if(NOT(typeValues = "Attrition Return"), blank(), [Attrition Return])
VAR attritionUnpaidIndicator = if(NOT(typeValues = "Attrition Unpaid"), blank(), [Attrition Unpaids])
VAR attrition3mthIndicator = if(NOT(typeValues = "Attrition 3mth"), blank(), [Attrition 3 Months])
VAR incomeIndicator = if(NOT(typeValues = "Income"), blank(), [Income])

RETURN 
loadedIndicator + callableIndicator + contactIndicator + productIndicator + pddIndicator + wasteIndicator + redundantIndicator + refundedIndicator + costDataIndicator
	+ attritionSuppliedIndicator + attritionReturnIndicator + attritionUnpaidIndicator + attrition3mthIndicator + incomeIndicator


Anonymous
Not applicable

If anyone is interested if they come across a similar problem. This is a much more effecient version.

 

Make each Dimension as variables (This is One Dimension. I have duplicated this as variables for every dimension.

VAR SelectedDimension = [SelectedDimension]

VAR OutcomeYearCategory = "Outcome Year"
VAR OutcomeYearIndicator = IF(SelectedDimension = OutcomeYearCategory, 1, BLANK())
VAR OutcomeYearSimpleTable =
	ADDCOLUMNS(ADDCOLUMNS(
    VALUES('Dates Outcome'[Financial Year]), 
        "Category", OutcomeYearCategory),
        "Encoded", TRIM(CONCATENATE([Category], CONCATENATE(" ", [Financial Year]))), 
        "Numerator" ,  [No Products] * OutcomeYearIndicator ,
		"Denominator", [No Contact] * OutcomeYearIndicator
        )	
 
VAR OutcomeYearTable = 
	ADDCOLUMNS(ADDCOLUMNS(OutcomeYearSimpleTable,
		"NumeratorTotal", CALCULATE([No Products] * OutcomeYearIndicator, ALL('Dates Outcome'[Financial Year])),
	    "DenominatorTotal", CALCULATE([No Contact] * OutcomeYearIndicator, ALL('Dates Outcome'[Financial Year]))
	),
		"IndexOfProportion", DIVIDE ( [Numerator], [NumeratorTotal]) - DIVIDE([Denominator], [DenominatorTotal])
	)	
VAR OutcomeYearTableZ = 
ADDCOLUMNS(
	ADDCOLUMNS(OutcomeYearTable,
		"Mean", AVERAGEX ( OutcomeYearTable, [IndexOfProportion] ),
		"StdDev", STDEVX.P ( OutcomeYearTable, [IndexOfProportion] )
	
	),
		"ZScore", DIVIDE (( [IndexOfProportion] - [Mean] ), [StdDev]) 
	)

Then I  pull it all together with a megaTable that holds every single dimension.

 

VAR megaTable = UNION(OutcomeYearTableZ, OutcomeQuarterTableZ)


RETURN 

MINX(megaTable, if([Encoded] = MIN('Key Metrics Detail'[DimensionEncoded]), [ZScore], blank()))

By using the 'indicators' and multiplying things by blank values it means only the relevant table is being calculated and drops the processing time to 100ms.

 

This method should work across any multi dimension calculation, not just ZScores.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.