Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Greg_Deckler

Performance Tuning DAX - Part 2

Introduction

First part is here: https://community.powerbi.com/t5/Community-Blog/Performance-Tuning-DAX-Part-1/ba-p/976275. I suggest you read it first because otherwise this won't make a lot of sense.

 

For those of you who didn't already cheat, download the PBIX and skip ahead...

 

So, continuing where we left off in Part 1, of course I can't leave well enough alone, I really wanted to get this thing down to a sub-second calculation time. Spoiler, no such luck, but here goes:

Step 9

Looking at the code, the SUMX bothers me. SUMX is an iterator function and thus strikes me as potentially slowing things down. Let's get rid of it:

 

Total Orders 10 = 
    VAR MinDateInContext = MIN ( 'DateTimeTable'[Date] )
    VAR MaxDateInContext = MAX ( DateTimeTable[Date] )
    VAR __Table =
        FILTER(
            'Tracking History',
            AND (
                OR (  
                    'Tracking History'[Start Date] > MinDateInContext, 
                    'Tracking History'[End Date] > MaxDateInContext
                ),
                MaxDateInContext > 'Tracking History'[Start Date]
            )
        )
RETURN
    COUNTROWS(FILTER(__Table,[Start Date] > MinDateInContext || [End Date] > MaxDateInContext))

 

Performance analyzer results were as follows:

  • Total Orders 10, 22,498 milliseconds, 22 seconds

OK, basically no difference, guess the iterator functions aren't so evil after all...

Step 10

Fine, now that everything is a FILTER, let's just consolidate everything:

 

Total Orders 11 = 
    VAR MinDateInContext = MIN ( 'DateTimeTable'[Date] )
    VAR MaxDateInContext = MAX ( DateTimeTable[Date] )
RETURN
        COUNTROWS(
            FILTER(
                FILTER(
                    'Tracking History',
                    AND (
                        OR (  
                            'Tracking History'[Start Date] > MinDateInContext, 
                            'Tracking History'[End Date] > MaxDateInContext
                        ),
                        MaxDateInContext > 'Tracking History'[Start Date]
                    )
                ),
                OR (
                    [Start Date] > MinDateInContext,
                    [End Date] > MaxDateInContext
                )
            )
        )

 

Performance analyzer results were as follows:

  • Total Orders 11, 21,996 milliseconds, 22 seconds

Nothing. Hmm. Maybe we are running out of optimizations.

Step 11

So our current logic in the filters is as follows:

 

 ( (A || B) && C ) && (A || B)

 

Knowing how to reduce and simplify logic equations, we can rewrite this to:

 

(A && C) || (B & C)

 

So, let's do that:

 

Total Orders 12 = 
    VAR MinDateInContext = MIN ( 'DateTimeTable'[Date] )
    VAR MaxDateInContext = MAX ( DateTimeTable[Date] )
RETURN
    COUNTROWS(
        FILTER(
            'Tracking History',
            OR (
                AND (
                    'Tracking History'[Start Date] > MinDateInContext,
                    MaxDateInContext > 'Tracking History'[Start Date]
                ),
                AND (
                    'Tracking History'[End Date] > MaxDateInContext,
                    MaxDateInContext > 'Tracking History'[Start Date]
                )
            )
        )
    )

 

Performance analyzer results were as follows:

  • Total Orders 12, 31,413 milliseconds, 31 seconds

Ahhh!!! We broke it!! That last optimization was one too far! I ran this test multiple, multiple times, it always came back in the 30 second range whereas Total Orders 11 always came back in the 20-22 second range. Drat!

Step 12

OK, let's keep the same basic idea but go back to multiple filters so that we are filtering early!

 

Total Orders 13 = 
    VAR MinDateInContext = MIN ( 'DateTimeTable'[Date] )
    VAR MaxDateInContext = MAX ( DateTimeTable[Date] )
RETURN
        COUNTROWS(
            FILTER(
                FILTER(
                    FILTER (
                        'Tracking History',
                            OR (  
                                'Tracking History'[Start Date] > MinDateInContext, 
                                'Tracking History'[End Date] > MaxDateInContext
                            )
                    ),
                       MaxDateInContext > 'Tracking History'[Start Date]
                ),
                OR (
                    [Start Date] > MinDateInContext,
                    [End Date] > MaxDateInContext
                )
            )
        )

 

Performance analyzer results were as follows:

  • Total Orders 13, 21,215 milliseconds, 21 seconds

OK, back down to 20 seconds again!! Whew!!

Conclusion

So, unlike Part 1 where we took the calculation down from 10 minutes to 20 seconds in Part 2 we have successfully managed to more or less waste our time. 20 seconds to 20 seconds. What have we learned in Part 2?

  • Obviously nothing about better performance.
  • Know when to quit. When multiple optimizations start to result in little or no performance gain, you're probably done
  • It is possible to do things that you consider to be optimizations that actually deoptimize performance.
  • Clean code is not necessarily performance optimized code. Total Orders 12 is cleaner code, but performs worse
  • Again, the pattern of filtering early saved the day with Total Orders 13! Nested IF's bad, nested FILTER good

Hmm, look at that, five learning bullet points for Part 2 versus six for Part 1. For all of you people over the years that have told me that "you learn more from failure than from success". You are all liars. 😁

Comments
Anonymous

This one

 

Total Orders 14 = 
VAR MinDateInContext = MIN ( 'DateTimeTable'[Date] )
VAR MaxDateInContext = MAX ( DateTimeTable[Date] )
RETURN
COUNTROWS(
    filter(
        FILTER(
            'Tracking History',
            'Tracking History'[Start Date] < MaxDateInContext
        ),
        MinDateInContext < 'Tracking History'[Start Date]
        ||
        MaxDateInContext < 'Tracking History'[End Date]        
    )
)

 

is about 16% faster than Total Orders 13.

 

Best

D

@Anonymous  - I can confirm this. For comparison with article, on my machine this ran in 17,301 milliseconds or 17 seconds. So, you're maybe not giving yourself enough credit, it's close to a 20% improvement I think!

 

Any thoughts as to why moving everything into a single measure had the greatest impact?

Hi Greg,

Very interesting posts, thanks for sharing.

 

Although you state at the beginning that your intention was not to cover model optimizations and the like, I wanted to see how fast I could optimize this - 20 seconds still seemed way too much for the task.

First point, the original logic of the measure could be heavily simplified to the point of being understandable (although not faster than your final solution, it gives a much better starting point for the final optimization).

 

The simplified logis is that #orders = (all orders) - (orders already closed) - (orders not yet open).

So, a simple measure would be:

 

Total Orders Simplified = 
    VAR DMin = MIN ( 'DateTimeTable'[Date] )
    VAR DMax = MAX ( DateTimeTable[Date] )
    var _orders = countrows('Tracking History')
    var _closed_orders = CALCULATE( COUNTROWS('Tracking History') , 'Tracking History'[End Date] < DMin)
    var _future_orders = CALCULATE( COUNTROWS('Tracking History') , 'Tracking History'[Start Date] > DMax)
    return
    _orders - _closed_orders - _future_orders

 

 

The measure slower than your final measure (on my laptop), but simpler. 

With this starting point, we can complete a proper model with relations (which your screenshot seemed to show you had at some point at the beginning):

- Create 2 columns with a rounded (to the nearest hour) time for Start Time and End Time

- Create 2 inactive relationships (but physical ones, i.e. 1:N) from the rounded Start/End time to the DateTime Table

- Create the final measure:

 

 

Total Orders fast = 
var _end = max(DateTimeTable[Date])
var _start = min(DateTimeTable[Date])
return
[#orders]
- CALCULATE(
    CALCULATE(
        [#orders] 
        , USERELATIONSHIP(DateTimeTable[Date] , 'Tracking History'[EndRound])
    )
    , DateTimeTable[Date] <= _start
) 
- CALCULATE(
    CALCULATE(
        [#orders] 
        , USERELATIONSHIP(DateTimeTable[Date] , 'Tracking History'[StartRound])
    )
     ,DateTimeTable[Date] > _end
)

 

 

This results in a massive improvement: From 28 seconds (using the TotalOrders13 measure) down to 2.5 seconds (using the TotalOrdersFast measure) , so a 10x improvement. Plus, it is better cached 🙂

Hope this can help future readers on several points:

- Dont store & calculate on smaller granularities than it is needed

- Use physical relationships whenever you can

- Usage of USERELATIONSHIP (which I'm not covering in detail, but you can see that I've used a double calculate above) 

Hope this helps, and if you can optimize it ever further, let me know!

BR

P.S.: How can I add an attachment here? Would share the file if I could