Performance Tuning DAX - Part 1

Introduction

OK, let's get a few things out of the way right off the bat.

 

First and foremost, I do not consider myself a DAX performance optimization expert, nor do I play one on TV. Truth be told, the vast majority of solutions that I post to the forums are almost certainly not optimized DAX code. Believe it or not, that doesn't bother me...at all. The reality is that DAX optimization is difficult and time-consuming and, quite frankly, most of the time the optimization does not matter and is simply a waste of effort. Or the optimization makes the code less readable. We all remember the bad old days of Perl code. For the vast majority of use cases, whether the DAX runs in 600 milliseconds or 300 milliseconds, the average user is not going to be all that upset; both are sub-second response times. What matters most is a solution. Once you have a solution, it can always be optimized if necessary. 

 

So, with that out of the way, on to the next pre-emptive clarification. This post is about DAX optimization. Not data model optimization. Not Power Query optimization. Not, pick a topic, optimization. DAX optimization. If you want to leave comments about other optimizations that are not DAX, write your own blog article on them; that's not the purpose of this post.

 

Third clarification, I make no claims that this is the "most" optimized DAX code. In fact, I would LOVE to see feedback/comments on even more efficient DAX code. I have attached the PBIX file I used to this post.

 

So, what is this post about? Simple. A community member presented a problem where Power BI was taking 30 minutes to render a visualization. Let that sink in, 30 minutes. That's a lot of minutes folks. This post simply documents my efforts to optimize the DAX as much as possible. Now, that 30 minutes was more like 10 minutes on my machine. And I have a rickety old Surface Pro 4 with a mere 16 GB of memory so I feel bad for that guy... Anyway, I was able to get this calculation down to about 20 seconds, a roughly 30x improvement. This post presents to the community some lessons learned in this journey. Your mileage may vary because DAX optimization is a complex, tricky, and relatively unexplored area, and the results will vary by circumstance. But I felt that there were enough lessons learned here to open the discussion up to a wider audience. Maybe, maybe not.

The Original Problem

So the original problem involved three tables within a larger data model that looked like the following:

 

[Image: data model diagram showing the three tables and their relationships]

 

The fact table involved, Tracking History, only had about 60,000 rows in it, which made this problem even more intriguing. DAX performance issues are relatively rare and to have one occur on such a small dataset is somewhat surprising.

 

There were two measures involved:

 

No. of Orders = 
VAR StartDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[Start Date]) )
VAR EndDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[End Date] ) )
VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
RETURN
IF (
        AND (  StartDate > MinDateInContext, EndDate < MaxDateInContext ),
        1,
        IF (
            and(AND (StartDate > MinDateInContext, EndDate>MaxDateInContext),MaxDateInContext>StartDate),
            1,
            IF (
               and( AND ( StartDate < MinDateInContext, EndDate < MaxDateInContext ),EndDate>MinDateInContext),
               1,
                IF (
                    AND ( StartDate < MinDateInContext, EndDate > MaxDateInContext ),
                    1,
                    BLANK ()
                )
            )
        )
    )

Total Orders = SUMX('Tracking History',[No. of Orders])

 

This creates the following visualization:

[Image: the resulting matrix visualization]

In Performance Analyzer, this visual took 595,908 milliseconds to render, or about 9 minutes, 56 seconds.

Step 1

OK, step one, let's clean up the code. Nested IFs. Ack! Let's change those to a SWITCH statement.

 

No. of Orders 2a = 
    VAR StartDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[Start Date]) )
    VAR EndDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[End Date] ) )
    VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
    VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
RETURN
	SWITCH(TRUE(),
		AND (  StartDate > MinDateInContext, EndDate < MaxDateInContext ),1,
		AND( AND (StartDate > MinDateInContext, EndDate > MaxDateInContext),MaxDateInContext > StartDate),1,
		AND( AND ( StartDate < MinDateInContext, EndDate < MaxDateInContext ),EndDate > MinDateInContext),1,
        AND ( StartDate < MinDateInContext, EndDate > MaxDateInContext ),1,
		BLANK()
    )

 

I also tried this with using && instead of AND like this:

 

No. of Orders 2 = 
    VAR StartDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[Start Date]) )
    VAR EndDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[End Date] ) )
    VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
    VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
RETURN
	SWITCH(TRUE(),
		StartDate > MinDateInContext && EndDate < MaxDateInContext,1,
		(MaxDateInContext > StartDate && StartDate > MinDateInContext) && EndDate > MaxDateInContext,1,
        (StartDate < MinDateInContext && EndDate < MaxDateInContext) && EndDate > MinDateInContext,1,
		StartDate < MinDateInContext && EndDate > MaxDateInContext,1,
		BLANK()
	)

 

Performance analyzer results were as follows:

  • No. of Orders 2a, Total Orders 2a, 606,392 milliseconds or 10 minutes, 6 seconds
  • No. of Orders 2, Total Orders 2, 613,939 milliseconds or 10 minutes, 14 seconds

So, the first couple of lessons: no real impact on performance from using SWITCH instead of nested IF statements, and no impact from using the AND function versus the inline && operator. While these times are slightly longer, the difference is not significant; the same query will run slightly longer or shorter depending on a range of factors. But the code is definitely more readable!

Step 2

Now that we can better see the logic involved, it is evident that the logic assigns a value of 1 to rows matching certain criteria, while the remainder are given a value of BLANK. However, the largest value displayed in the visual is around 2,000 (the sum of all of the 1's), and there are 60,000 rows in the fact table. This means that the vast majority of our rows have to be tested against every condition before they "fall through" to be assigned BLANK. It follows that if we reverse the logic, we will eliminate rows sooner, so there is less testing and hence less processing and calculation.

 

No. of Orders 3 = 
    VAR StartDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[Start Date]) )
    VAR EndDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[End Date] ) )
    VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
    VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
RETURN
	SWITCH(TRUE(),
		AND (  StartDate > MinDateInContext, EndDate > MaxDateInContext ),BLANK(),
		AND( AND (StartDate > MinDateInContext, EndDate > MaxDateInContext),MaxDateInContext < StartDate),BLANK(),
		AND( AND ( StartDate < MinDateInContext, EndDate < MaxDateInContext ),EndDate < MinDateInContext),BLANK(),
        AND ( StartDate < MinDateInContext, EndDate < MaxDateInContext ),BLANK(),
		1
    )

 

Performance analyzer results were as follows:

  • No. of Orders 3, Total Orders 3, 285,969 milliseconds or 4 minutes, 46 seconds

Aha! We've cut the calculation time in half! So, lesson learned: when performing conditional logic tests, structure your code so that you eliminate the greatest number of rows as early as possible.
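
 

To generalize the principle: in a SWITCH(TRUE(), ...) construct, the conditions are evaluated in order, so the branch that disposes of the most rows should come first. Here is an illustrative sketch as a calculated column, using made-up table and column names that are not from the attached PBIX:

Discount Rate = 
SWITCH(TRUE(),
    'Sales'[Amount] < 1000, 0,      -- if most orders are small, the bulk of rows exit at this first test
    'Sales'[Amount] < 5000, 0.05,
    0.10
)

If the branches were reversed, most rows would be tested against every condition before finding a match.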

Step 3

Thus far I have been creating two measures for each step: a new Total Orders measure along with a new No. of Orders measure. This is a pain, and I'm lazy, so let's combine everything into a single measure:

 

Total Orders 4 = 
    VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
    VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
    VAR __Table =
        ADDCOLUMNS(
            'Tracking History',
            "__No of Orders",
            SWITCH(TRUE(),
                AND (  
                    'Tracking History'[Start Date] > MinDateInContext, 
                    'Tracking History'[End Date] > MaxDateInContext
                ),BLANK(),
                AND( 
                    AND (
                        'Tracking History'[Start Date] > MinDateInContext, 
                        'Tracking History'[End Date] > MaxDateInContext
                    ),
                    MaxDateInContext < 'Tracking History'[Start Date]
                ),BLANK(),
                AND( 
                    AND ( 
                        'Tracking History'[Start Date] < MinDateInContext, 
                        'Tracking History'[End Date] < MaxDateInContext 
                    ),
                    'Tracking History'[End Date] < MinDateInContext
                ),BLANK(),
                AND ( 
                    'Tracking History'[Start Date] < MinDateInContext, 
                    'Tracking History'[End Date] < MaxDateInContext 
                ),BLANK(),
                1
            )
        )
RETURN
    SUMX(__Table,[__No of Orders])

 

Performance analyzer results were as follows:

  • Total Orders 4, 57,010 milliseconds, 57 seconds

Wow! Perhaps unexpectedly, this really improved performance! The lesson learned here? Being lazy is a good thing? It's difficult to say, but it likely has something to do with the engine's internal optimization. For example, having all of the code in a single measure may let DAX build a more efficient query plan, and it avoids invoking a separate measure (with a context transition) for every row of the SUMX.
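
 

The general pattern looks like this, sketched with placeholder names (the <row-level expression> stands in for whatever logic the helper measure computed):

-- Slower: the measure reference is invoked once per row, with a
-- context transition each time
Total (two measures) = SUMX ( 'Fact', [Row Measure] )

-- Often faster: inline the row-level logic with ADDCOLUMNS, then
-- sum the resulting column
Total (one measure) = 
    VAR __Table = 
        ADDCOLUMNS(
            'Fact',
            "__Value", <row-level expression over 'Fact' columns>
        )
RETURN
    SUMX(__Table, [__Value])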

Step 4

Now that we have reversed the logic and have it all in one measure, we can see that the first logic test can be moved into a FILTER over the table. By pre-filtering the rows, that test no longer has to run inside the SWITCH. Let's see what that does:

Total Orders 5 = 
    VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
    VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
    VAR __Table =
        ADDCOLUMNS(
            FILTER(
                'Tracking History',
                OR (  
                    'Tracking History'[Start Date] > MinDateInContext, 
                    'Tracking History'[End Date] > MaxDateInContext
                )
            ),
            "__No of Orders",
            SWITCH(TRUE(),
                AND( 
                    AND (
                        'Tracking History'[Start Date] > MinDateInContext, 
                        'Tracking History'[End Date] > MaxDateInContext
                    ),
                    MaxDateInContext < 'Tracking History'[Start Date]
                ),BLANK(),
                AND( 
                    AND ( 
                        'Tracking History'[Start Date] < MinDateInContext, 
                        'Tracking History'[End Date] < MaxDateInContext 
                    ),
                    'Tracking History'[End Date] < MinDateInContext
                ),BLANK(),
                AND ( 
                    'Tracking History'[Start Date] < MinDateInContext, 
                    'Tracking History'[End Date] < MaxDateInContext 
                ),BLANK(),
                1
            )
        )
RETURN
    SUMX(__Table,[__No of Orders])

Performance analyzer results were as follows:

  • Total Orders 5, 44,158 milliseconds, 44 seconds

OK, not as dramatic an improvement, but it still knocked about 25% off of the calculation time. Lesson learned: filter early!

Step 5

Looking closely at our logic, the third test is just a narrower version of the fourth, and both return BLANK, so it is redundant. We can get rid of the redundant test.

Total Orders 6 = 
    VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
    VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
    VAR __Table =
        ADDCOLUMNS(
            FILTER(
                'Tracking History',
                OR (  
                    'Tracking History'[Start Date] > MinDateInContext, 
                    'Tracking History'[End Date] > MaxDateInContext
                )
            ),
            "__No of Orders",
            SWITCH(TRUE(),
                AND( 
                    AND (
                        'Tracking History'[Start Date] > MinDateInContext, 
                        'Tracking History'[End Date] > MaxDateInContext
                    ),
                    MaxDateInContext < 'Tracking History'[Start Date]
                ),BLANK(),
                AND ( 
                    'Tracking History'[Start Date] < MinDateInContext, 
                    'Tracking History'[End Date] < MaxDateInContext 
                ),BLANK(),
                1
            )
        )
RETURN
    SUMX(__Table,[__No of Orders])

Performance analyzer results were as follows:

  • Total Orders 6, 43,844 milliseconds, 44 seconds

Nope, no real improvement (likely because the engine had already optimized out this redundancy). But the code is shorter and cleaner, so that's a win!

Step 6

Those VALUE statements seem unnecessary; MIN and MAX over a date column already return values that can be compared directly, so let's get rid of them.

Total Orders 7 = 
    VAR MinDateInContext = MIN ( 'DateTimeTable'[Date] )
    VAR MaxDateInContext = MAX ( DateTimeTable[Date] )
    VAR __Table =
        ADDCOLUMNS(
            FILTER(
                'Tracking History',
                OR (  
                    'Tracking History'[Start Date] > MinDateInContext, 
                    'Tracking History'[End Date] > MaxDateInContext
                )
            ),
            "__No of Orders",
            SWITCH(TRUE(),
                AND( 
                    AND (
                        'Tracking History'[Start Date] > MinDateInContext, 
                        'Tracking History'[End Date] > MaxDateInContext
                    ),
                    MaxDateInContext < 'Tracking History'[Start Date]
                ),BLANK(),
                AND ( 
                    'Tracking History'[Start Date] < MinDateInContext, 
                    'Tracking History'[End Date] < MaxDateInContext 
                ),BLANK(),
                1
            )
        )
RETURN
    SUMX(__Table,[__No of Orders])

Performance analyzer results were as follows:

  • Total Orders 7, 32,109 milliseconds, 32 seconds

Wow, another 25% reduction. Lesson learned: stop using VALUE and VALUES unnecessarily! I see it a lot in the forums, and it is costing you performance. Only use those functions if you really need them.

Step 7

Looking at the logic once again, and considering our pre-filtering, that first logic test is overly complicated: its two inner conditions are already implied whenever MaxDateInContext < StartDate. Let's simplify it.

Total Orders 8 = 
    VAR MinDateInContext = MIN ( 'DateTimeTable'[Date] )
    VAR MaxDateInContext = MAX ( DateTimeTable[Date] )
    VAR __Table =
        ADDCOLUMNS(
            FILTER(
                'Tracking History',
                OR (  
                    'Tracking History'[Start Date] > MinDateInContext, 
                    'Tracking History'[End Date] > MaxDateInContext
                )
            ),
            "__No of Orders",
            SWITCH(TRUE(),
                MaxDateInContext < 'Tracking History'[Start Date],
                BLANK(),
                AND ( 
                    'Tracking History'[Start Date] < MinDateInContext, 
                    'Tracking History'[End Date] < MaxDateInContext 
                ),BLANK(),
                1
            )
        )
RETURN
    SUMX(__Table,[__No of Orders])

Performance analyzer results were as follows:

  • Total Orders 8, 24,208 milliseconds, 24 seconds

Hey! Another 25% improvement. Lesson learned: simpler logic is better!

Step 8

It worked before, so what if we move this remaining logic test into the FILTER clause as well?

Total Orders 9 = 
    VAR MinDateInContext = MIN ( 'DateTimeTable'[Date] )
    VAR MaxDateInContext = MAX ( DateTimeTable[Date] )
    VAR __Table =
        ADDCOLUMNS(
            FILTER(
                'Tracking History',
                AND (
                    OR (  
                        'Tracking History'[Start Date] > MinDateInContext, 
                        'Tracking History'[End Date] > MaxDateInContext
                    ),
                    MaxDateInContext > 'Tracking History'[Start Date]
                )
            ),
            "__No of Orders",
            SWITCH(TRUE(),
                AND ( 
                    'Tracking History'[Start Date] < MinDateInContext, 
                    'Tracking History'[End Date] < MaxDateInContext 
                ),BLANK(),
                1
            )
        )
RETURN
    SUMX(__Table,[__No of Orders])

Performance analyzer results were as follows:

  • Total Orders 9, 22,467 milliseconds, 22 seconds

OK, maybe a minor performance improvement; it didn't seem to hurt anything, and it's our fastest time yet!

Conclusion

Performance tuning DAX can have dramatic results. In this case, the final code runs more than 25 times faster than the original (595,908 milliseconds down to 22,467 milliseconds). To achieve these kinds of improvements, pay attention to the following:

  • Clean up your code; it may not improve performance directly, but it will help you better understand what you are doing and more easily find optimizations
  • Consider structuring your conditional logic so that you eliminate the most rows as early as possible. Sometimes this means reversing your logic.
  • Put all of your code in one measure. This one I am not 100% sure about, but in this case consolidating the code had the greatest impact, perhaps due to internal DAX optimization. It is also possible that eliminating the per-row evaluation of the two SELECTEDVALUE variables, or some other benefit of using ADDCOLUMNS, helped out here.
  • Filter your data as much as possible before you start performing logic tests
  • Simplify your logic
  • Stop using VALUE and VALUES if you do not have to use them
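
 

And since I asked for more efficient suggestions, here is one more idea. The four original overlap cases appear to collapse to the classic interval-overlap test: an order overlaps the date range if it starts before the range ends and ends after the range starts. A hypothetical sketch, untested against the attached PBIX, which also replaces the SUMX of 1's with COUNTROWS (note that behavior for dates exactly on the range boundaries may differ slightly from the original measure):

Total Orders 10 = 
    VAR MinDateInContext = MIN ( 'DateTimeTable'[Date] )
    VAR MaxDateInContext = MAX ( 'DateTimeTable'[Date] )
RETURN
    COUNTROWS(
        FILTER(
            'Tracking History',
            'Tracking History'[Start Date] < MaxDateInContext
                && 'Tracking History'[End Date] > MinDateInContext
        )
    )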

Stay tuned for Part 2 where I will cover more optimizations and discuss the dangers of over-optimization!

Comments

Hey Greg,

thanks a lot for this detailed analysis and step-by-step optimization of the code!

I'll definitely download the file and try to do some optimization on my own 🙂

Best regards