cancel
Showing results for 
Search instead for 
Did you mean: 
Greg_Deckler

CALCUHATE - Why I Don't Use DAX's CALCULATE Function

Introduction

OK, first off, I don't hate, hate the CALCULATE function. At least not to the same level that I despise the time "intelligence" functions. I mean, I realize that I have made reference to DAX's time "intelligence" functions being one of the four horsemen of the apocalypse but just so we are 100% clear, I truly despise them. But that's for another blog article.

 

So, I do not hate DAX's CALCULATE to anywhere near that level but it made for a catchy title for a blog post. In reality, I honestly just don't find myself thinking about CALCULATE most of the time any longer. Over the 5 or 6 years that I have been coding DAX I have simply developed a coding style and technique where CALCULATE is just generally useless. So let me explain how I got there and then I would love to hear everyone else's thoughts on this subject because I think it is an interesting topic with a lot of far reaching implications. In reality, the use of CALCULATE or not CALCULATE really represents two wildly different programming approaches and philosophies around DAX coding.

 

Variables

This may seem unrelated, but my path towards my current disuse of CALCULATE really all started with my use of variables. And I will forever be indebted to @konstantinos for introducing me to the concept of variables in DAX. Truly revolutionized my thinking about what was possible with DAX. In my opinion, the use or non-use of DAX variables is a key, fundamental difference in programming approaches and philosophies.

 

Prior to variables, DAX coding was pretty unintuitive and; quite frankly, ugly. You would end up with these long, ugly, nested to the n'th degree DAX calculations that were difficult to create and even worse to troubleshoot. Or, you would end up with 63,000 measures that was a absolute debugging trainwreck. Or both.

 

In the first case, infinitely nested DAX is awful to deal with. You essentially have to create the code from the inside out and it's just an unnatural experience overall. To debug it, you also have to read it inside out. Also unnatural. To troubleshoot it once it is created is sheer frustration trying to comment out bits and pieces of code above and below the part you are interested in. Not impossible but super annoying.

 

In the second case, using measures as a substitute for variables devolves into a maintenance nightmare. One it muddles up the data model. There are lots of instances where you never actually have to reuse a measure somewhere else so having it hanging out in the data model is a waste. Two, it means the DAX code is spread out all over creation. So if you are trying to debug the code it is not all in once place, you have to start at one DAX calculation and then trace everything back to nested upon nested upon nested DAX measures. I've done it, it's truly an awful experience.

 

Variables tend to solve these issues. Using variables allows you to segment complex calculations into discreet, easily testable chunks of code that are self contained and fit a more natural, top down coding style. When using variables, if you need to test if a particular piece of code is being calculated correctly, you simply edit your RETURN statement to return the variable that is calculating that piece of code. Easy peasy. Furthermore, you can solve a problem in a non-linear and flexible way. With nested DAX, you need to follow an extremely rigid and linear path from A to B. And that's just not very enjoyable or, in my opinion, efficient. Third, when I go back to troubleshoot the code after a few months, all the code is in one spot, I do not need to go hunting and searching for dozens of different, separate measures to track down bits and pieces of code. Finally, variables prevent duplication of code such as in this classic:

 

 

Measure = 
  VAR __Sum = SUM('Table'[Column])
RETURN
  IF(ISBLANK(__Sum),0,__Sum)

 

 

Now think if the equivalent of __Sum is some long, nested DAX statement and you have to repeat it twice in your DAX forumla...

 

OK, so that's why I really like to use variables and I use them a lot. In fact, I will tend to use them in even simple calculations that could be done in a single DAX statement just because its generally likely that DAX calculations become more complex over time so might as well start with a good base model that can be extended later.

 

What Do Variables Have to Do with CALCULATE?

So I know what you are thinking. What in the world does the use of variables have to do with using or not using CALCULATE? Well, the answer is quite simple really. Once you start to use variables you tend to graduate to using table variables. And once you are using tables variables, CALCULATE pretty much becomes useless. Let me demonstrate with a quick example. The following measure does not work:

 

 

Measure = 
    VAR __Table = { ("One",1), ("Two",2), ("Three",3) }
RETURN
    CALCULATE(SUM([Value2]),__Table)

 

 

CALCULATE can't comprehend the Value2 column. Sure, even though calculate's second argument accepts a "table filter expression", this is NOT the same as accepting any expression that returns a table. For calculate to work, everything still needs to be grounded in actual tables within the data model. Conversely, this works just fine:

 

 

Measure = 
    VAR __Table = { ("One",1), ("Two",2), ("Three",3) }
RETURN
    SUMX(__Table,[Value2])

 

 

Unlike CALCULATE, the "X" aggregation functions can operate on virtual tables with ease as well as on actual tables in the data model. 

 

It is because of this fact that I really started to eschew CALCULATE for other methods. Call me lazy or maybe I was dropped on my head as a baby or perhaps MAMA DIDN'T LOVE ME. Who can say, but if I have two ways of doing something and one works all the time and the other only works some of the time, I tend to go with the one that works all the time unless there is an incredibly pressing reason not to. For coding, I see it as a code maintainability thing. If there are multiple ways of solving a recurring coding construct, I want everyone on the team using the same method if at all possible. Having six or seven different ways of doing the same thing is just heaping technical debt upon technical debt for no good reason. In point of fact, I cannot think of a single thing that can be done with CALCULATE that can't be done via some other method, which kind of makes it superfluous.

 

Let's face it, Measure Totals the Final Word, you can't solve that problem using CALCULATE. In fact, it's right around that 2017/2018 time frame that I started to understand that CALCULATE was more of a shackle than something helpful.

 

Reality

Alright, so hopefully by now you are starting to get the picture about why I tend not to use CALCULATE but I'm not quite done. You see, it is the fact that CALCULATE is so tied to the actual tables in the data model that is perhaps its greatest downfall. In fact, most non-trivial DAX calculations are not possible by using CALCULATE unless you happen to have a 100% idealized data model. You can see this in Microsoft's own DAX examples as well as the examples on daxpatterns.com, etc. In almost every case where CALCULATE is used, it is dealing with the AdventureWorks database, the idealized of all idealized data models. Here CALCULATE can shine because the calculations are really pretty trivial thanks to a, let's all be honest, too perfect data model.

 

Reality check, the vast majority of data models that are used in Power BI are far, far from being optimized or in any way ideal. The reality is that 80-90% of the people using Power BI are not even professional IT people, let alone professional data modelers. It's just not realistic. And, as you get further and further away from an idealized data model, the less and less useful CALCULATE tends to be.

 

Sure, in theory, CALCULATE is a super useful function. However, once you inject a bit of reality, it is far, far less useful. I can't help it, I was educated as an engineer. Theory is great and all, but if you can't apply it who really cares?

 

Performance

OK, this article would not be complete if I did not address perhaps one of the biggest reasons why people tout CALCULATE, performance. I won't argue with any of the information about how and why CALCULATE is super performant. Sure, again, that's all great in theory. In reality, just exactly how often does DAX performance come up anyway? I would say not very often at all. Probably 99.9% of the time, DAX performance isn't an issue. Meaning that for every 1000 DAX calculations you create you might hit a serious performance issue once. So, in that one case, sure, knock yourself out and use CALCULATE if you can.

 

Conclusion

At the end of the day, I'm not here to convince you one way or another about using or not using CALCULATE. I just felt like explaining why I do not tend to use CALCULATE.

 

In short, I don't use CALCULATE because I've developed a method of solving complex DAX calculations that works every time. Create a table variable to represent your data. Use an "X" aggregator function to perform a calculation across that table. This method can solve any problem that CALCULATE can plus solve calculations that are impossible for CALCULATE. For me it's about the pattern and process to achieve success versus constantly kowtowing to the feeble limitations of CALCULATE. Most of my 150+ Quick Measures in the Gallery I would have never solved had I remained subservient to CALCULATE's way of thinking and that alone in my mind is reason enough not to use it.

 

I actually think this is a fascinating subject that strikes to the heart of a lot of DAX coding practices and philosophies. Maybe I'm wrong about everything. Either way, I would really like to get a discussion going on this subject.

Comments

Here is another good one from this thread:

https://community.powerbi.com/t5/Desktop/Properly-summing-up-time-spent-in-overlapping-appointments/...

I just happened to have it up in a tab:

Duration = 
    VAR __Current = MAX([Index])
    VAR __Start = MAX ([Start])
    VAR __End = MAX([End])
    VAR __Duration = DATEDIFF(__Start,__End,MINUTE)/60
    VAR __Table = ADDCOLUMNS(FILTER(ALL('Table (13)'),[Index]<>__Current),"__Contains",IF(__Start>=[Start]&&__End<=[End],1,0))
RETURN
    IF(MAXX(__Table,[__Contains])=1,0,__Duration)


DurationTotal = 
    IF(HASONEVALUE('Table (13)'[Index]),[Duration],
        SUMX(ADDCOLUMNS(SUMMARIZE('Table (13)',[Index]),"__Duration",[Duration]),[__Duration]))

Great post @Greg_Deckler! I'm working on different approaches to replicate row-level equivalents to Tableau's level-of-detail expressions in PowerBI. These are essentially subqueries and I've found the simplest approach to do this with CALCULATE, as follows, to find the last purchase date by customer, for example: 

CALCULATE( MAX( 'Purchases'[purchase_timestamp] ), ALLEXCEPT( 'Purchases', 'Purchases'[customer_id] ) )

Do you know of an approach that would replicate this using a virtual table in a calculated column?


I know I could easily do this with a new calculated table also, but I have a slight preference for doing this without adding to the data model (perhaps this is an unreasonable Tableau-minded bias).

@lbendlin Thanks! I should clarify: I was wondering about an alternative to the CALCULATE( [EXPRESSION], ALLEXCEPT()) pattern, as the post was about replacing uses of CALCULATE(). 

@darrenfishell Oh, that's easy, I believe the equivalent would be:

 

MAXX(CALCULATETABLE('Purchases',ALLEXCEPT('Purchases',Purchases[customer_id])),'Purchases'[purchase_timestamp)

 

 

@darrenfishell understood,  but I wanted to make sure you understand the difference between ALLEXCEPT() and REMOVEFILTERS()/VALUES() - because I didn't until I read the article.

@Greg_Deckler 

 

This

MAXX(ALLEXCEPT( 'Purchases', 'Purchases'[customer_id] ),'Purchases'[purchase_timestamp] )

is not logically equivalent to

CALCULATE( MAX( 'Purchases'[purchase_timestamp] ), ALLEXCEPT( 'Purchases', 'Purchases'[customer_id] ) )

because

daxeralmighty_0-1630609351868.png

In other words, ALLEXCEPT used as a top-level function will always return all the unique combinations (with the proviso above) regardless of any filter context. What's more, ALLEXCEPT materializes rows whereas CALCULATE(..., ALLEXCEPT(...)) not necessarily and hence it will be faster. But since the two versions are not the same, comparing speeds is not even relevant.

@darrenfishell Here is another option:

Measure 7 = SUMX(FILTER(ALL('Inventory'),'Inventory'[ItemCode] IN SELECTCOLUMNS('Inventory',"Inventory",'Inventory'[ItemCode])),'Inventory'[QtyStr])

 

Polls
What is your favorite Power BI feature released this month?