CALCUHATE - Why I Don't Use DAX's CALCULATE Functi...

Greg_Deckler · ‎07-24-2020

Introduction

OK, first off, I don't hate, hate the CALCULATE function. At least not to the same level that I despise the time "intelligence" functions. I mean, I realize that I have made reference to DAX's time "intelligence" functions being one of the four horsemen of the apocalypse but just so we are 100% clear, I truly despise them. But that's for another blog article.

So, I do not hate DAX's CALCULATE to anywhere near that level but it made for a catchy title for a blog post. In reality, I honestly just don't find myself thinking about CALCULATE most of the time any longer. Over the 5 or 6 years that I have been coding DAX I have simply developed a coding style and technique where CALCULATE is just generally useless. So let me explain how I got there and then I would love to hear everyone else's thoughts on this subject because I think it is an interesting topic with a lot of far reaching implications. In reality, the use of CALCULATE or not CALCULATE really represents two wildly different programming approaches and philosophies around DAX coding.

Variables

This may seem unrelated, but my path towards my current disuse of CALCULATE really all started with my use of variables. And I will forever be indebted to @konstantinos for introducing me to the concept of variables in DAX. Truly revolutionized my thinking about what was possible with DAX. In my opinion, the use or non-use of DAX variables is a key, fundamental difference in programming approaches and philosophies.

Prior to variables, DAX coding was pretty unintuitive and; quite frankly, ugly. You would end up with these long, ugly, nested to the n'th degree DAX calculations that were difficult to create and even worse to troubleshoot. Or, you would end up with 63,000 measures that was a absolute debugging trainwreck. Or both.

In the first case, infinitely nested DAX is awful to deal with. You essentially have to create the code from the inside out and it's just an unnatural experience overall. To debug it, you also have to read it inside out. Also unnatural. To troubleshoot it once it is created is sheer frustration trying to comment out bits and pieces of code above and below the part you are interested in. Not impossible but super annoying.

In the second case, using measures as a substitute for variables devolves into a maintenance nightmare. One it muddles up the data model. There are lots of instances where you never actually have to reuse a measure somewhere else so having it hanging out in the data model is a waste. Two, it means the DAX code is spread out all over creation. So if you are trying to debug the code it is not all in once place, you have to start at one DAX calculation and then trace everything back to nested upon nested upon nested DAX measures. I've done it, it's truly an awful experience.

Variables tend to solve these issues. Using variables allows you to segment complex calculations into discreet, easily testable chunks of code that are self contained and fit a more natural, top down coding style. When using variables, if you need to test if a particular piece of code is being calculated correctly, you simply edit your RETURN statement to return the variable that is calculating that piece of code. Easy peasy. Furthermore, you can solve a problem in a non-linear and flexible way. With nested DAX, you need to follow an extremely rigid and linear path from A to B. And that's just not very enjoyable or, in my opinion, efficient. Third, when I go back to troubleshoot the code after a few months, all the code is in one spot, I do not need to go hunting and searching for dozens of different, separate measures to track down bits and pieces of code. Finally, variables prevent duplication of code such as in this classic:

Measure = 
  VAR __Sum = SUM('Table'[Column])
RETURN
  IF(ISBLANK(__Sum),0,__Sum)

Now think if the equivalent of __Sum is some long, nested DAX statement and you have to repeat it twice in your DAX forumla...

OK, so that's why I really like to use variables and I use them a lot. In fact, I will tend to use them in even simple calculations that could be done in a single DAX statement just because its generally likely that DAX calculations become more complex over time so might as well start with a good base model that can be extended later.

What Do Variables Have to Do with CALCULATE?

So I know what you are thinking. What in the world does the use of variables have to do with using or not using CALCULATE? Well, the answer is quite simple really. Once you start to use variables you tend to graduate to using table variables. And once you are using tables variables, CALCULATE pretty much becomes useless. Let me demonstrate with a quick example. The following measure does not work:

Measure = 
    VAR __Table = { ("One",1), ("Two",2), ("Three",3) }
RETURN
    CALCULATE(SUM([Value2]),__Table)

CALCULATE can't comprehend the Value2 column. Sure, even though calculate's second argument accepts a "table filter expression", this is NOT the same as accepting any expression that returns a table. For calculate to work, everything still needs to be grounded in actual tables within the data model. Conversely, this works just fine:

Measure = 
    VAR __Table = { ("One",1), ("Two",2), ("Three",3) }
RETURN
    SUMX(__Table,[Value2])

Unlike CALCULATE, the "X" aggregation functions can operate on virtual tables with ease as well as on actual tables in the data model.

It is because of this fact that I really started to eschew CALCULATE for other methods. Call me lazy or maybe I was dropped on my head as a baby or perhaps MAMA DIDN'T LOVE ME. Who can say, but if I have two ways of doing something and one works all the time and the other only works some of the time, I tend to go with the one that works all the time unless there is an incredibly pressing reason not to. For coding, I see it as a code maintainability thing. If there are multiple ways of solving a recurring coding construct, I want everyone on the team using the same method if at all possible. Having six or seven different ways of doing the same thing is just heaping technical debt upon technical debt for no good reason. In point of fact, I cannot think of a single thing that can be done with CALCULATE that can't be done via some other method, which kind of makes it superfluous.

Let's face it, Measure Totals the Final Word, you can't solve that problem using CALCULATE. In fact, it's right around that 2017/2018 time frame that I started to understand that CALCULATE was more of a shackle than something helpful.

Reality

Alright, so hopefully by now you are starting to get the picture about why I tend not to use CALCULATE but I'm not quite done. You see, it is the fact that CALCULATE is so tied to the actual tables in the data model that is perhaps its greatest downfall. In fact, most non-trivial DAX calculations are not possible by using CALCULATE unless you happen to have a 100% idealized data model. You can see this in Microsoft's own DAX examples as well as the examples on daxpatterns.com, etc. In almost every case where CALCULATE is used, it is dealing with the AdventureWorks database, the idealized of all idealized data models. Here CALCULATE can shine because the calculations are really pretty trivial thanks to a, let's all be honest, too perfect data model.

Reality check, the vast majority of data models that are used in Power BI are far, far from being optimized or in any way ideal. The reality is that 80-90% of the people using Power BI are not even professional IT people, let alone professional data modelers. It's just not realistic. And, as you get further and further away from an idealized data model, the less and less useful CALCULATE tends to be.

Sure, in theory, CALCULATE is a super useful function. However, once you inject a bit of reality, it is far, far less useful. I can't help it, I was educated as an engineer. Theory is great and all, but if you can't apply it who really cares?

Performance

OK, this article would not be complete if I did not address perhaps one of the biggest reasons why people tout CALCULATE, performance. I won't argue with any of the information about how and why CALCULATE is super performant. Sure, again, that's all great in theory. In reality, just exactly how often does DAX performance come up anyway? I would say not very often at all. Probably 99.9% of the time, DAX performance isn't an issue. Meaning that for every 1000 DAX calculations you create you might hit a serious performance issue once. So, in that one case, sure, knock yourself out and use CALCULATE if you can.

Conclusion

At the end of the day, I'm not here to convince you one way or another about using or not using CALCULATE. I just felt like explaining why I do not tend to use CALCULATE.

In short, I don't use CALCULATE because I've developed a method of solving complex DAX calculations that works every time. Create a table variable to represent your data. Use an "X" aggregator function to perform a calculation across that table. This method can solve any problem that CALCULATE can plus solve calculations that are impossible for CALCULATE. For me it's about the pattern and process to achieve success versus constantly kowtowing to the feeble limitations of CALCULATE. Most of my 150+ Quick Measures in the Gallery I would have never solved had I remained subservient to CALCULATE's way of thinking and that alone in my mind is reason enough not to use it.

I actually think this is a fascinating subject that strikes to the heart of a lot of DAX coding practices and philosophies. Maybe I'm wrong about everything. Either way, I would really like to get a discussion going on this subject.

AllisonKennedy · ‎07-24-2020

@Greg_Deckler wrote:

Reality check, the vast majority of data models that are used in Power BI are far, far from being optimized or in any way ideal. The reality is that 80-90% of the people using Power BI are not even professional IT people, let alone professional data modelers. It's just not realistic. And, as you get further and further away from an idealized data model, the less and less useful CALCULATE tends to be.

I think this paragraph you wrote explaining why CALCULATE is so problematic is also the same reason that it is so widely used. If you estimate that 80-90% of people using Power BI are not professional IT people, data modelers and add to that coders, then the concept of variables, operators such as and (&&) or (||), and table functions are new and foreign and take time to master. CALCULATE is deceptively simple in its design and a quick way to get started doing some powerful calculations with DAX.

@Greg_Deckler wrote:
For coding, I see it as a code maintainability thing. If there are multiple ways of solving a recurring coding construct, I want everyone on the team using the same method if at all possible. Having six or seven different ways of doing the same thing is just heaping technical debt upon technical debt for no good reason. In point of fact, I cannot think of a single thing that can be done with CALCULATE that can't be done via some other method, which kind of makes it superfluous.

I too am keen to hear the discussion around this, as DAX is very diverse and there's more than one way to get the desired result. While I do like the idea of everyone on the team using the same method, I wonder if that scenario is idealized as the AdventureWorks dataset? There will always be some variations in coding style amongst the team, part of what makes it so fascinating and at times can even give insight into personality and inner thought process of team members.

That being said, there are definitely "better" methods and "worse" methods. Perhaps it is the duty of those more adept at DAX to assist those less experienced in understanding better why to use variables instead of CALCULATE or SUMX rather than CALCULATE and explain the implications and meaning behind it. I have found that CALCULATE can mask some poor coding practices, hiding bad habits that don't rear their ugly heads until much later when the nightmares of troubleshooting you described above become necessary to fix the problems.

And yet, I still find myself suggesting CALCULATE as a solution to questions in the forum just because I'm lazy and it's faster than rewriting the original poster's entire suggestion or providing them with a complex variable formula that they won't understand and won't be able to support.

As a side note, I feel like Microsoft should award a prize to anyone who can stump you and come up with a use for CALCULATE that can't be done another way. 🙂

Greg_Deckler · ‎07-25-2020

Thanks @AllisonKennedy , I really appreciate your thoughts around this. Your work is always fantastic and insightful. Big fan. You make a great point about people new to DAX and CALCULATE kind of being overused. Thinking back on things, I definitely went through a period when learning DAX that I thought CALCULATE was the greatest function ever. I think you are correct that a lot of people kind of go through that stage. "I want this formula calculated in this context". It's an easy entry into getting a handle on context within DAX. That said though, I do not believe that CALCULATE is only for DAX beginners. The evidence is daxpatterns.com, CALCULATE is all over the place and those guys are definitely not noobs!!

Also, you are correct about having all of your coding team do the same thing as a bit idealized. That said, within Fusion, we do have coding standard guidelines that specify naming conventions, coding conventions and patterns. Whether they are always followed or not, that's another thing all together. So, ultimately, all I can control is myself and whether I remain consistent with my code. And I find it helpful to be consistent. It means that whatever piece of code I pick up that I wrote I can troubleshoot and debug it with ease because it always follows the same pattern.

Great stuff! Again, I really appreciate your thoughts on this subject, keep them coming. And I am always up for a challenge! I was perhaps a bit brash there claiming that there is nothing that CALCULATE can do that can't be done another way!! 🙂

lbendlin · ‎07-27-2020

Two comments and a disgreement:

1. Don't drag variables into this fight. Yes, everything in DAX is a table, and table variables are useful. But variables also made EARLIER() obsolete, and for that they deserve to be left alone.

2. The best measure is one that works for both the individual visual cells and for the totals. Whichever approach you are using that an achieve that is the right approach. I tend to lean your way on that.

3. Performance is important. If you are dealing with millions of data rows and one very impatient CxO then a three second visual rendering is preferable to a 45 second render. The best measure is the one with the best performance at scale, the shortest Query plan, most time spent in SE, the fewest records scanned, and the most callbacks avoided.

Greg_Deckler · ‎07-28-2020

@lbendlin - First of all, really appreciate your thoughts! Hopefully this does not turn into a fight, that was not my intention!! 🙂 I think we both agree that variables are great, the only reason they were dragged into the conversation is that my progression went, variables, table variables, oh, CALCULATE hates table variables...

I also agree that performance is important. My only point though is that I find that for a large number of data sets, even data sets that have millions of rows, I do not tend to experience DAX performance issues that require extensive optimizations. Absolutely, it happens and when it does, I have to go down a different path sometimes than I would typically.

daxer-almighty · ‎08-29-2020

As far as I know, CALCULATE(TABLE) is the ONLY function in DAX that can manipulate/change the original filter context. And programming something useful in DAX is all about changing the original filter contexts; without this, you'd be doing 2 + 2 = 4 and even though true, not very exciting, is it? How then could you not use CALCULATE or shy away from it? It's simply not possible for 99% meaningful and useful calculations out there to be performed without CALCULATE. And it is good that you mention the DAX patterns by The Italians. The code in there is littered with CALCULATE because there is no other way to do it; no amount of variables and stunts in the code will let you change the context. If you think otherwise, it would only be fair to demonstrate.

I'd like to see something REAL and tangible in your post, something taken straight from reality, and not merely examples that are more artificial than the database you mention. Can you find some non-trivial calculations (not just one or two, it's too few to make a strong point) which requires changing the filter context and which can be done without CALCULATE? If you can, then please make a good point and demonstrate it. I'd be seriously curious to see this because I'm not too convinced that I should get rid of CALCULATE (or at least look for ways to minimize its use).

Thanks.

Greg_Deckler · ‎08-29-2020

Hello @darlove - Good to see you back on the boards. I can't fathom what you mean that CALCULATE is the only way to change filter context when FILTER(ALL('Table'),[column1] = "blah") changes filter context just fine. It changes it from whatever it was on Table to [column1]="blah".

Bottom line, the methdology I use can solve any problem that CALCULATE can but unless I am mistaken, CALCULATE cannot solve any problem that my methodology can. I would like to see Krippendorff's Alpha solved using CALCULATE. Or TRANSPOSE, or MMULT, or, well, there's too many to count.

daxer-almighty · ‎08-29-2020

@marcorusso - Marco, would you please comment on @Greg_Deckler's claim that it's not true that only CALCULATE can change the filter context? I'm 99% sure he misses the point badly when he claims that FILTER( ALL(T), T[C] = "V") can do that. I'm sure this function CANNOT change the filter context but I'd be grateful if you could comment on this as you're the most knowledgeable person about it. Also, would you please comment on the claim that "And once you are using tables variables, CALCULATE pretty much becomes useless."? I'm really interested in your thoughts about such a bold statement.

@Greg_Deckler - Greg, would you please give me some tangible, concrete examples of non-trivial calculations in which you have to change the filter context and where you can do without CALCULATE? In a word, put your money where your mouth is.

Thanks again.

Greg_Deckler · ‎08-29-2020

@daxer-almighty - It's not a hard thing to test. Try using a table variable in a CALCULATE statement. Try something simple like a Table with a single column Value and do this:

Measure = 
    VAR __Table = 'Table'
RETURN
    CALCULATE(__Table,[Value]=1)

[Value] will be underlined red and the error will be "The value for 'Value' cannot be determined. Either the column doesn't exist or there is no current row for this column".

Hence, useless.

Also, so you are saying that ALL and ALLSELECT don't change filter context? It's literally in the definition of the function:

"Returns all the rows in a table, or all the values in a column, ignoring any filters that might have been applied. This function is useful for clearing filters and creating calculations on all the rows in a table."

https://docs.microsoft.com/en-us/dax/all-function-dax

Emphasis added.

Finally, Marco already chimed in on this thread on either LinkedIn or Twitter and, classic Marco, he took the high road of course! That guy is a 10. And I wasn't writing this article to trash CALCULATE, the point I was making was by breaking out of the rigid thinking behind CALCULATE and/or using CALCULATE as a crutch, I have been able to push the limits of DAX and do things that I could not have done had I stuck to only using CALCULATE.

And what happens in the bowels of DAX, whether the queries are rewritten in some way, that's not what I am talking about. I am talking about at the programming level, how you structure DAX code entered into Power BI Desktop for example.

Let's see, money where mouth is, that's easy, take this table:

BUDGET NAMEYEARAttributeValue

ABC123456	2019	JAN	1000
ABC123456	2019	FEB	1500
ABC123456	2019	MARCH	2000
ABC123456	2019	APRIL	3000

Put Attribute and Value in a table visualization. Create this measure:

Measure = SUMX(FILTER(ALL('Table'),[Attribute] = "FEB"),[Value])

Put this measure in your visual, there you go:

Heck, let's go nuts and throw a slicer in there:

No CALCULATE necessary to change filter context.

Greg_Deckler · ‎08-29-2020

Here is another good one from this thread:

https://community.powerbi.com/t5/Desktop/Properly-summing-up-time-spent-in-overlapping-appointments/...

I just happened to have it up in a tab:

Duration = 
    VAR __Current = MAX([Index])
    VAR __Start = MAX ([Start])
    VAR __End = MAX([End])
    VAR __Duration = DATEDIFF(__Start,__End,MINUTE)/60
    VAR __Table = ADDCOLUMNS(FILTER(ALL('Table (13)'),[Index]<>__Current),"__Contains",IF(__Start>=[Start]&&__End<=[End],1,0))
RETURN
    IF(MAXX(__Table,[__Contains])=1,0,__Duration)


DurationTotal = 
    IF(HASONEVALUE('Table (13)'[Index]),[Duration],
        SUMX(ADDCOLUMNS(SUMMARIZE('Table (13)',[Index]),"__Duration",[Duration]),[__Duration]))

daxer-almighty · ‎08-29-2020

First mistake (albeit it does not change the gist): You use CALCULATE instead of CALCULATETABLE. I hope you know the difference and the fact that for tables you must use the latter.

Second mistake: CALCULATETABLE works for physical tables. I can't see any value in the example you've shown. First, it's incorrect for 2 reasons: a measure can't return a table and you should use CALCULATETABLE if anything. Second, if you want to return a filtered table, then why do you even need a variable when this

EVALUATE
    CALCULATETABLE(
    	'Table',
    	'Table'[Value] = 1
    )

will do? And if you're really stubborn and have to use the variable... then you can do this

EVALUATE
var __Table = 'Table'
return
    CALCULATETABLE(
    	'Table',
    	'Table'[Value] = 1,
    	__Table
    )

but I can't see a single reason to do that. Nobody in their right senses would code like this even though this code is syntactically correct.

Third mistake: There's a huge difference between changing a filter context and IGNORING IT. HUUUUGE. If you don't know why (and it looks like that), then it's one more reason you should again reach for "The Definitive Guide to DAX." It's only when you use the ALL* functions UNDER CALCULATE that they change their semantics and clear filters, thus contributing to the context change executed by CALCULATE. If they're top-level functions they do something completely different - they let you ignore filters (but don't change the context in the slightest).

There's no rigid thinking behind CALCULATE. It's a function that is NECESSARY to do what needs to be done. If one knows DAX well, then one also knows that you can't code without it and one also knows WHERE it's absolutely necessary and no amount of FILTERs, ALLs and hand-waving can do.

I'll give you a simple real-life example that shows why you can't just say "And once you are using tables variables, CALCULATE pretty much becomes useless." Imagine that you have a complex measure that is non-trivial (maybe built upon other measures) and is pretty long, say, 50 lines, for the sake of argument. You now have to calculate its value on a CHANGED set of rows within the current context. For instance, you have to change a filter on some dimensions. This is one, very clear example where no amount of hocus-pocus with variables, FILTERS, and other hand-waving will do. YOU HAVE TO USE CALCULATE. And this scenario is so common that nothing can beat it. There does not exist a more common scenario.

daxer-almighty · ‎08-29-2020

My friend... All your examples are SIMPLE, very simple. Once you start doing REAL DAX in the real world when complexity is mounting at a staggering pace, you won't get far without CALCULATE. I know what I'm talking about.

Greg_Deckler · ‎08-30-2020

@daxer-almighty - You call Krippendorf's Alpha simple? It's over 700 lines of DAX code. CALCULATE is not used. You can download the PBIX. In fact, you have to in order to look at DAX, it is actually too long to post on this site. https://community.powerbi.com/t5/Quick-Measures-Gallery/Krippendorff-s-Alpha/m-p/641996#M323

Look, I've been doing Power BI in the "real" world for a long time. I'm the 313th person in the world to register for this community and I was doing things with the precursor's of Power BI before even that. I'm pretty confident I have a solid grasp on the DAX language. I have written an entire book on it after all.

But, nothing will ever convince you of anything most likely. You ask for specific things, I show you those specific things, I prove you 100% incorrect in your statement that CALCULATE is the ONLY way to change filter context. And you come back with a rather childish statement about this is super simple or that is super simple. Seems you just want to troll and not engage in a serious conversation.

The reality is, there are at least two ways to do just about everything in DAX, a way that uses CALCULATE and a way that doesn't use CALCULATE. If you want to see further examples of how you can not use CALCULATE to solve over 120 problems using DAX, just look at the PBIX files for my book, because CALCULATE is not used in any of them that I recall. And those are over 120 problems specifically designed to solve "real world" problems like days of supply, mean time between failures, ABC classification, on time and in full, overall equipment effectiveness (OEE), order cycle time. All free for you to browse since you want "complex" examples of DAX that doesn't use CALCULATE. https://github.com/gdeckler/DAXCookbook

If you have an example of a calculation where there is no other way to solve it other than CALCULATE, post it. What was your phrase? "In a word, put your money where your mouth is." That's a phrase, not a word actually. Just FYI.

darrenfishell · ‎09-01-2021

Great post @Greg_Deckler! I'm working on different approaches to replicate row-level equivalents to Tableau's level-of-detail expressions in PowerBI. These are essentially subqueries and I've found the simplest approach to do this with CALCULATE, as follows, to find the last purchase date by customer, for example:

CALCULATE( MAX( 'Purchases'[purchase_timestamp] ), ALLEXCEPT( 'Purchases', 'Purchases'[customer_id] ) )

Do you know of an approach that would replicate this using a virtual table in a calculated column?

I know I could easily do this with a new calculated table also, but I have a slight preference for doing this without adding to the data model (perhaps this is an unreasonable Tableau-minded bias).

lbendlin · ‎09-01-2021

@darrenfishell did someone say ALLEXCEPT()? Using ALLEXCEPT versus ALL and VALUES - SQLBI

darrenfishell · ‎09-02-2021

@lbendlin Thanks! I should clarify: I was wondering about an alternative to the CALCULATE( [EXPRESSION], ALLEXCEPT()) pattern, as the post was about replacing uses of CALCULATE().

CALCUHATE - Why I Don't Use DAX's CALCULATE Function

How to provide sample data in the Power BI Forum

Re: Power BI Tip: How To Sort A Column With Duplic...

Enhanced Data Profiling in Power Query: GUI and Ta...

Creating a “custom” or “hybrid” matrix in PowerBI

Supercharge Your Visuals: Easy Conditional Formatt...