Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
Greg_Deckler
Super User
Super User

CALCULATE Challenge - Round 1

For those haters of No CALCULATE like @AlexisOlson😅, I challenge you to create measures with an explicit CALCULATE that for the prescribed scenarios perform anywhere close to the same speed as the very simple No CALCULATE measures that perform date intelligence calculations. Here are the rules:

  1. Must contain an explicit CALCULATE
  2. The explicit CALCULATE must not be superfluous, i.e. CALCULATE(SUM('Table'[Value])) if using SUM('Table'[Value]) would do just fine.

A sample No CALCULATE measure is:

 

NC Intenet Sales (YTD) = 
    VAR __Date = MAX('Dates'[Date])
    VAR __Year = YEAR(__Date)
    VAR __Table = 
        SUMMARIZE(
            ALL('FactInternetSales'), 
            'FactInternetSales'[OrderDate],
            "__Year", YEAR([OrderDate]),
            "__Sales", SUM('FactInternetSales'[SalesAmount]))
    VAR __Result = SUMX(FILTER(__Table, [__Year] = __Year && [OrderDate] <= __Date),[__Sales])
RETURN
    __Result

 

 

 

The PBIX file is attached below. There are 16 measures for calculating such things as previous year to date, previous month to date, previous week to date, etc. Here is a reference video that also describes the scenario and original performance results:


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
3 ACCEPTED SOLUTIONS

I think this should qualify, though it isn't super clean.

 

VAR _AllDates_ =
    SUMMARIZE (
    	ALL ( FactInternetSales ),
    	Dates[Year],
    	Dates[Date]
    )
VAR _DateRange_ =
    WINDOW (
        1, ABS,
        0, REL,
        _AllDates_,
        ORDERBY ( Dates[Date] ),
        PARTITIONBY ( Dates[Year] )
    )
VAR _Result =  CALCULATE ( [Internet Sales], _DateRange_ )
RETURN
    _Result

View solution in original post

AlexisOlson
Super User
Super User

thank God it's such an obvious and easy solution that even a DAX novice could understand and doesn't rely on a function introduced within that last year which pretty much means the solution was unsolvable for 7 years or so...

WINDOW isn't required. A plain filter works fine too and has been around since the beginning.

 

VAR _AllDates_ = SUMMARIZE ( ALL ( FactInternetSales ), Dates[Date]	)
	
VAR _CurrDate = MAX ( Dates[Date] )
VAR _CurrYear = YEAR ( _CurrDate )

VAR _DateRange_ =
    FILTER (
    	_ALLDates_,
    	YEAR ( Dates[Date] )  = _CurrYear &&
    	Dates[Date] <= _CurrDate
    )
VAR _Result =  CALCULATE ( [Internet Sales], _DateRange_ )
RETURN
    _Result

 

 

However, there are still 15 other date intelligence measures in the file that are part of the challenge.

I'm not really interested in doing all 15 other ones as I don't think they'll reveal much that this YTD example doesn't already. If there is one that you think is meaningfully different, let me know.

View solution in original post

@Greg_Deckler, PW is an easy case where the TI solution or something like the following is just as fast.

CALCULATE (
    [Internet Sales],
    TREATAS ( VALUES ( Dates[Prior Week Date] ), Dates[Date] )
)

View solution in original post

37 REPLIES 37

@AlexisOlson 
Exactly.
What the formula engine is trying to do is to create the following table:

Date Date YTD Sum of SalesAmount

Then it does the aggregation over Date.

With help of some mathmatics, given that the Date table is 7,670 rows the formula engine would produce a crossjoin table of around (1/2 )* (7,670) * (7,670) = 29,414,450 rows. This is simply the area of a triangle.
In opttimized versions, the corossjoin is happening to only the dates that do exist in the fact tabe (1,124) rows. That would result in a table of around (1/2 )* (1,124 * (1,124) = 6,316,680 rows. 
This is why the cardinality of the Dates[Date] and FactInternetSales[Order Date] do matter alot.

TrollFace.jpg


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

I think this should qualify, though it isn't super clean.

 

VAR _AllDates_ =
    SUMMARIZE (
    	ALL ( FactInternetSales ),
    	Dates[Year],
    	Dates[Date]
    )
VAR _DateRange_ =
    WINDOW (
        1, ABS,
        0, REL,
        _AllDates_,
        ORDERBY ( Dates[Date] ),
        PARTITIONBY ( Dates[Year] )
    )
VAR _Result =  CALCULATE ( [Internet Sales], _DateRange_ )
RETURN
    _Result

@AlexisOlson In my testing looking at the DAX query in Performance Analyzer this seems consistently about 40%-50% slower than the fastes No CALCULATE approach. But, we're quibbling over 10's of ms here so I feel like this passes. However, there are still 15 other date intelligence measures in the file that are part of the challenge.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@AlexisOlson 

What the formula engine is trying to do is to create the following table:

Date Date YTD Sum of SalesAmount

Then it does the aggregation over Date.

With help of some mathmatics, given that the Date table is 7,670 rows the formula engine would produce a crossjoin table of around (1/2 )* (7,670) * (7,670) = 29,414,450 rows. This is simply the area of a triangle.
In opttimized versions, the corossjoin is happening to only the dates that do exist in the fact tabe (1,124) rows. That would result in a table of around (1/2 )* (1,124 * (1,124) = 6,316,680 rows. 
This is why the cardinality of the Dates[Date] and FactInternetSales[Order Date] do matter alot.


Usually I use the Window function following a NoCALCULATE approach by pre-calculating values then pushing all other calculation to the formula engine. For example the following produces the simplist and fastest query plan along with the minimum number of storage engine queries among all other solutions.

 

Internet Sales (YTD) Window = 
VAR AllDateSales =
    SUMMARIZE(
        ALL ( FactInternetSales ),
    	Dates[Year], 
        Dates[Date],
        "@Amount", SUM ( FactInternetSales[SalesAmount] )
    )
VAR Result =
    SUMX (
        WINDOW (
            0, ABS,
            0, REL,
            AllDateSales,
            ORDERBY ( Dates[Date] ),
            PARTITIONBY ( Dates[Year] )
        ),
        [@Amount]
    )
RETURN
    Result

 

I would however say that your WINDOW-CALCULATE solution matches the performance of other NoCALCULATE solutions so I vote for it as an acceptable solution.

@AlexisOlson I'll have a look in the morning. But if it does work, I mean, thank God it's such an obvious and easy solution that even a DAX novice could understand and doesn't rely on a function introduced within that last year which pretty much means the solution was unsolvable for 7 years or so...


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
AlexisOlson
Super User
Super User

I've noticed the model involved can make a big difference in which approach works best and I don't fully understand why yet, so I'm attaching another (much bigger) model I've been tinkering with for additional comparisons.

@AlexisOlson Big difference I see is that the visual doesn't go to the Date granularity, which means it isn't the same scenario.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Good point on the granularity. Even matching granularity though, there are still different behaviors between files that I don't understand (in particular, TOTALYTD performs similarly to your measure in Contoso).

@AlexisOlson I'll have to look closer but both have a data table marked as a date table. The Dates table in Contoso though has significantly less rows in it. Triple in fact.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Greg_Deckler  while it is quite clear that no calculate is often superior in terms of performance. I tend to agree with @lbendlin that in most cases basic functions are the better choice strictly because your co-developers might not understand why you are using this kind of fancy dax instead of basic structures. I would clasisfy this kind of dax as legacy code since it is unreasonably hard to understand when compared to what it achieves.





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




@ValtteriN I think you need to review the CALCULATE partial-solution to this problem and then tell me which one is more intuitive for someone to understand. Hint, they are literally the same solution except that one uses SUMX at the end and the other CALCULATE. The SUMX directly sums the value in the virtual table created. The CALCULATE applies the virtual table created as a filter of the base table over which a measure is applied.

 

Now, on the opinion of which of those is more intuitive to understand and debug, well, I have my opinion.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Greg_Deckler  I agree that the solution without CALCULATE is elegant, but maybe the general paradigm that people have learned to write with CALCULATE as the most basic function of DAX causes the solution to feel less intuitive. Your macro sentece is spot on "DAX is easy, CALCULATE makes DAX hard..."  As things stand, if I ask colleagues at the office is the no calculate solution intuitive the answer is no. Because of this it makes more sense to me to write DAX that uses CALCULATE as the basis so that there is no unnecessary confusion and if there is a need to optimize the performace based on end-user feedback then I will change my DAX to be more performance optimized. 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




@ValtteriN I think that people that have learned to use CALCULATE find the No CALCULATE approach takes some getting used to. However, that was never the intended audience for No CALCULATE. I think that for those new to DAX, learning the No CALCULATE is far more intuitive and easy to do versus learning the CALCULATE method. That's always been the intended audience. That said, I know of more than a few people that have switched from CALCULATE to No CALCULATE and never looked back.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

That would be a nice psychological experiment. Have a group of Power BI Newbies, split them in two parts, teach one part to use CALCULATE and the other part to use the aggregation functions.  Then after a month show them the "other side"  and let them decide which technique to use.

 

This is very similar to the On-Object Interaction discussion.  People who are new to Power BI have no issues using that.  Those of us who have grown up with the legacy options dialogs have a much harder time with adopting the on-object interaction.

In general you don't want to include the syntax sugar in your tests. Keep to the basic functions.

v-shex-msft
Community Support
Community Support

HI @Greg_Deckler,

Thanks for your sharling, I think these technical research will help other user to understanding how dax calculate.

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors