cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Anonymous
Not applicable

SUMX & FILTER or SUM & CALCULATE: Best practice for summing subset of columns

Hi guys,

 

quick question: If I want to sum a subset of a column, for example the sum of the sales of only red products, which approach is better suited?

 

1.SUMX and FILTER

Red Sales 1 =
SUMX ( FILTER ( Sales; Sales[ProductColor] = "Red" ); Sales[Amount] )


or

 

2. CALCULATE and SUM

Red Sales 2 =
CALCULATE ( SUM ( Sales[Amount] ); Sales[ProductColor] = "Red" )

 

Thanks in advance.

Best regards

Lucas

 

1 ACCEPTED SOLUTION
AntrikshSharma
Resident Rockstar
Resident Rockstar

You should use 2 second version as CALCULATE evaluates its filter arguments first and once the filter context is modified it evaluates the first argument. And modifying filter context before evaluating something is efficient and a good practice. Also you get to reuse your base measure in all your other measure, for example if you create a Sales Amount measure like SUM ( Sales[Amount] ) you can use it multiple time in other measure where you just want to modify the context in which the SUM is to be evaluated without rewriting the whole code again. example:

 

=
CALCULATE ( [Sales Amount], Dates[Year] = 2020, Customer[Continent] = "Asia" )
=
CALCULATE ( [Sales Amount], Dates[Year] IN { 2020, 2019, 2018 } )
=
CALCULATE (
    [Sales Amount],
    Customer[Gender] = "Male",
    Products[Color] IN { "Green", "Yellow", "White" },
    USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
    FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)

 

 

In case of the first option FILTER has to iterate probably a very huge table and check for rows where color is red, but in case of the second option  Sales[ProductColor] = "Red" internally it expands to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" ) since it use ALL and ALL returns unique product colors, that list is going to be really small, we are talking in hundreds or maybe tens, and iterating such a small list is much more efficient that iterating a table with millions or billions of rows to just check for red products and then do further iterations with SUMX.

 

View solution in original post

11 REPLIES 11
Greg_Deckler
Super User
Super User

@Anonymous - Thanks for inspiring me today. 

Check out my latest article, CALCUHATE - Why I Don't Use DAX's CALCULATE Function.

https://community.powerbi.com/t5/Community-Blog/CALCUHATE-Why-I-Don-t-Use-DAX-s-CALCULATE-Function/ba-p/1248635

 


@ me in replies or I'll lose your thread!!!
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Learn Power BI 2nd Edition
AntrikshSharma
Resident Rockstar
Resident Rockstar

You should use 2 second version as CALCULATE evaluates its filter arguments first and once the filter context is modified it evaluates the first argument. And modifying filter context before evaluating something is efficient and a good practice. Also you get to reuse your base measure in all your other measure, for example if you create a Sales Amount measure like SUM ( Sales[Amount] ) you can use it multiple time in other measure where you just want to modify the context in which the SUM is to be evaluated without rewriting the whole code again. example:

 

=
CALCULATE ( [Sales Amount], Dates[Year] = 2020, Customer[Continent] = "Asia" )
=
CALCULATE ( [Sales Amount], Dates[Year] IN { 2020, 2019, 2018 } )
=
CALCULATE (
    [Sales Amount],
    Customer[Gender] = "Male",
    Products[Color] IN { "Green", "Yellow", "White" },
    USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
    FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)

 

 

In case of the first option FILTER has to iterate probably a very huge table and check for rows where color is red, but in case of the second option  Sales[ProductColor] = "Red" internally it expands to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" ) since it use ALL and ALL returns unique product colors, that list is going to be really small, we are talking in hundreds or maybe tens, and iterating such a small list is much more efficient that iterating a table with millions or billions of rows to just check for red products and then do further iterations with SUMX.

 

Anonymous
Not applicable

@AntrikshSharma Thanks for the clarification. I was leaning towards this solution, too.

Could you do me another favor and answer a follow-up question to your answer?

The third example you posted seems to be a calculation based on the running total / cummulative total pattern:

=
CALCULATE (
[Sales Amount],
Customer[Gender] = "Male",
Products[Color] IN { "Green", "Yellow", "White" },
USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)

 I've used the pattern myself but I still get confused about the last part. Could you give me feedback If I've understood it correctly?

  • Dates[Date] : refers to the column
  • MAX ( Dates[Date] ) : We're in the row context now. There is only one Date which we retrieve with the MAX() function. But since there is only one date anyway in theory we could also use the MIN() or SELECTEDVALUE() functions to get this date

Thanks in advance.

Actually MAX here is not related to the Row context created by FILTER, MAX is being evaluated in the filter context that is outside CALCULATE, this is well understood by using Variables, here is arunning total patern using the same.

Running Total = 
VAR MaxDateInFilterContext = MAX ( Dates[Date] )
VAR MaxYear = YEAR ( MaxDateInFilterContext )
VAR DatesLessThanMaxDate =
    FILTER (
        ALL ( Dates[Date], Dates[Calendar Year Number] ),
        Dates[Date] <= MaxDateInFilterContext
            && Dates[Calendar Year Number] = MaxYear
    )
VAR Result =
    CALCULATE (
        [Total Sales],
        DatesLessThanMaxDate
    )
RETURN
    Result

 take a look at the below screenshot. When you are at 2nd January, what is the MAX date? it is 2nd and the amount is calculated for all dates less than 2nd which is returned by the variable DatesLessThanMaxDate , next DAX is evaluated for 3rd January, now what is the MAX value? it is 3rd January, similarly it is being checked for each cell of the report.

1.PNG

Hence yes, you can use MIN or SELECTEDVALUE but their values will not be related to the Row context created by FILTER

Hello @AntrikshSharma

if you can find the time I would appreciate help in explaining the concept you just described.

 

Especially I would like to know what you said about the MAX function which is wrapped in the FILTER function, why its not evaluated row by row, although its actually inside the filter function, which is a iterator?: 

"Actually MAX here is not related to the Row context created by FILTER, MAX is being evaluated in the filter context that is outside CALCULATE, this is well understood by using Variables, here is arunning total patern using the same."

 

For me it is actually inside the Calculate function as well, because Calculate is outside of everything.

Also shouldn't Filter function works row by row?  Shouldn't MAX be the Max date from the whole table its is refering to and not only the Max Date until that row which is going to be evaluated? Looking The following seems to make sense but isn't going to work without aggregation:

FILTER ( ALL ( Dates ), Dates[Date] < ( Dates[Date] ) 

instead of:

FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) 

 Why I'm asking that is because when I do the same kind of calculation in a calculated column inside a table variable I can write like this without problem:

 

Column= 
Var _TableVariable= Filter( 'Calendar', 'Calendar' [Date]>='Maintable'[BookingDate] && 'Calendar' [Date]<='Maintable'[FinishDate] && 'Calendar[Day]= "Monday"))
return
countrows(_TableVariable) 
 
Here I don't need to aggregate the dates, it knows automatically when I say
  'Calendar' [Date]>='Maintable'[BookingDate] && 'Calendar' [Date]<='Maintable'[FinishDate]
that I'm talking about ALL the Calendar Dates which are bigger than the booking dates and smaller than then the FinishDate when the specific fow in the calendar table got evaluated row by row. 
 

 

Thank you for your time.

Best regards. 

@Applicable88 

For me it is actually inside the Calculate function as well, because Calculate is outside of everything.


That doesn't matter, CALCULATE prepares the new filter context by evaluating the filter argument like FILTER ( ... MAX )) in the original filter context that was active when CALCULATE was called. What is the original filter context here? the one created by the column in a report. Although Nested calculated may have different original filter context created by an outer CALCULATE.

 

Also shouldn't Filter function works row by row? -

 

FILTER does work row by row but when you use another function in the second argument of FILTER, then it doesn't depend on the row that is currently iterated unless that function is wrapped inside CALCULATE like CALCULATE ( MAX ) )

 

Read this: https://forum.enterprisedna.co/t/max-date-date-measure-used-in-the-filter-function/20123/5

 

Shouldn't MAX be the Max date from the whole table its is refering to and not only the Max Date until that row which is going to be evaluated?

 

I know why you are confused, you are thinking that MAX depends on the row iterated by FILTER but there is no relation between MAX and the currently iterated row in FILTER, simply put MAX is an aggregator that is evaluated in the original Filter context that is outside of CALCULATE (in this case)

@AntrikshSharma thank you so much for your thorough explanation. The link you provided was of tremendous help.  Keep up the good work!

Best. 

Anonymous
Not applicable

Thanks again @AntrikshSharma ! I really appreciate that you took the time to write such a detailed explanation. This is a topic I was struggeling with for weeks. 

You're welcome, feel free to ask more questions if something isn't clear. Have a great day!

 

Here is a video for those who need more explanation on this topic.

Greg_Deckler
Super User
Super User

@Anonymous - I prefer the first approach with SUMX. The reason is because it works in cases where you have a table variable as well as a physical table. Thus, it is much more flexible and can be used in more situations than the second. I tend not to use CALCULATE at all because whatever you can do with CALCULATE can be done other ways.


@ me in replies or I'll lose your thread!!!
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Learn Power BI 2nd Edition

Helpful resources

Announcements
Power Platform Conf 2022 768x460.jpg

Join us for Microsoft Power Platform Conference

The first Microsoft-sponsored Power Platform Conference is coming in September. 100+ speakers, 150+ sessions, and what's new and next for Power Platform.

Power BI Dev Camp Session 23 768x460.jpg

Check it Out!

Mark your calendars and join us on Thursday, June 30 at 11a PDT for a great session with Ted Pattison!

June 20 episode 7 with aka link 768x460.jpg

The Power BI Community Show

Join us on June 20 at 11 am PDT when Kim Manis shares the latest on Azure Synapse analytics, the Microsoft Intelligent Data Platform, and notable Power BI Updates from Microsoft Build 2022.

Top Solution Authors
Top Kudoed Authors