cancel
Showing results for
Did you mean:
Anonymous
Not applicable

## SUMX & FILTER or SUM & CALCULATE: Best practice for summing subset of columns

Hi guys,

quick question: If I want to sum a subset of a column, for example the sum of the sales of only red products, which approach is better suited?

1.SUMX and FILTER

`Red Sales 1 =SUMX ( FILTER ( Sales; Sales[ProductColor] = "Red" ); Sales[Amount] )`

or

2. CALCULATE and SUM

`Red Sales 2 =CALCULATE ( SUM ( Sales[Amount] ); Sales[ProductColor] = "Red" )`

Best regards

Lucas

1 ACCEPTED SOLUTION
Resident Rockstar

You should use 2 second version as CALCULATE evaluates its filter arguments first and once the filter context is modified it evaluates the first argument. And modifying filter context before evaluating something is efficient and a good practice. Also you get to reuse your base measure in all your other measure, for example if you create a Sales Amount measure like SUM ( Sales[Amount] ) you can use it multiple time in other measure where you just want to modify the context in which the SUM is to be evaluated without rewriting the whole code again. example:

``````=
CALCULATE ( [Sales Amount], Dates[Year] = 2020, Customer[Continent] = "Asia" )``````
``````=
CALCULATE ( [Sales Amount], Dates[Year] IN { 2020, 2019, 2018 } )``````
``````=
CALCULATE (
[Sales Amount],
Customer[Gender] = "Male",
Products[Color] IN { "Green", "Yellow", "White" },
USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)``````

In case of the first option FILTER has to iterate probably a very huge table and check for rows where color is red, but in case of the second option  Sales[ProductColor] = "Red" internally it expands to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" ) since it use ALL and ALL returns unique product colors, that list is going to be really small, we are talking in hundreds or maybe tens, and iterating such a small list is much more efficient that iterating a table with millions or billions of rows to just check for red products and then do further iterations with SUMX.

11 REPLIES 11
Super User

@Anonymous - Thanks for inspiring me today.

Check out my latest article, CALCUHATE - Why I Don't Use DAX's CALCULATE Function.

https://community.powerbi.com/t5/Community-Blog/CALCUHATE-Why-I-Don-t-Use-DAX-s-CALCULATE-Function/ba-p/1248635

Become an expert!: Enterprise DNA
External Tools: MSHGQM
Latest book!:
Learn Power BI 2nd Edition
Resident Rockstar

You should use 2 second version as CALCULATE evaluates its filter arguments first and once the filter context is modified it evaluates the first argument. And modifying filter context before evaluating something is efficient and a good practice. Also you get to reuse your base measure in all your other measure, for example if you create a Sales Amount measure like SUM ( Sales[Amount] ) you can use it multiple time in other measure where you just want to modify the context in which the SUM is to be evaluated without rewriting the whole code again. example:

``````=
CALCULATE ( [Sales Amount], Dates[Year] = 2020, Customer[Continent] = "Asia" )``````
``````=
CALCULATE ( [Sales Amount], Dates[Year] IN { 2020, 2019, 2018 } )``````
``````=
CALCULATE (
[Sales Amount],
Customer[Gender] = "Male",
Products[Color] IN { "Green", "Yellow", "White" },
USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)``````

In case of the first option FILTER has to iterate probably a very huge table and check for rows where color is red, but in case of the second option  Sales[ProductColor] = "Red" internally it expands to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" ) since it use ALL and ALL returns unique product colors, that list is going to be really small, we are talking in hundreds or maybe tens, and iterating such a small list is much more efficient that iterating a table with millions or billions of rows to just check for red products and then do further iterations with SUMX.

Anonymous
Not applicable

@AntrikshSharma Thanks for the clarification. I was leaning towards this solution, too.

The third example you posted seems to be a calculation based on the running total / cummulative total pattern:

`=CALCULATE ([Sales Amount],Customer[Gender] = "Male",Products[Color] IN { "Green", "Yellow", "White" },USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) ))`

I've used the pattern myself but I still get confused about the last part. Could you give me feedback If I've understood it correctly?

• Dates[Date] : refers to the column
• MAX ( Dates[Date] ) : We're in the row context now. There is only one Date which we retrieve with the MAX() function. But since there is only one date anyway in theory we could also use the MIN() or SELECTEDVALUE() functions to get this date

Resident Rockstar

Actually MAX here is not related to the Row context created by FILTER, MAX is being evaluated in the filter context that is outside CALCULATE, this is well understood by using Variables, here is arunning total patern using the same.

``````Running Total =
VAR MaxDateInFilterContext = MAX ( Dates[Date] )
VAR MaxYear = YEAR ( MaxDateInFilterContext )
VAR DatesLessThanMaxDate =
FILTER (
ALL ( Dates[Date], Dates[Calendar Year Number] ),
Dates[Date] <= MaxDateInFilterContext
&& Dates[Calendar Year Number] = MaxYear
)
VAR Result =
CALCULATE (
[Total Sales],
DatesLessThanMaxDate
)
RETURN
Result``````

take a look at the below screenshot. When you are at 2nd January, what is the MAX date? it is 2nd and the amount is calculated for all dates less than 2nd which is returned by the variable DatesLessThanMaxDate , next DAX is evaluated for 3rd January, now what is the MAX value? it is 3rd January, similarly it is being checked for each cell of the report.

Hence yes, you can use MIN or SELECTEDVALUE but their values will not be related to the Row context created by FILTER

Impactful Individual

Hello @AntrikshSharma

if you can find the time I would appreciate help in explaining the concept you just described.

Especially I would like to know what you said about the MAX function which is wrapped in the FILTER function, why its not evaluated row by row, although its actually inside the filter function, which is a iterator?:

"Actually MAX here is not related to the Row context created by FILTER, MAX is being evaluated in the filter context that is outside CALCULATE, this is well understood by using Variables, here is arunning total patern using the same."

For me it is actually inside the Calculate function as well, because Calculate is outside of everything.

Also shouldn't Filter function works row by row?  Shouldn't MAX be the Max date from the whole table its is refering to and not only the Max Date until that row which is going to be evaluated? Looking The following seems to make sense but isn't going to work without aggregation:

`FILTER ( ALL ( Dates ), Dates[Date] < ( Dates[Date] ) `

`FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) `

Why I'm asking that is because when I do the same kind of calculation in a calculated column inside a table variable I can write like this without problem:

Column=
Var _TableVariable= Filter( 'Calendar', 'Calendar' [Date]>='Maintable'[BookingDate] && 'Calendar' [Date]<='Maintable'[FinishDate] && 'Calendar[Day]= "Monday"))
return
countrows(_TableVariable)

Here I don't need to aggregate the dates, it knows automatically when I say
'Calendar' [Date]>='Maintable'[BookingDate] && 'Calendar' [Date]<='Maintable'[FinishDate]
that I'm talking about ALL the Calendar Dates which are bigger than the booking dates and smaller than then the FinishDate when the specific fow in the calendar table got evaluated row by row.

Best regards.

Resident Rockstar

@Applicable88

For me it is actually inside the Calculate function as well, because Calculate is outside of everything.

That doesn't matter, CALCULATE prepares the new filter context by evaluating the filter argument like FILTER ( ... MAX )) in the original filter context that was active when CALCULATE was called. What is the original filter context here? the one created by the column in a report. Although Nested calculated may have different original filter context created by an outer CALCULATE.

Also shouldn't Filter function works row by row? -

FILTER does work row by row but when you use another function in the second argument of FILTER, then it doesn't depend on the row that is currently iterated unless that function is wrapped inside CALCULATE like CALCULATE ( MAX ) )

Shouldn't MAX be the Max date from the whole table its is refering to and not only the Max Date until that row which is going to be evaluated?

I know why you are confused, you are thinking that MAX depends on the row iterated by FILTER but there is no relation between MAX and the currently iterated row in FILTER, simply put MAX is an aggregator that is evaluated in the original Filter context that is outside of CALCULATE (in this case)

Impactful Individual

@AntrikshSharma thank you so much for your thorough explanation. The link you provided was of tremendous help.  Keep up the good work!

Best.

Anonymous
Not applicable

Thanks again @AntrikshSharma ! I really appreciate that you took the time to write such a detailed explanation. This is a topic I was struggeling with for weeks.

Resident Rockstar
You're welcome, feel free to ask more questions if something isn't clear. Have a great day!
Resident Rockstar

Here is a video for those who need more explanation on this topic.

Super User

@Anonymous - I prefer the first approach with SUMX. The reason is because it works in cases where you have a table variable as well as a physical table. Thus, it is much more flexible and can be used in more situations than the second. I tend not to use CALCULATE at all because whatever you can do with CALCULATE can be done other ways.

Become an expert!: Enterprise DNA
External Tools: MSHGQM
Latest book!:
Learn Power BI 2nd Edition

Announcements