Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
DiscreetRaven
Frequent Visitor

Find MAX within GROUP first, THEN filter resulting subset for desired value

This [TABLE1] is a simplified representation of my data. The TRUE data is 40 columns of 10M records.

Contract

Post

Event

Dollars

Status

666666

1/1/2021

5

$5,000

good

666666

2/1/2021

7

$4,750

good

666666

3/1/2021

9

$4,500

better

666666

4/1/2021

11

$4,250

bad

666666

5/1/2021

13

$4,000

worse

666666

6/1/2021

14

$3,750

good

666666

7/1/2021

17

$3,500

better

777777

11/11/2020

2

$12,000

good

777777

12/12/2020

3

$11,000

better

777777

12/12/2020

6

$10,000

better

777777

12/12/2020

8

$9,000

bad

777777

1/13/2021

12

$8,000

better

777777

2/14/2021

23

$7,000

bad

777777

3/15/2021

29

$6,000

worse

888888

5/5/2019

15

$28

good

888888

6/6/2019

21

$27

good

888888

7/7/2019

22

$26

good

888888

4/24/2020

24

$25

bad

888888

4/24/2020

25

$0

good

888888

3/31/2021

26

$0

good

888888

4/12/2021

27

$0

good

THE OBJECTIVE is to get the [Dollars] in the MAX [Event] record within each [Contract] group where the [Post] date is less than or equal to a user-supplied date. But, only if the MAX [Event] record’s [Status] is ‘good’ or ‘better’. (Determination of the MAX [Event] (prior to the specified [Post] date) takes precedence over the [Status].)

My first inclination is to slice out those records that don’t meet the date criteria, then to find the maximum [Event] for each remaining [Contract] group.

DEFINE
    VAR AsOfThisDate =
        DATE ( "2021", "03", "24" )
    VAR BeforeDate =
        SUMMARIZECOLUMNS (
            Table1[Contract],
            Table1[Event],
            FILTER ( Table1, Table1[Post] <= AsOfThisDate )
        )
    VAR MaxEvent =
        GROUPBY (
            BeforeDate,
            Table1[Contract],
            "@Event", MAXX ( CURRENTGROUP (), Table1[Event] )
        )
EVALUATE
    MaxEvent

This results in the ONE MAX [Event] record per [Contract] where the [Post] date is less than or equal to the supplied date. Great! Now that I’ve cut the table size down to a much smaller subset, the remaining step is to retrieve the [Dollars] value from each of these records where the [Status] is “good” or “better”.

This is where I’m stuck. I suspect I’m attacking this problem in a sub-optimal, certainly less-than-ideal manner. I’ve not made any progress in my attempts at using CALCULATE. I thought something like this might work, but the ‘MaxEvent’ GROUPBY table doesn’t include [Dollars] or [Status] values, of course.

    VAR GoodDollars =
        CALCULATE (
            MIN ( MaxEvent[Dollars] ),
            MaxEvent[Status] = "good" || MaxEvent[Status] = "better",
        )
EVALUATE
    GoodDollars

How about this, instead? No such luck. It makes sense to me, but not to DAX Studio:

    VAR GoodDollars =
        CALCULATE (
            SUM ( Table1[Dollars] ),
            FILTER ( Table1, MaxEvent[Contract] = Table1[Contract] ),
            FILTER ( Table1, MaxEvent[@Event] = Table1[Event] ),
            FILTER ( Table1, Table1[Status] = "good" || Table1[Status] = "better" )
        )
EVALUATE
    GoodDollars

I don’t think I need to apply the [Status] filter to the entire table; only to the individual MAX [Event] record for each [Contract]. Am I making this more complicated than it needs to be? Probably. I’m not interested in aggregate values, either. I want to single out each [Contract]’s [Dollars] value for use in other measures.

Can you help me solve this puzzle, please? And, how to optimize the solution, given the data volume?

1 ACCEPTED SOLUTION

Thanks to you, @lbendlin, for taking the time. Your suggestion inspired me to look at the problem from another angle. I love the simplified approach but had to include a reference to [Event] to make it work. (I don't want the MAX [Status] prior to the selected date. I want the [Status] from the row with the MAX [Event] prior to the selected date.)

GoodDollars =
-- CHOOSE Post Date
VAR P =
ALLSELECTED ( 'Calendar'[Date] )
-- DETERMINE the Single Most Recent Event Row for a Contract Prior or Equal to the Chosen Post Date
VAR E =
CALCULATE ( MAX ( Table1[Event] ), ALLEXCEPT ( Table1, Table1[Contract] ), Table1[Post] <= P )
-- DETERMINE Status in the Row of the Most Recent Event (within All Rows for a Contract)
VAR S =
CALCULATE ( MAX ( Table1[Status] ), Table1[Event] = E )
-- DETERMINE Dollars in the Row of the Most Recent Event
VAR D =
CALCULATE ( SUM ( Table1[Dollars] ), Table1[Event] = E )
-- CHOOSE Dollars from the Row of the Most Recent Event ONLY IF the Status is Acceptable
RETURN
IF ( S IN { "good", "better" }, D )

The result:

DiscreetRaven_2-1631404706760.png

This solution "works", but (as you noted) is not optimized. It's just fine for a sample table of two dozen records. Ten million plus - not so much. My first attempt was intended to create a much smaller 'temporary' table filtered to only those records remaining to be tested for their [Status]. 'MaxEvent' does just that. I want to believe I can directly reference the columns of ‘MaxEvent’ without having to recalculate the table a second time to determine the [Status], and a third time to determine [Dollars]. If that's possible, it's got to be more efficient. As the SQLBI article @v-shex-msft pointed out says, the problem remains "the cardinality of the materialization required by the lowest level of context transition". OK. If you say so.

View solution in original post

7 REPLIES 7
DiscreetRaven
Frequent Visitor

Good news and bad news. Here's a greatly improved (faster) method for (not) accomplishing this goal. No 'CALCULATE' expressions are used. Hooray. I want to recognize Reza Rad from RADACAD.COM for his ideas for 'last status to-date'. This technique "works" just as well as the prior method, but. . . 

GoodDollars2 = 
-- CHOOSE Post Date
VAR P =
ALLSELECTED ( 'Calendar'[Date] )
-- FILTER Table for Only Those Rows Prior to or Equal to the Chosen Post Date
VAR F =
FILTER ( Table1, Table1[Post] <= P )
-- DETERMINE the Most Recent Event within the Filtered Table
VAR E =
MAXX ( F, Table1[Event] )
-- DETERMINE Status in the Row of the Most Recent Event within the Filtered Table
VAR S =
LOOKUPVALUE ( Table1[Status], Table1[Event], E )
-- DETERMINE Dollars in the Row of the Most Recent Event within the Filtered Table
VAR D =
LOOKUPVALUE ( Table1[Dollars], Table1[Event], E )
-- CHOOSE Dollars from the Row of the Most Recent Event Row ONLY IF the Status is Acceptable
RETURN
IF ( S IN { "good", "better" }, D )
. . . it breaks down for a more realistic data set, below. Same as earlier, but with non-unique [Event] numbers between each [Contract]. A more efficient failure is still a failure. If it weren't for the non-unique [Event] numbers, this DAX code would be a much improved approach. Wishing isn't working. So close, yet . . .
 
Does anyone see the flaw in this logic? What am I missing?
 
ContractPostEventDollarsStatus
666666Friday, January 1, 20215$5,000good
666666Monday, February 1, 20217$4,750good
666666Monday, March 1, 20219$4,500better
666666Thursday, April 1, 202111$4,250bad
666666Saturday, May 1, 202113$4,000worse
666666Tuesday, June 1, 202114$3,750good
666666Thursday, July 1, 202117$3,500better
777777Wednesday, November 11, 20202$12,000good
777777Saturday, December 12, 20203$11,000better
777777Saturday, December 12, 20206$10,000better
777777Saturday, December 12, 20208$9,000bad
777777Wednesday, January 13, 202112$8,000better
777777Sunday, February 14, 202123$7,000bad
777777Monday, March 15, 202129$6,000worse
888888Sunday, May 5, 201915$27.5good
888888Thursday, June 6, 201921$27good
888888Sunday, July 7, 201922$26good
888888Friday, April 24, 202024$25bad
888888Friday, April 24, 202025$0good
888888Wednesday, March 31, 202126$0good
888888Monday, April 12, 202127$0good
999999Wednesday, November 4, 20201$999good
999999Friday, November 20, 20202$888good
999999Friday, December 4, 20203$777good
999999Sunday, December 20, 20204$666good
999999Monday, January 4, 20215$555good
999999Wednesday, January 20, 20216$444good
999999Thursday, February 4, 20217$333good
999999Saturday, February 20, 20218$222good
999999Thursday, March 4, 20219$111good
999999Saturday, March 20, 202120$99good
999999Sunday, April 4, 202121$88good
999999Tuesday, April 20, 202122$77good
999999Tuesday, May 4, 202123$66good
999999Thursday, May 20, 202124$55good
999999Friday, June 4, 202125$44good
999999Sunday, June 20, 202126$33good
999999Sunday, July 4, 202127$22good
999999Tuesday, July 20, 202128$11good
999999Wednesday, August 4, 202129$0good
v-shex-msft
Community Support
Community Support

HI @DiscreetRaven,

I'd like to suggest you take a look at the following blog about optimization nested expression performance if they suitable for your requirement:

Optimizing nested iterators in DAX - SQLBI

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

Hello, Xiaoxin. Thanks for your response. I love those Italian guys. I've learned a lot from them, but obviously not enough. I do not understand what is meant by "the cardinality of the materialization required by the lowest level of context transition". I'll study their article and will repost here, if it results in an improvement.

lbendlin
Super User
Super User

This is not optimized but it seems to provide what you need. Could sprinkle in some table variables to reduce the cardinality.

 

lbendlin_0-1630890295796.png

 

Thanks to you, @lbendlin, for taking the time. Your suggestion inspired me to look at the problem from another angle. I love the simplified approach but had to include a reference to [Event] to make it work. (I don't want the MAX [Status] prior to the selected date. I want the [Status] from the row with the MAX [Event] prior to the selected date.)

GoodDollars =
-- CHOOSE Post Date
VAR P =
ALLSELECTED ( 'Calendar'[Date] )
-- DETERMINE the Single Most Recent Event Row for a Contract Prior or Equal to the Chosen Post Date
VAR E =
CALCULATE ( MAX ( Table1[Event] ), ALLEXCEPT ( Table1, Table1[Contract] ), Table1[Post] <= P )
-- DETERMINE Status in the Row of the Most Recent Event (within All Rows for a Contract)
VAR S =
CALCULATE ( MAX ( Table1[Status] ), Table1[Event] = E )
-- DETERMINE Dollars in the Row of the Most Recent Event
VAR D =
CALCULATE ( SUM ( Table1[Dollars] ), Table1[Event] = E )
-- CHOOSE Dollars from the Row of the Most Recent Event ONLY IF the Status is Acceptable
RETURN
IF ( S IN { "good", "better" }, D )

The result:

DiscreetRaven_2-1631404706760.png

This solution "works", but (as you noted) is not optimized. It's just fine for a sample table of two dozen records. Ten million plus - not so much. My first attempt was intended to create a much smaller 'temporary' table filtered to only those records remaining to be tested for their [Status]. 'MaxEvent' does just that. I want to believe I can directly reference the columns of ‘MaxEvent’ without having to recalculate the table a second time to determine the [Status], and a third time to determine [Dollars]. If that's possible, it's got to be more efficient. As the SQLBI article @v-shex-msft pointed out says, the problem remains "the cardinality of the materialization required by the lowest level of context transition". OK. If you say so.

As you say, you may want to use table variables to narrow down the subset of rows that you then need to apply logic to.  And this is where the cardinality kicks in - you want to start with the filter that produces the smallest result set.

Yes. That's a must. As it stands, it's useless. It takes MINUTES to respond when run against my 10M+ production dataset. The initial filter ('MaxEvent' from the first example, and 'E' from the second example) quickly cuts the result set down to 10%. The outermost steps are the hogs.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.