Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
This [TABLE1] is a simplified representation of my data. The TRUE data is 40 columns of 10M records.
Contract | Post | Event | Dollars | Status |
666666 | 1/1/2021 | 5 | $5,000 | good |
666666 | 2/1/2021 | 7 | $4,750 | good |
666666 | 3/1/2021 | 9 | $4,500 | better |
666666 | 4/1/2021 | 11 | $4,250 | bad |
666666 | 5/1/2021 | 13 | $4,000 | worse |
666666 | 6/1/2021 | 14 | $3,750 | good |
666666 | 7/1/2021 | 17 | $3,500 | better |
777777 | 11/11/2020 | 2 | $12,000 | good |
777777 | 12/12/2020 | 3 | $11,000 | better |
777777 | 12/12/2020 | 6 | $10,000 | better |
777777 | 12/12/2020 | 8 | $9,000 | bad |
777777 | 1/13/2021 | 12 | $8,000 | better |
777777 | 2/14/2021 | 23 | $7,000 | bad |
777777 | 3/15/2021 | 29 | $6,000 | worse |
888888 | 5/5/2019 | 15 | $28 | good |
888888 | 6/6/2019 | 21 | $27 | good |
888888 | 7/7/2019 | 22 | $26 | good |
888888 | 4/24/2020 | 24 | $25 | bad |
888888 | 4/24/2020 | 25 | $0 | good |
888888 | 3/31/2021 | 26 | $0 | good |
888888 | 4/12/2021 | 27 | $0 | good |
THE OBJECTIVE is to get the [Dollars] in the MAX [Event] record within each [Contract] group where the [Post] date is less than or equal to a user-supplied date. But, only if the MAX [Event] record’s [Status] is ‘good’ or ‘better’. (Determination of the MAX [Event] (prior to the specified [Post] date) takes precedence over the [Status].)
My first inclination is to slice out those records that don’t meet the date criteria, then to find the maximum [Event] for each remaining [Contract] group.
DEFINE
VAR AsOfThisDate =
DATE ( "2021", "03", "24" )
VAR BeforeDate =
SUMMARIZECOLUMNS (
Table1[Contract],
Table1[Event],
FILTER ( Table1, Table1[Post] <= AsOfThisDate )
)
VAR MaxEvent =
GROUPBY (
BeforeDate,
Table1[Contract],
"@Event", MAXX ( CURRENTGROUP (), Table1[Event] )
)
EVALUATE
MaxEvent
This results in the ONE MAX [Event] record per [Contract] where the [Post] date is less than or equal to the supplied date. Great! Now that I’ve cut the table size down to a much smaller subset, the remaining step is to retrieve the [Dollars] value from each of these records where the [Status] is “good” or “better”.
This is where I’m stuck. I suspect I’m attacking this problem in a sub-optimal, certainly less-than-ideal manner. I’ve not made any progress in my attempts at using CALCULATE. I thought something like this might work, but the ‘MaxEvent’ GROUPBY table doesn’t include [Dollars] or [Status] values, of course.
VAR GoodDollars =
CALCULATE (
MIN ( MaxEvent[Dollars] ),
MaxEvent[Status] = "good" || MaxEvent[Status] = "better",
)
EVALUATE
GoodDollars
How about this, instead? No such luck. It makes sense to me, but not to DAX Studio:
VAR GoodDollars =
CALCULATE (
SUM ( Table1[Dollars] ),
FILTER ( Table1, MaxEvent[Contract] = Table1[Contract] ),
FILTER ( Table1, MaxEvent[@Event] = Table1[Event] ),
FILTER ( Table1, Table1[Status] = "good" || Table1[Status] = "better" )
)
EVALUATE
GoodDollars
I don’t think I need to apply the [Status] filter to the entire table; only to the individual MAX [Event] record for each [Contract]. Am I making this more complicated than it needs to be? Probably. I’m not interested in aggregate values, either. I want to single out each [Contract]’s [Dollars] value for use in other measures.
Can you help me solve this puzzle, please? And, how to optimize the solution, given the data volume?
Solved! Go to Solution.
Thanks to you, @lbendlin, for taking the time. Your suggestion inspired me to look at the problem from another angle. I love the simplified approach but had to include a reference to [Event] to make it work. (I don't want the MAX [Status] prior to the selected date. I want the [Status] from the row with the MAX [Event] prior to the selected date.)
GoodDollars =
-- CHOOSE Post Date
VAR P =
ALLSELECTED ( 'Calendar'[Date] )
-- DETERMINE the Single Most Recent Event Row for a Contract Prior or Equal to the Chosen Post Date
VAR E =
CALCULATE ( MAX ( Table1[Event] ), ALLEXCEPT ( Table1, Table1[Contract] ), Table1[Post] <= P )
-- DETERMINE Status in the Row of the Most Recent Event (within All Rows for a Contract)
VAR S =
CALCULATE ( MAX ( Table1[Status] ), Table1[Event] = E )
-- DETERMINE Dollars in the Row of the Most Recent Event
VAR D =
CALCULATE ( SUM ( Table1[Dollars] ), Table1[Event] = E )
-- CHOOSE Dollars from the Row of the Most Recent Event ONLY IF the Status is Acceptable
RETURN
IF ( S IN { "good", "better" }, D )
The result:
This solution "works", but (as you noted) is not optimized. It's just fine for a sample table of two dozen records. Ten million plus - not so much. My first attempt was intended to create a much smaller 'temporary' table filtered to only those records remaining to be tested for their [Status]. 'MaxEvent' does just that. I want to believe I can directly reference the columns of ‘MaxEvent’ without having to recalculate the table a second time to determine the [Status], and a third time to determine [Dollars]. If that's possible, it's got to be more efficient. As the SQLBI article @v-shex-msft pointed out says, the problem remains "the cardinality of the materialization required by the lowest level of context transition". OK. If you say so.
Good news and bad news. Here's a greatly improved (faster) method for (not) accomplishing this goal. No 'CALCULATE' expressions are used. Hooray. I want to recognize Reza Rad from RADACAD.COM for his ideas for 'last status to-date'. This technique "works" just as well as the prior method, but. . .
GoodDollars2 =
-- CHOOSE Post Date
VAR P =
ALLSELECTED ( 'Calendar'[Date] )
-- FILTER Table for Only Those Rows Prior to or Equal to the Chosen Post Date
VAR F =
FILTER ( Table1, Table1[Post] <= P )
-- DETERMINE the Most Recent Event within the Filtered Table
VAR E =
MAXX ( F, Table1[Event] )
-- DETERMINE Status in the Row of the Most Recent Event within the Filtered Table
VAR S =
LOOKUPVALUE ( Table1[Status], Table1[Event], E )
-- DETERMINE Dollars in the Row of the Most Recent Event within the Filtered Table
VAR D =
LOOKUPVALUE ( Table1[Dollars], Table1[Event], E )
-- CHOOSE Dollars from the Row of the Most Recent Event Row ONLY IF the Status is Acceptable
RETURN
IF ( S IN { "good", "better" }, D )
Contract | Post | Event | Dollars | Status |
666666 | Friday, January 1, 2021 | 5 | $5,000 | good |
666666 | Monday, February 1, 2021 | 7 | $4,750 | good |
666666 | Monday, March 1, 2021 | 9 | $4,500 | better |
666666 | Thursday, April 1, 2021 | 11 | $4,250 | bad |
666666 | Saturday, May 1, 2021 | 13 | $4,000 | worse |
666666 | Tuesday, June 1, 2021 | 14 | $3,750 | good |
666666 | Thursday, July 1, 2021 | 17 | $3,500 | better |
777777 | Wednesday, November 11, 2020 | 2 | $12,000 | good |
777777 | Saturday, December 12, 2020 | 3 | $11,000 | better |
777777 | Saturday, December 12, 2020 | 6 | $10,000 | better |
777777 | Saturday, December 12, 2020 | 8 | $9,000 | bad |
777777 | Wednesday, January 13, 2021 | 12 | $8,000 | better |
777777 | Sunday, February 14, 2021 | 23 | $7,000 | bad |
777777 | Monday, March 15, 2021 | 29 | $6,000 | worse |
888888 | Sunday, May 5, 2019 | 15 | $27.5 | good |
888888 | Thursday, June 6, 2019 | 21 | $27 | good |
888888 | Sunday, July 7, 2019 | 22 | $26 | good |
888888 | Friday, April 24, 2020 | 24 | $25 | bad |
888888 | Friday, April 24, 2020 | 25 | $0 | good |
888888 | Wednesday, March 31, 2021 | 26 | $0 | good |
888888 | Monday, April 12, 2021 | 27 | $0 | good |
999999 | Wednesday, November 4, 2020 | 1 | $999 | good |
999999 | Friday, November 20, 2020 | 2 | $888 | good |
999999 | Friday, December 4, 2020 | 3 | $777 | good |
999999 | Sunday, December 20, 2020 | 4 | $666 | good |
999999 | Monday, January 4, 2021 | 5 | $555 | good |
999999 | Wednesday, January 20, 2021 | 6 | $444 | good |
999999 | Thursday, February 4, 2021 | 7 | $333 | good |
999999 | Saturday, February 20, 2021 | 8 | $222 | good |
999999 | Thursday, March 4, 2021 | 9 | $111 | good |
999999 | Saturday, March 20, 2021 | 20 | $99 | good |
999999 | Sunday, April 4, 2021 | 21 | $88 | good |
999999 | Tuesday, April 20, 2021 | 22 | $77 | good |
999999 | Tuesday, May 4, 2021 | 23 | $66 | good |
999999 | Thursday, May 20, 2021 | 24 | $55 | good |
999999 | Friday, June 4, 2021 | 25 | $44 | good |
999999 | Sunday, June 20, 2021 | 26 | $33 | good |
999999 | Sunday, July 4, 2021 | 27 | $22 | good |
999999 | Tuesday, July 20, 2021 | 28 | $11 | good |
999999 | Wednesday, August 4, 2021 | 29 | $0 | good |
HI @DiscreetRaven,
I'd like to suggest you take a look at the following blog about optimization nested expression performance if they suitable for your requirement:
Optimizing nested iterators in DAX - SQLBI
Regards,
Xiaoxin Sheng
Hello, Xiaoxin. Thanks for your response. I love those Italian guys. I've learned a lot from them, but obviously not enough. I do not understand what is meant by "the cardinality of the materialization required by the lowest level of context transition". I'll study their article and will repost here, if it results in an improvement.
This is not optimized but it seems to provide what you need. Could sprinkle in some table variables to reduce the cardinality.
Thanks to you, @lbendlin, for taking the time. Your suggestion inspired me to look at the problem from another angle. I love the simplified approach but had to include a reference to [Event] to make it work. (I don't want the MAX [Status] prior to the selected date. I want the [Status] from the row with the MAX [Event] prior to the selected date.)
GoodDollars =
-- CHOOSE Post Date
VAR P =
ALLSELECTED ( 'Calendar'[Date] )
-- DETERMINE the Single Most Recent Event Row for a Contract Prior or Equal to the Chosen Post Date
VAR E =
CALCULATE ( MAX ( Table1[Event] ), ALLEXCEPT ( Table1, Table1[Contract] ), Table1[Post] <= P )
-- DETERMINE Status in the Row of the Most Recent Event (within All Rows for a Contract)
VAR S =
CALCULATE ( MAX ( Table1[Status] ), Table1[Event] = E )
-- DETERMINE Dollars in the Row of the Most Recent Event
VAR D =
CALCULATE ( SUM ( Table1[Dollars] ), Table1[Event] = E )
-- CHOOSE Dollars from the Row of the Most Recent Event ONLY IF the Status is Acceptable
RETURN
IF ( S IN { "good", "better" }, D )
The result:
This solution "works", but (as you noted) is not optimized. It's just fine for a sample table of two dozen records. Ten million plus - not so much. My first attempt was intended to create a much smaller 'temporary' table filtered to only those records remaining to be tested for their [Status]. 'MaxEvent' does just that. I want to believe I can directly reference the columns of ‘MaxEvent’ without having to recalculate the table a second time to determine the [Status], and a third time to determine [Dollars]. If that's possible, it's got to be more efficient. As the SQLBI article @v-shex-msft pointed out says, the problem remains "the cardinality of the materialization required by the lowest level of context transition". OK. If you say so.
As you say, you may want to use table variables to narrow down the subset of rows that you then need to apply logic to. And this is where the cardinality kicks in - you want to start with the filter that produces the smallest result set.
Yes. That's a must. As it stands, it's useless. It takes MINUTES to respond when run against my 10M+ production dataset. The initial filter ('MaxEvent' from the first example, and 'E' from the second example) quickly cuts the result set down to 10%. The outermost steps are the hogs.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
15 | |
1 | |
1 | |
1 | |
1 |
User | Count |
---|---|
26 | |
3 | |
2 | |
2 | |
2 |