Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi all,
See the attached file for support.
We have a simple one-table model:
CustomerId | Dates | Segment |
6 | 03/01/2020 | Segment2 |
7 | 03/01/2020 | Segment2 |
6 | 03/02/2020 | Segment1 |
7 | 03/02/2020 | Segment2 |
6 | 03/03/2020 | Segment1 |
7 | 03/03/2020 | Segment1 |
with info on what segment customers belonged to at specific dates. We build a matrix visual with Segment in rows, Dates in columns and this measure in values:
Net new customers =
VAR _CurrentDate =
SELECTEDVALUE (Table1[Dates] )
VAR _PreviousDate =
CALCULATE (
MAX ( Table1[Dates] ),
ALLSELECTED ( Table1[Dates] ),
Table1[Dates] < _CurrentDate
)
VAR _CustomerIDsThisMonth = DISTINCT(Table1[CustomerId])
VAR _CustomerIDsPreviousMonth =
CALCULATETABLE (
DISTINCT(Table1[CustomerId]),
Table1[Dates] = _PreviousDate
)
RETURN
COUNTROWS(EXCEPT(_CustomerIDsThisMonth, _CustomerIDsPreviousMonth))
This provides the number of new customers that moved to that segment at that date (from the previous date).
This works fine for low number of customers. When that increases, however, it gets quite slow. For instance, with 1 million customers and info on 10 dates in the table, the measure takes around 13 secs to execute in the visual. If the number of customers and dates is higher, like in the real scenario, it is much slower still.
Any ideas on how to optimize this, make it faster?
Note that you can change the size of the table by changing the number of customers and number of periods (dates) in the query editor. Simply edit the parameters NumCustomers and NumPeriods.
@mahoneypat , @MFelix @TomMartens
Solved! Go to Solution.
Hi @AlB ,
try this.
Net new customers v2 =
VAR _CurrentDate =
SELECTEDVALUE (Table1[Dates] )
VAR _PreviousDate =
CALCULATE (
MAX ( Table1[Dates] ),
ALLSELECTED ( Table1[Dates] ),
Table1[Dates] < _CurrentDate
)
VAR _CountCustomerIDsThisMonth =
CALCULATE(
DISTINCTCOUNT(Table1[CustomerId]),
Table1[Dates] = _PreviousDate ||
Table1[Dates] = _CurrentDate
)
VAR _CountCustomerIDsPreviousMonth =
CALCULATE(
DISTINCTCOUNT(Table1[CustomerId]),
Table1[Dates] = _PreviousDate
)
RETURN
_CountCustomerIDsThisMonth - _CountCustomerIDsPreviousMonth
Thanks so much for your continued efforts. V5 shows a bit of an improvement on 1M customers, but not really significant.
- [ Net new customers purchases V5]: 21.5 secs
Perhaps we've reached the optimal approach without using PQ or changing the model?
Thanks
This is quite possible.
Thanks for V4. Smart alternative. It doesn't seem to run faster, I'm afraid. These are the readings in DAX Studio with 1M customers:
- [ Net new customers purchases]: 22.5 secs
- [ Net new customers purchases V2 ] : 25.7 secs
- [ Net new customers purchases V4 ]: 23.7 secs
So no massive differences. V4 actually introduces a Callback Data at the end but that doesn't seem to have a big effect. Earlier I claimed [ Net new customers purchases V2 ] was twice as fast as [ Net new customers purchases] but i cannot reproduce that now. Perhaps I made a mistake measuring, it was with the Performance analyzer with PBI. Dax Studio should be more reliable
Thanks
Hi @AlB ,
try this.
Net new customers purchases V5 =
VAR _CurrentDate =
SELECTEDVALUE ( Table1[Dates] )
VAR _PreviousDate =
CALCULATE ( MAX ( Table1[Dates] ), Table1[Dates] < _CurrentDate )
VAR _NewCustomer =
FILTER (
VALUES ( Table1[CustomerId] ),
VAR _CountPreviousCustomer =
CALCULATE ( COUNT ( Table1[CustomerId] ), Table1[Dates] = _PreviousDate )
VAR _IsInPreviousDate =
_CountPreviousCustomer = BLANK ()
RETURN
_IsInPreviousDate
)
RETURN
CALCULATE ( SUM ( Table1[Sales] ), _NewCustomer )
I'm primarily interested in a DAX solution but if it cannot be optimized further in DAX PQ is an option too
@mwegener (and everyone else)
Allow me to challenge you a bit further. See attached file for support.
We now want to calculate the revenue (sales) that the new customers that moved to the segment have generated. For that we have added an extra column (Table1[Sales]). We then have an initial approach, similar to [Net new customers]. Very slow as well:
Net new customers purchases =
VAR _CurrentDate = SELECTEDVALUE ( Table1[Dates] )
VAR _PreviousDate = CALCULATE ( MAX ( Table1[Dates] ), Table1[Dates] < _CurrentDate )
VAR _CustomerIDsThisMonth = DISTINCT ( Table1[CustomerId] )
VAR _CustomerIDsPreviousMonth =
CALCULATETABLE (
DISTINCT ( Table1[CustomerId] ),
Table1[Dates] = _PreviousDate
)
RETURN
CALCULATE (
SUM ( Table1[Sales] ),
EXCEPT ( _CustomerIDsThisMonth, _CustomerIDsPreviousMonth )
)
Following an approach similar to @mwegener 's V2 (although it cannot be followed completely in this case), we can make it twice as fast, but still relatively slow:
Net new customers purchases V2 =
VAR _CurrentDate = SELECTEDVALUE ( Table1[Dates] )
VAR _PreviousDate = CALCULATE ( MAX ( Table1[Dates] ), Table1[Dates] < _CurrentDate )
VAR _CustomerIDsThisMonth =
CALCULATETABLE (
DISTINCT ( Table1[CustomerId] ),
Table1[Dates] IN {_PreviousDate , _CurrentDate}
)
VAR _CustomerIDsPreviousMonth =
CALCULATETABLE (
DISTINCT ( Table1[CustomerId] ),
Table1[Dates] = _PreviousDate
)
RETURN
CALCULATE ( SUM ( Table1[Sales] ), _CustomerIDsThisMonth )
- CALCULATE ( SUM ( Table1[Sales] ), _CustomerIDsPreviousMonth )
Can you make it faster?
Many thanks
Hi @AlB ,
try this.
Net new customers purchases V4 =
VAR _CurrentDate =
SELECTEDVALUE ( Table1[Dates] )
VAR _PreviousDate =
CALCULATE ( MAX ( Table1[Dates] ), Table1[Dates] < _CurrentDate )
RETURN
SUMX (
Table1,
IF (
CALCULATE (
COUNTROWS ( FILTER ( Table1, Table1[Dates] = _PreviousDate ) ),
REMOVEFILTERS ( Table1[Dates] )
) >= 1,
BLANK (),
Table1[Sales]
)
)
Hi @mwegener
Many thanks for your responses. How didn't I think of this approach 🙄 ?? V2 is way way faster. Great!
Interestingly enough, though, V3 is slower than the original one 🤔 which I wouldn't expect seeing V2's performance
Hi @AlB ,
try this.
Net new customers v2 =
VAR _CurrentDate =
SELECTEDVALUE (Table1[Dates] )
VAR _PreviousDate =
CALCULATE (
MAX ( Table1[Dates] ),
ALLSELECTED ( Table1[Dates] ),
Table1[Dates] < _CurrentDate
)
VAR _CountCustomerIDsThisMonth =
CALCULATE(
DISTINCTCOUNT(Table1[CustomerId]),
Table1[Dates] = _PreviousDate ||
Table1[Dates] = _CurrentDate
)
VAR _CountCustomerIDsPreviousMonth =
CALCULATE(
DISTINCTCOUNT(Table1[CustomerId]),
Table1[Dates] = _PreviousDate
)
RETURN
_CountCustomerIDsThisMonth - _CountCustomerIDsPreviousMonth
HI @AlB ,
depending on the underlying data, this adjustment could also help.
Net new customers v3 =
VAR _CurrentDate =
SELECTEDVALUE (Table1[Dates] )
VAR _PreviousDate =
CALCULATE (
MAX ( Table1[Dates] ),
ALLSELECTED ( Table1[Dates] ),
Table1[Dates] < _CurrentDate
)
VAR _CountCustomerIDsThisMonth =
CALCULATE(
SUMX(VALUES(Table1[CustomerId]),1),
Table1[Dates] = _PreviousDate ||
Table1[Dates] = _CurrentDate
)
VAR _CountCustomerIDsPreviousMonth =
CALCULATE(
SUMX(VALUES(Table1[CustomerId]),1),
Table1[Dates] = _PreviousDate
)
RETURN
_CountCustomerIDsThisMonth - _CountCustomerIDsPreviousMonth
Got it from the Guy in a Cube video "Debugging a slow Power BI report with Phil Seamark"
https://www.youtube.com/watch?v=eABg872TAJU
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
114 | |
97 | |
86 | |
70 | |
62 |
User | Count |
---|---|
151 | |
120 | |
103 | |
87 | |
68 |