Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi. While I've mastered most of the basics in DAX and M Query for churn, there's a scenario that I find quite interesting that I wanted to see if people had feedback for.
Consider a churn analysis for a typical large (30m monthly unique visitor) publisher's email subscriber list - snapshot below:
Here's what these columns represent and why they should be all that's needed to determine churn:
1. Profile ID represents a unique subscriber (in this case it's just a hashed email address). For DAX churn calcs, we'll basically be counting these under various contexts.
2. Profile Created Date in this instance determines cohort since this is the date someone joins. What I'm going to suggest below is a churn metric that uses a median of the churn rate across certain cohorts.
3. Engagement status. The most basic level of churn happens when someone goes (i) "hardbounce" or (ii) "optout". I call this "Hard" Churn. The Latest Interaction date captures the date of this event.
4. (and here's the hardest part) Latest Interaction. Most subscribers in these email lists never hardbounce or optout and yet need to be counted as what I call "Soft" Churn because they disengage to the point where don't visit and need to be counted as churn.
Accordingly, does anyone have any suggestions on how I'd write a DAX query to get the following:
A. Median 28 Day "Hard" Churn for the Prior 12 Months: taking into memory for each of the last 12 fully completed monthly cohorts (ie since it's now Jan 2019, we would look at the cohorts of users who signed up during 2018-01, 2018-02 .... 2018-12), calculating the percent of Profile ID's that "hard churn" -- defined as "hardbounce" and "optout" -- in each cohort, and from that creating a measure of the median of those 12 churn percentages.
(note I've done this -- harder part follows below)
B. Setting up the 28 day period as a variable or parameter so that we can look at churn over any number of days while automatically counting only the last 12 periods that are fully completed given that number of days. In other words, let's say we want to change 28 Day Hard Churn to 90 Day Hard Churn, the 12 cohorts analyzed need to be 90 days earlier (2017-09, 2017-10 .... 2018-09) because a cohort will only be measurable for churn purposes if its last day is older than the # of days history being looked at.
(and here's the hardest part)
C. Adding "Soft" Churn" to #1 and #2, above. "Soft" Churn is someone who should be counted as having churned because they havent visited the site in over X days (let's use 120 days for now - although ideally this is Parameter). This is the part that has made my brain hurt the most because the DAX needs (I think) to draw a variable based the difference between a fixed date that the numbers were drawn (ie 1/20/2019) and Latest Interaction to see if the Profile ID should be deemed churned due to inactivity. I have done this manually through tremendous amounts of calculated columns. Do any DAX geniuses know how to do this with measures from just the 4 primary columns, above?
Note: since anyone interested in this is probably interested in what we see as churn, where's what the numbers look like for us (based on my more manual and more M-Query calcs): see chart below: we typically see hard churn ~16% in the first month of any cohort, then an additional 6% in month 2, then about another 20% over the rest of the year for ~40% in the first year if we count 6 month inactives as having churned. 6 month inactives come back only about 5% of the time.
\
If I were super-skilled DAX, ideally I'd be able to show a stacked area chart for the median of any given set of cohorts where the Y axis is % churn, the X axis is time, and the stacked areas are "hard churn" and "soft churn" (judged by X day inactivity)
Solved! Go to Solution.
Hey Charlie, how about the approach below? I haven't tested it because your sample data has only one date for all entries but see if it helps, at least the logic.
What I've done is add an additional step to filter out the users earlier classified as soft churners that at some point had "optout" or "hardbounce" in Table1[Engagement] . That's what I've understood you need. Correct me if I'm mistaken. We go over the list and check for each ID whether its values in Table1[Engagement] (returned by the CALCTABLE) include "optout" or "hardbounce" (the intersect would be non-blank).
SoftChurnersCountExcludingHard = VAR _RefDate = DATE ( 2019, 01, 20 ) //This can be set via a slicer as well VAR _PeriodOfInactivity = 120 //Number of days. Change here or through a slicer VAR _CutOffDate = _RefDate - 120 VAR _SoftChurnersList = FILTER ( DISTINCT ( Table1[Profile ID] ), CALCULATE ( MAX ( Table1[Latest Interaction] ) ) < _CutOffDate ) VAR _SoftChurnersExcludingHard = FILTER ( _SoftChurnersList, COUNTROWS ( INTERSECT ( CALCULATETABLE ( DISTINCT ( Table1[Engagement] ) ), { "optout", "hardbounce" } ) ) > 0 )
RETURN COUNTROWS ( _SoftChurnersExcludingHard )
I've just taken the previous code as base and extended it so I am a bit worried about the performance of this approach. This might not be the most efficient but well, if your dataset is not very large it should be fine. Otherwise we could explore further.
I haven't checked the rest of the measure but _SoftChurnersCount doesn't return what you expect because _CountSubs_NonBounceNonOptout is a variable.
Variables in DAX are immutable. Once you assign a value to a variable at declaration, its contents won't change. Ever. It's like a constant, a behavior different from what you get in other programming languages.
So your
CALCULATE( _CountSubs_NonBounceNonOptout, ...NoMatterWhatComesHere...)
will always return _CountSubs_NonBounceNonOptout. It's like if you did
CALCULATE( 7, WhateverFiltersGoHere)
you'll always get 7.
_CountSubs_NonBounceNonOptout would need to be a measure for the pattern you are trying to apply to work, plus perhaps some minor modification to the code.
Cheers
So taking into account the comments on the previous posting, I believe your measure could be simplified a bit:
\SoftChurnersTRY3_V2 = CALCULATE( COUNTROWS( Table1 ) , FILTER (Table1, Table1[Latest Interaction] < EDATE(Table1[Profile Created Date],12) ), FILTER(Table1, Table1[Engagement] <> "optout" && Table1[Engagement] <> "hardbounce" ) )
where we make use of the beautiful row context available within FILTER( ) Note we're not using the two additional measures you created.
Or we could write the condition in the second FILTER( ) by using the IN operator (no conceptual change, just a different syntax to state the very same):
\SoftChurnersTRY3_V3 = CALCULATE( COUNTROWS( Table1 ) , FILTER (Table1, Table1[Latest Interaction] < EDATE(Table1[Profile Created Date],12) ), FILTER(Table1, NOT Table1[Engagement] IN { "optout" , "hardbounce"} ) )
There are still some additional (small) changes that could be done. And maybe some modification to increase performance, although that will depend to a substantial extent on the nature of your data.
By the way, I'm curious. Why do you start the names of your measures with a "\"? Is it to group them alphabetically so that they show all together?
Cheers
Hi @charleshale
Very interesting post It's always best to show your data in text/tabular form in addition to screen captures. It can then be readily copied and you make things easier for people trying to help.
I ain't positive I've understood correctly but let's take it one step at a time and have a go at point C, which you claim to be most interested in.
If you want to identify and count the users that have not been active in the last 120 days you could try the following. For every ID, we calculate the latest date it appears on the table and filter out those that last appeared within the last 120 days. The remaining ones are the soft churners.
SoftChurnersCount = VAR _RefDate = DATE ( 2019, 01, 20 ) //This can be set via a slicer as well VAR _PeriodOfInactivity = 120 //Number of days. Change here or through a slicer VAR _CutOffDate = _RefDate - 120 VAR _SoftChurnersList = FILTER ( DISTINCT ( Table1[Profile ID] ), CALCULATE ( MAX ( Table1[Latest Interaction] ) ) < _CutOffDate ) RETURN COUNTROWS ( _SoftChurnersList )
Does that make sense?
It is possible that some users that have not been active for the last 120 days did actually "hardbounce" or "optout" and thus are hard churners. If we want to consider that (I guess so) we'd need to do some changes. Probably using some of the stuff you already have.
@AlB- Thanks for the response. Your soft churner DAX is super elegant. That's the sort of DAX I'm trying to teach myself. How would you add an additional filter in the VAR = _SoftChurnersList to exclude in the count Engagement = optout || hardbounce because I believe those would otherwise be double counted. Thank you. Charlie
Profile Id | Profile Created Date | Engagement | Latest Interaction |
5c1c1022a41e4f0a36056188 | 12/20/2018 16:56 | engaged | 12/20/2018 17:39 |
5c1c0f3a52ba1e7aa107c89c | 12/20/2018 16:52 | passive | 12/20/2018 16:56 |
5c1c0ec3576f2c39ab2e610c | 12/20/2018 16:50 | passive | 12/20/2018 16:54 |
5c1c0e7c20122e74b53ca110 | 12/20/2018 16:49 | active | 12/20/2018 16:50 |
5c1c0cfe2a077c3d82788e50 | 12/20/2018 16:43 | passive | 12/20/2018 16:43 |
5c1c0c85a777396a5406e797 | 12/20/2018 16:41 | engaged | 12/20/2018 17:19 |
5c1c0c346c28c735f903386a | 12/20/2018 16:40 | passive | 12/20/2018 16:41 |
5c1c0b2a2a077c7e904107dd | 12/20/2018 16:35 | passive | 12/20/2018 16:37 |
5c1c09e47e553f20ad35a119 | 12/20/2018 16:30 | active | 12/20/2018 16:31 |
5c1c09827e553f22590fc49a | 12/20/2018 16:28 | passive | 12/20/2018 16:41 |
5c1c090e20122e6e8d2f5b35 | 12/20/2018 16:26 | passive | 12/20/2018 16:48 |
5c1c08aa67808976ec490fcd | 12/20/2018 16:24 | passive | 12/20/2018 16:38 |
5c1c081e20122e74b53c99f2 | 12/20/2018 16:22 | passive | 12/20/2018 16:38 |
5c1c07a620122e3a972edbdb | 12/20/2018 16:20 | passive | 12/20/2018 16:21 |
5c1c07376780896a331a025e | 12/20/2018 16:18 | engaged | 12/20/2018 16:24 |
5c1c06e16780896a331a0212 | 12/20/2018 16:17 | passive | 12/20/2018 16:32 |
5c1c0666283d8e6880185c4f | 12/20/2018 16:15 | passive | 12/20/2018 16:49 |
5c1c058a7e553f4b3b23495b | 12/20/2018 16:11 | engaged | 12/20/2018 16:13 |
5c1c05280564ce5ee53d022b | 12/20/2018 16:10 | optout | 12/20/2018 16:23 |
5c1c02d552ba1e050c3ac8cf | 12/20/2018 16:00 | passive | 12/20/2018 16:30 |
5c1c02ccbf0081454006b0a5 | 12/20/2018 15:59 | passive | 12/20/2018 16:50 |
5c1c0296576f2c4e7c69a812 | 12/20/2018 15:59 | passive | 12/20/2018 16:34 |
5c1c0233283d8e771d503da0 | 12/20/2018 15:57 | engaged | 12/20/2018 16:29 |
5c1c00890564ce2b4f11458d | 12/20/2018 15:50 | active | 12/20/2018 16:56 |
5c1c008640f866295f714824 | 12/20/2018 15:50 | passive | 12/20/2018 17:24 |
5c1c00237ace5a0c01162ced | 12/20/2018 15:48 | engaged | 12/20/2018 15:54 |
5c1bffdd67808925010f38a7 | 12/20/2018 15:47 | passive | 12/20/2018 17:24 |
5c1bff0e576f2c39ab2e5d8f | 12/20/2018 15:43 | engaged | 12/20/2018 17:00 |
5c1bfecbb90c2f5897579449 | 12/20/2018 15:42 | passive | 12/20/2018 15:48 |
5c1bfe7944d51f5d6943c081 | 12/20/2018 15:41 | engaged | 12/20/2018 16:31 |
5c1bfe757ace5a45c90c554e | 12/20/2018 15:41 | passive | 12/20/2018 16:24 |
5c1bfcbea77739428b6770cf | 12/20/2018 15:34 | active | 12/20/2018 16:33 |
5c1bfbf26c28c70b0c17f267 | 12/20/2018 15:30 | passive | 12/20/2018 15:41 |
5c1bfbcc52ba1e050c3ac2a9 | 12/20/2018 15:30 | passive | 12/20/2018 17:01 |
5c1bfb7d954fcf7e31699764 | 12/20/2018 15:28 | passive | 12/20/2018 15:43 |
Hey Charlie, how about the approach below? I haven't tested it because your sample data has only one date for all entries but see if it helps, at least the logic.
What I've done is add an additional step to filter out the users earlier classified as soft churners that at some point had "optout" or "hardbounce" in Table1[Engagement] . That's what I've understood you need. Correct me if I'm mistaken. We go over the list and check for each ID whether its values in Table1[Engagement] (returned by the CALCTABLE) include "optout" or "hardbounce" (the intersect would be non-blank).
SoftChurnersCountExcludingHard = VAR _RefDate = DATE ( 2019, 01, 20 ) //This can be set via a slicer as well VAR _PeriodOfInactivity = 120 //Number of days. Change here or through a slicer VAR _CutOffDate = _RefDate - 120 VAR _SoftChurnersList = FILTER ( DISTINCT ( Table1[Profile ID] ), CALCULATE ( MAX ( Table1[Latest Interaction] ) ) < _CutOffDate ) VAR _SoftChurnersExcludingHard = FILTER ( _SoftChurnersList, COUNTROWS ( INTERSECT ( CALCULATETABLE ( DISTINCT ( Table1[Engagement] ) ), { "optout", "hardbounce" } ) ) > 0 )
RETURN COUNTROWS ( _SoftChurnersExcludingHard )
I've just taken the previous code as base and extended it so I am a bit worried about the performance of this approach. This might not be the most efficient but well, if your dataset is not very large it should be fine. Otherwise we could explore further.
\SoftChurnersTRY2 = VAR _MeasureDate = DATEADD(Table1[Profile Created Date].[Date],1,YEAR) VAR _CountSubs_NonBounceNonOptout = CALCULATE( DISTINCTCOUNT(Table1[Profile Id]), Table1[Engagement] = "passive" || Table1[Engagement] = "active" || Table1[Engagement] = "engaged" || Table1[Engagement] = "disengaged" ) VAR _SoftChurnersCount = CALCULATE( _CountSubs_NonBounceNonOptout , Table1[Latest Interaction] < _MeasureDate ) RETURN _SoftChurnersCount
I haven't checked the rest of the measure but _SoftChurnersCount doesn't return what you expect because _CountSubs_NonBounceNonOptout is a variable.
Variables in DAX are immutable. Once you assign a value to a variable at declaration, its contents won't change. Ever. It's like a constant, a behavior different from what you get in other programming languages.
So your
CALCULATE( _CountSubs_NonBounceNonOptout, ...NoMatterWhatComesHere...)
will always return _CountSubs_NonBounceNonOptout. It's like if you did
CALCULATE( 7, WhateverFiltersGoHere)
you'll always get 7.
_CountSubs_NonBounceNonOptout would need to be a measure for the pattern you are trying to apply to work, plus perhaps some minor modification to the code.
Cheers
I see. Thank you. The following fixes that (note - I'm simplifying the problem down to 1 yr churn for the moment, which wouldnt work for any signs ups in the past year). Any idea why the following date calc fails to work correctly? I can provide more detail.
Seems like this should be easier to learn -- once upon a time, I ran modeling at an investment bank.....and these churn calcs seem way harder than arrays.
\SoftChurnersTRY3 = CALCULATE( COUNTROWS( Table1 ) , FILTER (Table1, Table1[Latest Interaction] < DATEADD(Table1[Profile Created Date].[Date],1,YEAR) ), FILTER(Table1, Table1[Engagement] <> "optout" || Table1[Engagement] <> "hardbounce" ) )
\exp_dateadd = CALCULATE( MAXX( Table1, DATEADD(Table1[Profile Created Date].[Date],1,YEAR)) )
Profile Id | Profile Created Date | Latest Interaction | Engagement |
588574aa99964aa82b8b465e | 1/22/2017 0:00 | 12/19/2018 0:00 | engaged |
58a8730e72fbf246388b52fe | 2/18/2017 0:00 | 6/16/2017 0:00 | optout |
59a6f24ab84a99f7048b4ca8 | 8/30/2017 0:00 | 11/4/2017 0:00 | optout |
59b743fbcbcf4894438b4643 | 9/11/2017 0:00 | 11/3/2018 0:00 | disengaged |
59f55c850aea1142238b57d1 | 10/29/2017 0:00 | 12/15/2018 0:00 | active |
5a380a5c99964ad07b8b5575 | 12/18/2017 0:00 | 12/21/2017 0:00 | optout |
5a935b0f566a94655d8b469d | 2/25/2018 0:00 | 3/26/2018 0:00 | optout |
5b12ff24fbd297519555bf91 | 6/2/2018 0:00 | 12/17/2018 0:00 | active |
5b289103954fcf13d35865e4 | 6/19/2018 0:00 | 12/19/2018 0:00 | engaged |
5b68d59acff06b4227388bd4 | 8/6/2018 0:00 | 12/12/2018 0:00 | passive |
5b7ecfbd2ddf9c19e646eb55 | 8/23/2018 0:00 | 12/20/2018 0:00 | active |
5b81573ca7773959a6383bec | 8/25/2018 0:00 | 12/18/2018 0:00 | optout |
5bb5508db90c2f4faf5e4539 | 10/3/2018 0:00 | 11/30/2018 0:00 | disengaged |
5bde70c41708b1620960f41f | 11/4/2018 0:00 | 12/20/2018 0:00 | engaged |
5be26b7352ba1e69ba563b8c | 11/6/2018 0:00 | 11/14/2018 0:00 | optout |
5be28d3f954fcf310f5fcdc6 | 11/7/2018 0:00 | 12/20/2018 0:00 | engaged |
5c0055270564ce4a2220bc93 | 11/29/2018 0:00 | 12/17/2018 0:00 | active
|
Profile Id | Profile Created Date | Latest Interaction | Engagement | \exp_dateadd |
5a380a5c99964ad07b8b5575 | 12/18/2017 0:00 | 12/21/2017 0:00 | optout | 12/18/2018 0:00 |
59f55c850aea1142238b57d1 | 10/29/2017 0:00 | 12/15/2018 0:00 | active | 10/29/2018 0:00 |
59b743fbcbcf4894438b4643 | 9/11/2017 0:00 | 11/3/2018 0:00 | disengaged | 9/11/2018 0:00 |
59a6f24ab84a99f7048b4ca8 | 8/30/2017 0:00 | 11/4/2017 0:00 | optout | 8/30/2018 0:00 |
58a8730e72fbf246388b52fe | 2/18/2017 0:00 | 6/16/2017 0:00 | optout | 2/18/2018 0:00 |
588574aa99964aa82b8b465e | 1/22/2017 0:00 | 12/19/2018 0:00 | engaged | 1/22/2018 0:00 |
So - I solved the problem. Public service announcement: DATEADD aint what it seems to be to my amateur eyes.
I used EDATE and it was processing intensive and cumbersome but it worked. Here were the steps:
1. Create a measure out of the Profiled Created Date column. Figuring this out hurt, but here's what worked:
\Profile Created Date = MAXX( KEEPFILTERS(VALUES('Table1'[Profile Id])), CALCULATE(MAX('Table1'[Profile Created Date]) ))
2. Using edate to make a measure of the churn measurement date (in this case 1 year out):
\_1YrChurnDate = EDATE([\Profile Created Date],12)
3. Writing the Soft Churn logic that started this whole adventure:
\SoftChurnersTRY3 = CALCULATE( COUNTROWS( Table1 ) , FILTER (Table1, Table1[Latest Interaction] < [\_1YrChurnDate] ), FILTER(Table1, Table1[Engagement] <> "optout" || Table1[Engagement] <> "hardbounce" ) )
While surely there's an easier way to calculate soft churn in a cohort, and one that has variables, AT LEAST THIS WORKS. It's a really important issue in email subscriber lists because hard churn (people that bounce and optout) is often ~20% compared to people that never do that but practically never show up again.
I'll keep refining the calcs and see where I can go....in the meantime, I'll leave this open because there have to be more elegant solutions than mine.
I find it really interesting there's so little on edate vs dateadd and how much better edate is for situations like churn -- because it probably means I'm still thinking of the problem through an excel lense, whereas there would be much easier with a native DAX perspective
Cool It's very interesting to see the number of things you've tried and the progress made.
There's quite a lot to be commented on (DATEADD, EDATE, the auto date/time capability...) but let's go step by step.
First of all, can you explain the new concept you are using to determine whether a user is a soft churner? Leaving the 'Engagement' value aside for now, you are searching for users that had their last interaction within 1 year of signing up. Why is this? Can you elaborate?
Now moving on to the code you show:
1. Why the KEEPFILTERS in:
\Profile Created Date = MAXX( KEEPFILTERS(VALUES('Table1'[Profile Id])), CALCULATE(MAX('Table1'[Profile Created Date]) ))
What are you trying to do? I don't think it has any effect in this case but I might be missing something. Have a look at this article.
2. Isn't [\Profile Created Date] just an intricate way of getting to the value of 'Table1'[Profile Id]? You would get the same result if you used directly good old 'Table1'[Profile Id] instead of the measure.
3. The condition on the second filter of
\SoftChurnersTRY3 = CALCULATE( COUNTROWS( Table1 ) , FILTER (Table1, Table1[Latest Interaction] < [\_1YrChurnDate] ), FILTER(Table1, Table1[Engagement] <> "optout" || Table1[Engagement] <> "hardbounce" ) )
should be
Table1[Engagement] <> "optout" && Table1[Engagement] <> "hardbounce"
Otherwise, with an OR as you had, every row in your sample data will evaluate to TRUE, since none has "hardbounce" in 'Engagement'
So taking into account the comments on the previous posting, I believe your measure could be simplified a bit:
\SoftChurnersTRY3_V2 = CALCULATE( COUNTROWS( Table1 ) , FILTER (Table1, Table1[Latest Interaction] < EDATE(Table1[Profile Created Date],12) ), FILTER(Table1, Table1[Engagement] <> "optout" && Table1[Engagement] <> "hardbounce" ) )
where we make use of the beautiful row context available within FILTER( ) Note we're not using the two additional measures you created.
Or we could write the condition in the second FILTER( ) by using the IN operator (no conceptual change, just a different syntax to state the very same):
\SoftChurnersTRY3_V3 = CALCULATE( COUNTROWS( Table1 ) , FILTER (Table1, Table1[Latest Interaction] < EDATE(Table1[Profile Created Date],12) ), FILTER(Table1, NOT Table1[Engagement] IN { "optout" , "hardbounce"} ) )
There are still some additional (small) changes that could be done. And maybe some modification to increase performance, although that will depend to a substantial extent on the nature of your data.
By the way, I'm curious. Why do you start the names of your measures with a "\"? Is it to group them alphabetically so that they show all together?
Cheers
@AIB - Good catch on the && vs II and simpler code. Thank you. The \ mark is just a mark I use for the ones I'm still trying to figure out.
Here's a small improvement I made:
\SoftChurnersTRY3_v2.5 = VAR _MosChurnPeriod = 12 RETURN CALCULATE( COUNTROWS( Table1 ) , FILTER (Table1, Table1[Latest Interaction] < EDATE(Table1[Profile Created Date],_MosChurnPeriod) ), FILTER(Table1, Table1[Engagement] <> "optout" && Table1[Engagement] <> "hardbounce" ) )
And here's one I tried that returned incorrect results in some periods, but correct in others -- probably that immutable issue again! (that derives from thinking of functions like indirect() in excel
\Bad_Softchurn = //delivering flawed results VAR _DateAsMeasure = MAXX( KEEPFILTERS(VALUES('Table1'[Profile Id])), CALCULATE(MAX('Table1'[Profile Created Date]) )) VAR _ChurnMosMeasured = 12 VAR _ChurnDate = EDATE(_DateAsMeasure,_ChurnMosMeasured) RETURN CALCULATE( COUNTROWS( Table1 ) , FILTER (Table1, Table1[Latest Interaction] < _ChurnDate ), FILTER(Table1, Table1[Engagement] <> "optout" && Table1[Engagement] <> "hardbounce" ) )
@AlBYou are a DAX genius. I've never event seen intersect in all the books I'm reading. Going to try it out. Will report back on this thread
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
110 | |
96 | |
77 | |
63 | |
55 |
User | Count |
---|---|
143 | |
109 | |
89 | |
84 | |
66 |