cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
raykim Frequent Visitor
Frequent Visitor

Calculating max information gain over time (List of values as variable)

I have customer churn dataset with the following columns: cutomerID, timestamp, churn flag, and spending.

What I hope to acheive is to calculate max information gain over time.

 

In order to do so, the following steps are required

1. Define time bins (i.e. 20 days)

2. Define the list of cutoff points (i.e. {$20, $21, $22, $23 ...})

3. Find and count the number of churners and survivors in each time bin

4. Find and count the numer of chuners and survivors with spending more than cutoff points in each time bin 

5. Find and count the numer of chuners and survivors with spending less than cutoff points in each time bin 

6. Find the max information gain among the information values from each cutoff points

 

And the problem I am facing is that I cannot iterate through multiple cutoff points and test out which gives the highest information gain value. 

I can calculate the information gain for a sing cutoff value and the code is as below.

 

max_RIG = 

VAR cutoff_point = 30

// prior //
VAR total =
countrows(data)

VAR prior_churned =
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]), filter(data, data[churn] = TRUE()))

VAR prior_survived = 
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]), filter(data, data[churn] = FALSE()))


// feature is True //

VAR feature_true =
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]), 
         filter(data, data[spending] >= cutoff_point))

VAR feature_true_churned = 
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]), 
         filter(data, data[spending] >= cutoff_point), filter(data, data[churn] = TRUE()))

VAR feature_true_survived = 
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]), 
         filter(data, data[spending] >= cutoff_point), filter(data, data[churn] = FALSE()))


// feature is false//

VAR feature_false =
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]), 
         filter(data, data[spending] < cutoff_point))

VAR feature_false_churned = 
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]), 
         filter(data, data[spending] < cutoff_point), filter(data, data[churn] = TRUE()))

VAR feature_false_survived = 
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]), 
         filter(data, data[spending] < cutoff_point), filter(data, data[churn] = FALSE()))


// RIG //
VAR prior_gain =
prior_churned / total * log(prior_churned / total, 2) + prior_survived / total * log(prior_survived / total, 2)

VAR feature_true_gain =
feature_true/total * (feature_true_churned/feature_true * log(feature_true_churned /  feature_true, 2) + feature_true_survived / feature_true * log(feature_true_survived/feature_true,2))

VAR feature_false_gain =
feature_false/ total * (feature_false_churned/feature_false * log(feature_false_churned/feature_false, 2) + feature_false_survived / feature_false * log(feature_false_survived / feature_false, 2))

RETURN
(prior_gain - (feature_true_gain + feature_false_gain)) / prior_gain

I treid to replace VAR cutoff_point = 30 to

VAR cutoff_point = GENERATESERIES(MIN(data[spending], MAX(data[spending]),1)

and apply max function in the last line

RETURN
MAX(prior_gain - (feature_true_gain + feature_false_gain)) / prior_gain)

But it doesn't work because MAX() function only takes column as a parameter.

Any idea on how to implement this on Power BI?

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Community Support Team
Community Support Team

Re: Calculating max information gain over time (List of values as variable)

Hi @raykim ,

 

Have a try please.

 

Measure = 
VAR a =
    UNION (
        ROW ( "value", 0.23 ),
        ROW ( "value", 0.51 ),
        ROW ( "value", 0.18 ),
        ROW ( "value", 0.79 )
    )
RETURN
    MAXX ( a, [value] )

Capture.PNG

Community Support Team _ Frank
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

3 REPLIES 3
Community Support Team
Community Support Team

Re: Calculating max information gain over time (List of values as variable)

Hi @raykim ,

 

MAXX() should work in your scenario.

 

 

Community Support Team _ Frank
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
raykim Frequent Visitor
Frequent Visitor

Re: Calculating max information gain over time (List of values as variable)

MAXX would not work because it takes table reference as a parameter.

I would like to calculate the max value of array.

For instance, max of {0.23, 0.51, 0.18, 0.79}

Community Support Team
Community Support Team

Re: Calculating max information gain over time (List of values as variable)

Hi @raykim ,

 

Have a try please.

 

Measure = 
VAR a =
    UNION (
        ROW ( "value", 0.23 ),
        ROW ( "value", 0.51 ),
        ROW ( "value", 0.18 ),
        ROW ( "value", 0.79 )
    )
RETURN
    MAXX ( a, [value] )

Capture.PNG

Community Support Team _ Frank
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

Helpful resources

Announcements
Ask Amir Anything

Exclusive LIVE Community Event No. 2 – Ask Amir Anything

Next in our Triple A series: Ask Amir Netz questions about the latest updates, features and future.

October 2019 Community Highlights

October 2019 Community Highlights

October was a busy month in the community. Read the recap article to learn about some of the events and content.

New Solution Badges

New Solution Badges

Two waves of brand new solution badges are coming! Read the article for more information on our new community badges.

Analytics in Azure virtual event

Analytics in Azure virtual event

Experience a limitless analytics service built to ingest, prep, manage, and serve data for immediate use in Power BI.

Users Online
Currently online: 64 members 1,197 guests
Please welcome our newest community members: