cancel
Showing results for
Did you mean:
Highlighted
raykim Frequent Visitor

## Calculating max information gain over time (List of values as variable)

I have customer churn dataset with the following columns: cutomerID, timestamp, churn flag, and spending.

What I hope to acheive is to calculate max information gain over time.

In order to do so, the following steps are required

1. Define time bins (i.e. 20 days)

2. Define the list of cutoff points (i.e. {\$20, \$21, \$22, \$23 ...})

3. Find and count the number of churners and survivors in each time bin

4. Find and count the numer of chuners and survivors with spending more than cutoff points in each time bin

5. Find and count the numer of chuners and survivors with spending less than cutoff points in each time bin

6. Find the max information gain among the information values from each cutoff points

And the problem I am facing is that I cannot iterate through multiple cutoff points and test out which gives the highest information gain value.

I can calculate the information gain for a sing cutoff value and the code is as below.

```max_RIG =

VAR cutoff_point = 30

// prior //
VAR total =
countrows(data)

VAR prior_churned =
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]), filter(data, data[churn] = TRUE()))

VAR prior_survived =
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]), filter(data, data[churn] = FALSE()))

// feature is True //

VAR feature_true =
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]),
filter(data, data[spending] >= cutoff_point))

VAR feature_true_churned =
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]),
filter(data, data[spending] >= cutoff_point), filter(data, data[churn] = TRUE()))

VAR feature_true_survived =
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]),
filter(data, data[spending] >= cutoff_point), filter(data, data[churn] = FALSE()))

// feature is false//

VAR feature_false =
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]),
filter(data, data[spending] < cutoff_point))

VAR feature_false_churned =
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]),
filter(data, data[spending] < cutoff_point), filter(data, data[churn] = TRUE()))

VAR feature_false_survived =
calculate(countrows(data), filter(data, data[timestamp] >= RIG_calc[start_date]), filter(data, data[timestamp] < RIG_calc[end_date]),
filter(data, data[spending] < cutoff_point), filter(data, data[churn] = FALSE()))

// RIG //
VAR prior_gain =
prior_churned / total * log(prior_churned / total, 2) + prior_survived / total * log(prior_survived / total, 2)

VAR feature_true_gain =
feature_true/total * (feature_true_churned/feature_true * log(feature_true_churned /  feature_true, 2) + feature_true_survived / feature_true * log(feature_true_survived/feature_true,2))

VAR feature_false_gain =
feature_false/ total * (feature_false_churned/feature_false * log(feature_false_churned/feature_false, 2) + feature_false_survived / feature_false * log(feature_false_survived / feature_false, 2))

RETURN
(prior_gain - (feature_true_gain + feature_false_gain)) / prior_gain```

I treid to replace VAR cutoff_point = 30 to

`VAR cutoff_point = GENERATESERIES(MIN(data[spending], MAX(data[spending]),1)`

and apply max function in the last line

```RETURN
MAX(prior_gain - (feature_true_gain + feature_false_gain)) / prior_gain)```

But it doesn't work because MAX() function only takes column as a parameter.

Any idea on how to implement this on Power BI?

1 ACCEPTED SOLUTION

Accepted Solutions Community Support Team

## Re: Calculating max information gain over time (List of values as variable)

Hi @raykim ,

```Measure =
VAR a =
UNION (
ROW ( "value", 0.23 ),
ROW ( "value", 0.51 ),
ROW ( "value", 0.18 ),
ROW ( "value", 0.79 )
)
RETURN
MAXX ( a, [value] )
``` Community Support Team _ Frank
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
3 REPLIES 3 Community Support Team

## Re: Calculating max information gain over time (List of values as variable)

Hi @raykim ,

MAXX() should work in your scenario.

Community Support Team _ Frank
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
raykim Frequent Visitor

## Re: Calculating max information gain over time (List of values as variable)

MAXX would not work because it takes table reference as a parameter.

I would like to calculate the max value of array.

For instance, max of {0.23, 0.51, 0.18, 0.79} Community Support Team

## Re: Calculating max information gain over time (List of values as variable)

Hi @raykim ,

```Measure =
VAR a =
UNION (
ROW ( "value", 0.23 ),
ROW ( "value", 0.51 ),
ROW ( "value", 0.18 ),
ROW ( "value", 0.79 )
)
RETURN
MAXX ( a, [value] )
``` Community Support Team _ Frank
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Announcements #### Exclusive LIVE Community Event No. 2 – Ask Amir Anything

Next in our Triple A series: Ask Amir Netz questions about the latest updates, features and future. #### October 2019 Community Highlights

October was a busy month in the community. Read the recap article to learn about some of the events and content.  #### Analytics in Azure virtual event

Experience a limitless analytics service built to ingest, prep, manage, and serve data for immediate use in Power BI. Top Kudoed Authors
Users Online
Currently online: 64 members 1,197 guests
Recent signins:
• mrstillchicken • PaulDBrown • • vdburg • Snowy34 • SebaG • Arif_zai • SabineBurkhardt • AlexDH • mp123 • BondethKhon • AFAmores • HlaSoe • rmallol 