I have a customer churn dataset with the following columns: customerID, timestamp, churn flag, and spending.
What I hope to achieve is to calculate the maximum information gain over time.
In order to do so, the following steps are required:
1. Define time bins (e.g., 20 days)
2. Define the list of cutoff points (e.g., {$20, $21, $22, $23, ...})
3. Count the churners and survivors in each time bin
4. Count the churners and survivors with spending greater than or equal to the cutoff point in each time bin
5. Count the churners and survivors with spending less than the cutoff point in each time bin
6. Find the maximum information gain among the values computed for each cutoff point
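Written out in standard entropy notation (my paraphrase of the steps above, where H is the binary entropy of the churn flag within a time bin and c is a spending cutoff), the quantity to maximize over c is the relative information gain:

```
H_prior    = -( P(churn) * log2 P(churn) + P(survive) * log2 P(survive) )
H_split(c) =  P(spend >= c) * H(churn | spend >= c)
            + P(spend <  c) * H(churn | spend <  c)
RIG(c)     = ( H_prior - H_split(c) ) / H_prior
```

Note that the DAX below omits the leading minus sign on both entropy terms; that sign cancels in the ratio, so the result is the same.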
The problem I am facing is that I cannot iterate through multiple cutoff points to test which one gives the highest information gain.
I can calculate the information gain for a single cutoff value; the code is below.
max_RIG =
VAR cutoff_point = 30
// prior //
VAR total = COUNTROWS ( data )
VAR prior_churned =
    CALCULATE (
        COUNTROWS ( data ),
        FILTER ( data, data[timestamp] >= RIG_calc[start_date] ),
        FILTER ( data, data[timestamp] < RIG_calc[end_date] ),
        FILTER ( data, data[churn] = TRUE () )
    )
VAR prior_survived =
    CALCULATE (
        COUNTROWS ( data ),
        FILTER ( data, data[timestamp] >= RIG_calc[start_date] ),
        FILTER ( data, data[timestamp] < RIG_calc[end_date] ),
        FILTER ( data, data[churn] = FALSE () )
    )
// feature is true //
VAR feature_true =
    CALCULATE (
        COUNTROWS ( data ),
        FILTER ( data, data[timestamp] >= RIG_calc[start_date] ),
        FILTER ( data, data[timestamp] < RIG_calc[end_date] ),
        FILTER ( data, data[spending] >= cutoff_point )
    )
VAR feature_true_churned =
    CALCULATE (
        COUNTROWS ( data ),
        FILTER ( data, data[timestamp] >= RIG_calc[start_date] ),
        FILTER ( data, data[timestamp] < RIG_calc[end_date] ),
        FILTER ( data, data[spending] >= cutoff_point ),
        FILTER ( data, data[churn] = TRUE () )
    )
VAR feature_true_survived =
    CALCULATE (
        COUNTROWS ( data ),
        FILTER ( data, data[timestamp] >= RIG_calc[start_date] ),
        FILTER ( data, data[timestamp] < RIG_calc[end_date] ),
        FILTER ( data, data[spending] >= cutoff_point ),
        FILTER ( data, data[churn] = FALSE () )
    )
// feature is false //
VAR feature_false =
    CALCULATE (
        COUNTROWS ( data ),
        FILTER ( data, data[timestamp] >= RIG_calc[start_date] ),
        FILTER ( data, data[timestamp] < RIG_calc[end_date] ),
        FILTER ( data, data[spending] < cutoff_point )
    )
VAR feature_false_churned =
    CALCULATE (
        COUNTROWS ( data ),
        FILTER ( data, data[timestamp] >= RIG_calc[start_date] ),
        FILTER ( data, data[timestamp] < RIG_calc[end_date] ),
        FILTER ( data, data[spending] < cutoff_point ),
        FILTER ( data, data[churn] = TRUE () )
    )
VAR feature_false_survived =
    CALCULATE (
        COUNTROWS ( data ),
        FILTER ( data, data[timestamp] >= RIG_calc[start_date] ),
        FILTER ( data, data[timestamp] < RIG_calc[end_date] ),
        FILTER ( data, data[spending] < cutoff_point ),
        FILTER ( data, data[churn] = FALSE () )
    )
// RIG //
VAR prior_gain =
    prior_churned / total * LOG ( prior_churned / total, 2 )
        + prior_survived / total * LOG ( prior_survived / total, 2 )
VAR feature_true_gain =
    feature_true / total
        * ( feature_true_churned / feature_true * LOG ( feature_true_churned / feature_true, 2 )
            + feature_true_survived / feature_true * LOG ( feature_true_survived / feature_true, 2 ) )
VAR feature_false_gain =
    feature_false / total
        * ( feature_false_churned / feature_false * LOG ( feature_false_churned / feature_false, 2 )
            + feature_false_survived / feature_false * LOG ( feature_false_survived / feature_false, 2 ) )
RETURN
    ( prior_gain - ( feature_true_gain + feature_false_gain ) ) / prior_gain
I tried to replace VAR cutoff_point = 30 with
VAR cutoff_point = GENERATESERIES ( MIN ( data[spending] ), MAX ( data[spending] ), 1 )
and to apply a MAX function in the last line:
RETURN MAX ( ( prior_gain - ( feature_true_gain + feature_false_gain ) ) / prior_gain )
But it doesn't work, because the MAX() function only accepts a column reference (or two scalar expressions) as a parameter, not an expression evaluated over a table of candidate values.
Any idea on how to implement this on Power BI?
MAXX would not work because it takes a table reference as a parameter.
I would like to calculate the max value of an array.
For instance, the max of {0.23, 0.51, 0.18, 0.79}.
Hi @raykim ,
Have a try please.
Measure =
VAR a =
    UNION (
        ROW ( "value", 0.23 ),
        ROW ( "value", 0.51 ),
        ROW ( "value", 0.18 ),
        ROW ( "value", 0.79 )
    )
RETURN
    MAXX ( a, [value] )
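The same MAXX pattern extends to the original problem: iterate a GENERATESERIES table of cutoff points and evaluate the per-cutoff information gain inside MAXX, where [Value] is the column GENERATESERIES produces. This is an untested sketch using the table and column names from the original post (data, RIG_calc); DIVIDE avoids division-by-zero errors, but a bin where a count is zero would still need an IF guard around LOG:

```dax
max_RIG_over_cutoffs =
// Restrict once to the current time bin (assumes RIG_calc[start_date]
// and RIG_calc[end_date] are valid in the evaluation context).
VAR bin_rows =
    FILTER (
        data,
        data[timestamp] >= RIG_calc[start_date]
            && data[timestamp] < RIG_calc[end_date]
    )
VAR total = COUNTROWS ( bin_rows )
VAR prior_churned = COUNTROWS ( FILTER ( bin_rows, data[churn] = TRUE () ) )
VAR prior_survived = total - prior_churned
VAR prior_gain =
    DIVIDE ( prior_churned, total ) * LOG ( DIVIDE ( prior_churned, total ), 2 )
        + DIVIDE ( prior_survived, total ) * LOG ( DIVIDE ( prior_survived, total ), 2 )
// Candidate cutoffs: $1 steps across the observed spending range.
VAR cutoffs =
    GENERATESERIES ( MINX ( data, data[spending] ), MAXX ( data, data[spending] ), 1 )
RETURN
    MAXX (
        cutoffs,
        // Per-cutoff relative information gain, as in the single-cutoff measure.
        VAR c = [Value]
        VAR t_rows = FILTER ( bin_rows, data[spending] >= c )
        VAR f_rows = FILTER ( bin_rows, data[spending] < c )
        VAR t = COUNTROWS ( t_rows )
        VAR f = COUNTROWS ( f_rows )
        VAR t_churned = COUNTROWS ( FILTER ( t_rows, data[churn] = TRUE () ) )
        VAR t_survived = t - t_churned
        VAR f_churned = COUNTROWS ( FILTER ( f_rows, data[churn] = TRUE () ) )
        VAR f_survived = f - f_churned
        VAR feature_true_gain =
            DIVIDE ( t, total )
                * ( DIVIDE ( t_churned, t ) * LOG ( DIVIDE ( t_churned, t ), 2 )
                    + DIVIDE ( t_survived, t ) * LOG ( DIVIDE ( t_survived, t ), 2 ) )
        VAR feature_false_gain =
            DIVIDE ( f, total )
                * ( DIVIDE ( f_churned, f ) * LOG ( DIVIDE ( f_churned, f ), 2 )
                    + DIVIDE ( f_survived, f ) * LOG ( DIVIDE ( f_survived, f ), 2 ) )
        RETURN
            DIVIDE ( prior_gain - ( feature_true_gain + feature_false_gain ), prior_gain )
    )
```

If you also need to know which cutoff achieved the maximum, the same iteration can be wrapped in TOPN over an ADDCOLUMNS of the cutoff table instead of MAXX.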