Kaplan Meier Survival Curves with Power BI - Part 2

by Super User on ‎01-05-2018 12:24 PM

Introduction

Inspired by a post on the Power BI Community site, I read through this fine article by Chris Short on LinkedIn Pulse, Survival Curves in Tableau with HR Data. Overall, this technique looked really interesting and I wondered if I could do something similar in Power BI. Turns out that you can and this article will show you how.

 

Part 1 demonstrated how to build out a Kaplan Meier Survival Curve for some fictional HR Data. Part 2 continues this thread and shows how to create curves for individual departments. This article assumes you have been through Part 1.

 

Split It

The Tableau example goes on to split the data. Unfortunately, we cannot simply drag "Department" from our HRData table and end up with the data split out by department. But, this can be accomplished with a few tweaks to our approach.

 

Step 5: Add a Calculated Column to HRData

OK, so first, go back to the HRData table and add the following column:

 

DeptDays = CONCATENATE(HRData[Department],FORMAT(HRData[Days],"0000"))

DeptDays: Basically, where we previously built a relationship based on the Days column, now we need to build the relationship based upon Department and Days.

 

Step 6: Do Some More Data Modeling

OK, so now we need 3 tables, 2 intermediate tables and 1 final table. Create your first intermediate tables like so:

 

 

KMDeptHRDaysAll = GENERATESERIES(0,MAX(HRData[Days]),1)

Rename the Value column to Days. Add the following columns:

 

 

 

Department = "HR"
DeptDays = CONCATENATE("HR",FORMAT([Days],"0000"))

 

 

Create a second intermediate table like so:

 

KMDeptSalesDaysAll = GENERATESERIES(0,MAX(HRData[Days]),1)

 

 

Rename the Value column to Days. Add the following columns to it:

 

 

Department = "Sales"
DeptDays = CONCATENATE("Sales",FORMAT([Days],"0000"))

 

 

Then, create a third table like:

 

 

KMDeptDaysAll = UNION(KMDeptHRDaysAll,KMDeptSalesDaysAll)

 

 

Create a relationship between this third table and HRData using the DeptDays columns. Then, in this final table, create your calculated columns again using some slight variations:

 

 

Count = var records = CALCULATE(SUM(HRData[Count]),RELATEDTABLE(HRData)) RETURN IF(ISBLANK(records),0,records)

e_i = var records = CALCULATE(SUM(HRData[Event]),RELATEDTABLE(HRData)) RETURN IF(ISBLANK(records),0,records)
Running = SUMX(FILTER(ALL(KMDeptDaysAll),[DeptDays]<=EARLIER([DeptDays]) && [Department]=EARLIER([Department])),[Count])
d_i = [Count] + CALCULATE(SUM([Count]),ALL(KMDeptDaysAll),FILTER(KMDeptDaysAll,[Department]=EARLIER([Department]))) - [Running]
1-e_i/d_i = 1-[e_i]/[d_i]

And your measure:

 

MyKMDeptDaysAll = CALCULATE(PRODUCT('KMDeptDaysAll'[1-e_i/d_i]), FILTER(ALLSELECTED('KMDeptDaysAll'[Days]),ISONORAFTER('KMDeptDaysAll'[Days], MAX('KMDeptDaysAll'[Days]), DESC)))

 

Step 7: Build the Curve

Now, you can create a Line chart visual and from the KMDeptDaysAll table, put Days in your x-axis, the MyKMDeptDaysAll measure in your y-axis and Department in your Legend and with a little formatting you should get this:

 

 

km1.png

 

Conclusion

I have taken pains to provide an entirely DAX-based solution and created a number of extra columns that could have been combined into a single calculation for sake of clarity. With that being said, this particular problem is definitely a little more work in Power BI than in Tableau, but that's not necessarily unexpected. While Power BI and Tableau serve similar functions, the two tools operate and function differently and have their own strengths and weaknesses. Some things are trivial in Power BI and more difficult in Tableau and vice versa. However, with a little effort and this blog article as a guide, you can easily start using Kaplan Meier Survival Curves for your own purposes in Power BI!

Comments
by Super User
on ‎01-08-2018 05:59 AM

Add tracking counter: