Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Greg_Deckler

Kaplan Meier Survival Curves with Power BI - Part 2

Introduction

Inspired by a post on the Power BI Community site, I read through this fine article by Chris Short on LinkedIn Pulse, Survival Curves in Tableau with HR Data. Overall, this technique looked really interesting and I wondered if I could do something similar in Power BI. Turns out that you can and this article will show you how.

 

Part 1 demonstrated how to build out a Kaplan Meier Survival Curve for some fictional HR Data. Part 2 continues this thread and shows how to create curves for individual departments. This article assumes you have been through Part 1.

 

Split It

The Tableau example goes on to split the data. Unfortunately, we cannot simply drag "Department" from our HRData table and end up with the data split out by department. But, this can be accomplished with a few tweaks to our approach.

 

Step 5: Add a Calculated Column to HRData

OK, so first, go back to the HRData table and add the following column:

 

DeptDays = CONCATENATE(HRData[Department],FORMAT(HRData[Days],"0000"))

DeptDays: Basically, where we previously built a relationship based on the Days column, now we need to build the relationship based upon Department and Days.

 

Step 6: Do Some More Data Modeling

OK, so now we need 3 tables, 2 intermediate tables and 1 final table. Create your first intermediate tables like so:

 

 

KMDeptHRDaysAll = GENERATESERIES(0,MAX(HRData[Days]),1)

Rename the Value column to Days. Add the following columns:

 

 

 

Department = "HR"
DeptDays = CONCATENATE("HR",FORMAT([Days],"0000"))

 

 

Create a second intermediate table like so:

 

KMDeptSalesDaysAll = GENERATESERIES(0,MAX(HRData[Days]),1)

 

 

Rename the Value column to Days. Add the following columns to it:

 

 

Department = "Sales"
DeptDays = CONCATENATE("Sales",FORMAT([Days],"0000"))

 

 

Then, create a third table like:

 

 

KMDeptDaysAll = UNION(KMDeptHRDaysAll,KMDeptSalesDaysAll)

 

 

Create a relationship between this third table and HRData using the DeptDays columns. Then, in this final table, create your calculated columns again using some slight variations:

 

 

Count = var records = CALCULATE(SUM(HRData[Count]),RELATEDTABLE(HRData)) RETURN IF(ISBLANK(records),0,records)

e_i = var records = CALCULATE(SUM(HRData[Event]),RELATEDTABLE(HRData)) RETURN IF(ISBLANK(records),0,records)
Running = SUMX(FILTER(ALL(KMDeptDaysAll),[DeptDays]<=EARLIER([DeptDays]) && [Department]=EARLIER([Department])),[Count])
d_i = [Count] + CALCULATE(SUM([Count]),ALL(KMDeptDaysAll),FILTER(KMDeptDaysAll,[Department]=EARLIER([Department]))) - [Running]
1-e_i/d_i = 1-[e_i]/[d_i]

And your measure:

 

MyKMDeptDaysAll = CALCULATE(PRODUCT('KMDeptDaysAll'[1-e_i/d_i]), FILTER(ALLSELECTED('KMDeptDaysAll'[Days]),ISONORAFTER('KMDeptDaysAll'[Days], MAX('KMDeptDaysAll'[Days]), DESC)))

 

Step 7: Build the Curve

Now, you can create a Line chart visual and from the KMDeptDaysAll table, put Days in your x-axis, the MyKMDeptDaysAll measure in your y-axis and Department in your Legend and with a little formatting you should get this:

 

 

km1.png

 

Conclusion

I have taken pains to provide an entirely DAX-based solution and created a number of extra columns that could have been combined into a single calculation for sake of clarity. With that being said, this particular problem is definitely a little more work in Power BI than in Tableau, but that's not necessarily unexpected. While Power BI and Tableau serve similar functions, the two tools operate and function differently and have their own strengths and weaknesses. Some things are trivial in Power BI and more difficult in Tableau and vice versa. However, with a little effort and this blog article as a guide, you can easily start using Kaplan Meier Survival Curves for your own purposes in Power BI!

Comments

Add tracking counter:

 

 

This is amazing, @Greg_Deckler !!! Thank you for going through each step so thoroughly. Will definitely be putting this to good use. I went ahead and built out the additional "department" tables and measures with my data, but I also tried just pulling over the 'department' into the legend from my overarching KM Curve and it automatically split up the curve into the 'departments'. Maybe this an update since then? But either way, this is awesome - thanks so much!

@Greg_Deckler, I just stumbled upon this post as I am working on something similar.

 

Very cool!

 

Would also be interesting to add two measures to display next to the plot: a dynamic calculation of the harzard ratio between the two curves and the log rank test with chi square and the p value.

 

Maybe you have a tip on a problem I am facing plotting two survival curves on the same graph: Combining two x-axes from same table to plot two m... - Microsoft Power BI Community

 

I think I need to create a new stacked table to do it properly, but maybe there is a simple trick I am unfamiliar with.