Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Greg_Deckler
Super User
Super User

Kaplan Meier Survival Curves

@Buzzshot43 posted a question on KM Survival Curves here:

https://community.powerbi.com/t5/Desktop/KM-Survival-Curve-in-Power-BI/m-p/326809#M145836

 

The post refers to the following Tableau article:

https://www.linkedin.com/pulse/survival-curves-tableau-hr-data-chris-short/

 

So, figured I'd give it a shot and have gotten part of the way there but was hoping the community could help me finish this off.

 

I am using the dataset from the Tableau article referenced above and am trying to recreate this using pure Power BI (no R). I imported the data as HRData. The data looks like:

 

ID,Name,Hire Date,Termination Date,Status,Department

1,Inez Laurie,4/23/2010,9/2/2015,Inactive,HR

2,Yasuko Scruton,1/20/2010,,Active,Sales

...

 

So, in the dataset, I created the following custom columns:

Years = IF([Status]="Active",DATEDIFF(HRData[Hire Date],TODAY(),YEAR),DATEDIFF(HRData[Hire Date],HRData[Termination Date],YEAR))

Event = IF(HRData[Status] = "Inactive",1,0)

Count = 1

 

I created a Calendar table using 

 

Calendar = CALENDARAUTO(12)

Which I tied to Termination Date (relationship)

 

I created a measure to give me a running total for Count:

 

Count running total in Date = 
CALCULATE(
	SUM('HRData'[Count]),
	FILTER(
		ALLSELECTED('Calendar'[Date]),
		ISONORAFTER('Calendar'[Date], MAX('Calendar'[Date]), DESC)
	)
)

I created a measure for d_i:

 

d_i = SUM(HRData[Event])

I created a measure for n_i

 

n_i = COUNTROWS(HRData) + CALCULATE(COUNTROWS(HRData),ALL(HRData)) - [Count running total in Date]

This is where things get a little messy. The formula for KM goes something like:

 

(Previous Value of KM) * (1-d_i/n_i)

 

So, what I did was create three measures like this:

 

KM1 = 1-[d_i]/[n_i]

KM-1 = CALCULATE([KM1],DATEADD('Calendar'[Date],-1,DAY))

KM = [KM-1]*[KM1]

The thought here was that I could have a continuous line chart by [Date] in Calendar and calculate KM1 for that date, KM1 for the previous date (KM-1) and then use those to calculate KM.

 

But, I'm not really getting what I expected and was wondering if someone could take a look and see if they can get this to work or have a better way.

 

 

 

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
1 ACCEPTED SOLUTION

Full solution posted here:

https://www.linkedin.com/pulse/kaplan-meier-survival-curves-power-bi-greg-deckler-microsoft-mvp-

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

View solution in original post

10 REPLIES 10
daciga
Frequent Visitor

Hello Greg, do you have the sample power bi output to share so that we can play with it? 

 

Thanks, 

 

Dennis

Greg_Deckler
Super User
Super User

OK, I think I solved this or at least gotten very much closer. I just had a slap the forehead moment when I realized that what the KM formula calls for is an accumulated product. Duh!

 

So, what I did created the following custom columns in my Calendar table.

 

c_d_i = [d_i] //d_i measure

c_n_i = [n_i] //n_i measure

c_d_i/n_i = [c_d_i]/[c_n_i]

c_1-d_i/n_i = 1-[c_d_i/n_i]

Then I created the following measure:

 

KM running total in Date = 
CALCULATE(
	PRODUCT('Calendar'[c_1-d_i/n_i]),
	FILTER(
		ALLSELECTED('Calendar'[Date]),
		ISONORAFTER('Calendar'[Date], MAX('Calendar'[Date]), DESC)
	)
)

This *appears* to give me what I was looking for if I use Date as my X-Axis, I get a survival curve over time. I guess what I need to solve to get the same thing as the article is to get this for "Years" with the company, but I'm pretty certain I know how to do that with a SUMMARIZE. I'll update this once I get there.

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

OK, closer still, though I am not getting exactly the same results as in the original article here:

https://www.linkedin.com/pulse/survival-curves-tableau-hr-data-chris-short/

 

My ego would have me believe that the original article is in error, but that is probably not true and I have something off in my calculations, if someone could help me find where things are going awry that would be great!! 🙂

 

OK, so I start with the HRData table from the article mentioned above. I'd post it here but that would be too much data. Here are the first 10 rows:

IDNameHire DateTermination DateStatusDepartment
0 1/1/20101/1/2010InactiveHR
0 1/1/20101/1/2010InactiveSales
1Inez Laurie  4/23/20109/2/2015InactiveHR
2Yasuko Scruton  1/20/2010 ActiveHR
3Kristopher Walkes  4/18/2010 ActiveHR
4Geraldine Mclennon  6/12/2010 ActiveHR
5Pilar Willard  1/14/2010 ActiveHR
6Candace Molton  4/19/20104/4/2012InactiveHR
7Evelyne Kneeland  4/22/2010 ActiveHR
8Elois Hires  5/26/20108/11/2013InactiveHR
9Keeley Dewolf  1/11/20105/27/2012InactiveHR
10Melisa Padua  5/16/2010 ActiveHR

 

So, I created a KM table like this:

KM = SUMMARIZE(HRData,HRData[Years],"Count",SUM(HRData[Count]),"d_i",SUM(HRData[Event]))

Then I added these columns:

 

Running = SUMX(FILTER(ALL(KM),[Years]<EARLIER([Years])),KM[Count])

n_i = [Count] + CALCULATE(SUM([Count]),ALL(KM)) - [Running]

d_i/n_i = KM[d_i]/KM[n_i]

1-d_i/n_i = 1-KM[d_i/n_i]

Then this measure:

 

MyKM = 
CALCULATE(
	PRODUCT('KM'[1-d_i/n_i]),
	FILTER(
		ALLSELECTED('KM'[Years]),
		ISONORAFTER('KM'[Years], MAX('KM'[Years]), DESC)
	)
)

You can then plot KM[Years] on the x-axis and MyKm on the y-axis in a line chart.

 

OK, so then I wanted to do this by Department, so I created a KMDept table like this:

 

KMDept = SUMMARIZE(HRData,HRData[DeptYears],"Years",MAX(HRData[Years]),"Department",MAX(HRData[Department]),"Count",SUM(HRData[Count]),"d_i",SUM(HRData[Event]))

And columns like these:

 

Running = SUMX(FILTER(ALL(KMDept),[DeptYears]<EARLIER([DeptYears]) && KMDept[Department]=EARLIER(KMDept[Department])),KMDept[Count])

n_i = [Count] + CALCULATE(SUM([Count]),ALL(KMDept),FILTER(KMDept,KMDept[Department]=EARLIER(KMDept[Department]))) - [Running]

d_i/n_i = KMDept[d_i]/KMDept[n_i]

1-d_i/n_i = 1-KMDept[d_i/n_i]

And then a measure like this:

 

MyKMDept = 
CALCULATE(
	PRODUCT('KMDept'[1-d_i/n_i]),
	FILTER(
		ALLSELECTED('KMDept'[Years]),
		ISONORAFTER('KMDept'[Years], MAX('KMDept'[Years]), DESC)
	)
)

Then, you can plot KMDept[Years] on the x-axis and MyKMDept on the y-axis. 

 

Again, the results are similar, but not 100% the same and not sure what can be done to improve the solution. I'll keep working on it but it's an interesting problem. And if anyone has a better way to solve it, I'm up for it!!

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

OK, as an update to this, first I forgot to mention that after you create the table, you need to relate the two tables together on Years or DeptYears. In addition, on a hunch I created a Days column in the HRData just like the Years column but with DAY specified in the DATEDIFF versus YEAR. I then created a KMDays table just like before and did all of the calculations using this table but everything is in Days and the two tables are related on the Days column. Here is the formula for the initial table.

 

KMDays = SUMMARIZE(HRData,HRData[Days],"Count",SUM(HRData[Count]),"d_i",SUM(HRData[Event]))

After you have that, just add the rest of the column and measure equivalents for that table. When I plot KMDays[Days] and MyKMDays measure, then I get what looks to be the exact results from the original article. At 372 days the measure is 80.26% which seems to be dead on to original article. I'm thinking its the ability of Tableau to generate a true continuous axis that is the main difference when doing this by Year.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Full solution posted here:

https://www.linkedin.com/pulse/kaplan-meier-survival-curves-power-bi-greg-deckler-microsoft-mvp-

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Wow. Thank you so much for your work and dedication @Greg_Deckler! It is greatly appreciated. Hopefully someday the capabilities of PowerBi will expand so measures and calculations such as these are easier!

Yeah, I think the main stumbling point here is that EARLIER doesn't support a measure as a parameter, only a column. If it did support a measure, then it would be the equivalent of PREVIOUS_VALUE in Tableau and the two solutions would be nearly identical. That was the main hurdle that needed to be overcome. 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

Hi Greg,

 

Firstly, just woww, this is really great work!,

 

An addition to this, Please can you help me to get the confidence Interval(Upper & Lower bounds).

 

Kind Regards,

Venu

 

@Anonymous  - I will look into what it would take to achieve that. Stay tuned.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Very interesting, thanks for posting this along with the solution

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors