Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi there
I'm trying to "wrap up" multiple rows of data into a single row based on multiple conditions. I have limited experience using SQL (in the program TOAD) and have been able to achieve something similar using SORT, INDEX, PARTITION, RANK and report COUNT, SUM, MIN and MAX values for certain fields, but I can't work out a solution for Power BI Desktop. Any assistance would be greatly appreciated.
I have the following as an example. It shows for an individual, the date they were admitted to hospital, the date they were discharged (allowing for calculation of length of stay in hospital) and where they were discharged to. The problem is that when the DISCHARGE_TO field is populated with statistical separation, this isn't a real discharge (ie the person is still in hospital, there was just a change in their care). So I am trying to merge some rows to have a single row for each hospital admission.
This is what the data looks like.
Person | Admit Date | Discharge Date | Discharge To | Length of Stay |
AB123 | 1/01/2014 | 4/01/2014 | Home | 3 |
AB123 | 21/04/2014 | 23/04/2014 | Home | 2 |
AB123 | 9/07/2014 | 10/07/2014 | Home | 1 |
AB123 | 15/06/2015 | 16/06/2015 | Statistical Separation | 1 |
AB123 | 16/06/2015 | 18/06/2015 | Statistical Separation | 2 |
AB123 | 18/06/2015 | 30/06/2015 | Home | 12 |
AB123 | 8/10/2017 | 16/10/2017 | Statistical Separation | 8 |
AB123 | 16/10/2017 | 1/12/2017 | Home | 46 |
AB123 | 11/12/2017 | 14/12/2017 | Home | 3 |
AB123 | 28/12/2017 | 30/12/2017 | Statistical Separation | 2 |
AB123 | 30/12/2017 | 10/01/2018 | Statistical Separation | 11 |
AB123 | 10/01/2018 | 13/01/2018 | Home | 3 |
AB123 | 19/01/2018 | 21/01/2018 | Statistical Separation | 2 |
AB123 | 21/01/2018 | 24/01/2018 | Home | 3 |
XY789 | 4/07/2016 | 8/07/2016 | Statistical Separation | 4 |
XY789 | 8/07/2016 | 12/07/2016 | Home | 4 |
XY789 | 1/09/2016 | 4/09/2016 | Home | 3 |
XY789 | 5/04/2017 | 6/05/2017 | Home | 31 |
XY789 | 9/05/2017 | 10/05/2017 | Home | 1 |
XY789 | 1/10/2017 | 4/10/2017 | Home | 3 |
XY789 | 30/11/2017 | 1/12/2017 | Statistical Separation | 1 |
XY789 | 1/12/2017 | 2/12/2017 | Statistical Separation | 1 |
XY789 | 2/12/2017 | 7/12/2017 | Home | 5 |
XY789 | 1/01/2018 | 2/01/2018 | Statistical Separation | 1 |
XY789 | 2/01/2018 | 6/01/2018 | Home | 4 |
And this is what I want the result to look like:
Person | Admit Date | Discharge Date | Length of Stay | Count_Statistical Separations |
AB123 | 1/01/2014 | 4/01/2014 | 3 | 0 |
AB123 | 21/04/2014 | 23/04/2014 | 2 | 0 |
AB123 | 9/07/2014 | 10/07/2014 | 1 | 0 |
AB123 | 15/06/2015 | 30/06/2015 | 15 | 2 |
AB123 | 8/10/2017 | 1/12/2017 | 54 | 1 |
AB123 | 11/12/2017 | 14/12/2017 | 3 | 0 |
AB123 | 28/12/2017 | 13/01/2018 | 16 | 2 |
AB123 | 19/01/2018 | 24/01/2018 | 5 | 1 |
XY789 | 4/07/2016 | 12/07/2016 | 8 | 1 |
XY789 | 1/09/2016 | 4/09/2016 | 3 | 0 |
XY789 | 5/04/2017 | 6/05/2017 | 31 | 0 |
XY789 | 9/05/2017 | 10/05/2017 | 1 | 0 |
XY789 | 1/10/2017 | 4/10/2017 | 3 | 0 |
XY789 | 30/11/2017 | 7/12/2017 | 7 | 2 |
XY789 | 1/01/2018 | 6/01/2018 | 5 | 1 |
Rows should only be merged if the preceeding row has the discharge to field populated with statistcal separation, and the preceeding row discharge date is equal to the current row admit date.
The data I have contains more than 1 million rows so trying to achieve this in Excel isn't possible.
Any advice or suggestions would be fantastic. Thanks!!
Solved! Go to Solution.
Hi @rupertbrown
I think that the basic idea of what you need to do is add an index, then add a calculated column that enters null where the value is "Statistical Separation" or the index value, then Fill Up on that column.
Then group on that column and the Person, with the aggregates of Min of Admit Date, Max of Discharge Date, and Sum of Length of stay.
Here is the M code:
let Source = Excel.Workbook(File.Contents("C:\testing1.xlsx"), null, true), Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data], #"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]), #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Person", type text}, {"Admit Date", type any}, {"Discharge Date", type any}, {"Discharge To", type text}, {"Length of Stay", Int64.Type}}), #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1), #"Added Conditional Column" = Table.AddColumn(#"Added Index", "HomeIndex", each if [Discharge To] = "Statistical Separation" then null else [Index]), #"Grouped Rows" = Table.Group(#"Added Conditional Column", {"Person"}, {{"SecondTable", each _, type table}}), #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.FillUp ([SecondTable],{"HomeIndex"})), #"Expanded Custom1" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Person", "Admit Date", "Discharge Date", "Discharge To", "Length of Stay", "Index", "HomeIndex"}, {"Person.1", "Admit Date", "Discharge Date", "Discharge To", "Length of Stay", "Index", "HomeIndex"}), #"Added Conditional Column1" = Table.AddColumn(#"Expanded Custom1", "Custom", each if [HomeIndex] = null then [Index] else [HomeIndex]), #"Removed Columns" = Table.RemoveColumns(#"Added Conditional Column1",{"SecondTable", "Person.1", "Index", "HomeIndex"}), #"Grouped Rows1" = Table.Group(#"Removed Columns", {"Custom", "Person"}, {{"Admit date", each List.Min([Admit Date]), type anynonnull}, {"Discharge Date", each List.Max([Discharge Date]), type anynonnull}, {"LOS", each List.Sum([Length of Stay]), type number}, {"S", each _, type table}}), #"Added Custom2" = Table.AddColumn(#"Grouped Rows1", "Count of Statistical Separation", each Table.RowCount( Table.SelectRows([S], each [Discharge To]="Statistical Separation"))), #"Removed Columns2" = Table.RemoveColumns(#"Added Custom2",{"S", "Custom"}) in #"Removed Columns2"
Hi @rupertbrown
This DAX calculated table might be close
Table = Var y = ADDCOLUMNS( 'Table1', "Batch",1+ CALCULATE( COUNTROWS('Table1'), FILTER( 'Table1', 'Table1'[Admit Date]<EARLIER('Table1'[Admit Date]) && 'Table1'[Person]=EARLIER('Table1'[Person]) && 'Table1'[Discharge To] = "Home" ))) RETURN SUMMARIZE( y, [Person], [Batch], "Admit Date" , MIN('Table1'[Admit Date]), "Discharge Date" , MAX('Table1'[Discharge Date]), "Length of Stay" ,INT(MAX('Table1'[Discharge Date]) - MIN('Table1'[Admit Date])), "Count Statistical Separations" , COUNT('Table1'[Admit Date])-1 )
Hi @rupertbrown
This DAX calculated table might be close
Table = Var y = ADDCOLUMNS( 'Table1', "Batch",1+ CALCULATE( COUNTROWS('Table1'), FILTER( 'Table1', 'Table1'[Admit Date]<EARLIER('Table1'[Admit Date]) && 'Table1'[Person]=EARLIER('Table1'[Person]) && 'Table1'[Discharge To] = "Home" ))) RETURN SUMMARIZE( y, [Person], [Batch], "Admit Date" , MIN('Table1'[Admit Date]), "Discharge Date" , MAX('Table1'[Discharge Date]), "Length of Stay" ,INT(MAX('Table1'[Discharge Date]) - MIN('Table1'[Admit Date])), "Count Statistical Separations" , COUNT('Table1'[Admit Date])-1 )
This is amazing. I find DAX easier to understand so this is a brilliant solution.
Thanks so much for your help.
Cheers
Hi @rupertbrown
I think that the basic idea of what you need to do is add an index, then add a calculated column that enters null where the value is "Statistical Separation" or the index value, then Fill Up on that column.
Then group on that column and the Person, with the aggregates of Min of Admit Date, Max of Discharge Date, and Sum of Length of stay.
Here is the M code:
let Source = Excel.Workbook(File.Contents("C:\testing1.xlsx"), null, true), Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data], #"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]), #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Person", type text}, {"Admit Date", type any}, {"Discharge Date", type any}, {"Discharge To", type text}, {"Length of Stay", Int64.Type}}), #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1), #"Added Conditional Column" = Table.AddColumn(#"Added Index", "HomeIndex", each if [Discharge To] = "Statistical Separation" then null else [Index]), #"Grouped Rows" = Table.Group(#"Added Conditional Column", {"Person"}, {{"SecondTable", each _, type table}}), #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.FillUp ([SecondTable],{"HomeIndex"})), #"Expanded Custom1" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Person", "Admit Date", "Discharge Date", "Discharge To", "Length of Stay", "Index", "HomeIndex"}, {"Person.1", "Admit Date", "Discharge Date", "Discharge To", "Length of Stay", "Index", "HomeIndex"}), #"Added Conditional Column1" = Table.AddColumn(#"Expanded Custom1", "Custom", each if [HomeIndex] = null then [Index] else [HomeIndex]), #"Removed Columns" = Table.RemoveColumns(#"Added Conditional Column1",{"SecondTable", "Person.1", "Index", "HomeIndex"}), #"Grouped Rows1" = Table.Group(#"Removed Columns", {"Custom", "Person"}, {{"Admit date", each List.Min([Admit Date]), type anynonnull}, {"Discharge Date", each List.Max([Discharge Date]), type anynonnull}, {"LOS", each List.Sum([Length of Stay]), type number}, {"S", each _, type table}}), #"Added Custom2" = Table.AddColumn(#"Grouped Rows1", "Count of Statistical Separation", each Table.RowCount( Table.SelectRows([S], each [Discharge To]="Statistical Separation"))), #"Removed Columns2" = Table.RemoveColumns(#"Added Custom2",{"S", "Custom"}) in #"Removed Columns2"
Hi @MarkS
This worked perfectly! Thank you so very much for the super quick reply and solution. Really appreciate the help.
Cheers
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
109 | |
95 | |
77 | |
66 | |
53 |
User | Count |
---|---|
144 | |
105 | |
102 | |
89 | |
63 |