Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
rupertbrown
Frequent Visitor

Merge multiple rows into one with multiple conditions

Hi there

 

I'm trying to "wrap up" multiple rows of data into a single row based on multiple conditions. I have limited experience using SQL (in the program TOAD) and have been able to achieve something similar using SORT, INDEX, PARTITION, RANK and report COUNT, SUM, MIN and MAX values for certain fields, but I can't work out a solution for Power BI Desktop. Any assistance would be greatly appreciated.

 

I have the following as an example. It shows for an individual, the date they were admitted to hospital, the date they were discharged (allowing for calculation of length of stay in hospital) and where they were discharged to. The problem is that when the DISCHARGE_TO field is populated with statistical separation, this isn't a real discharge (ie the person is still in hospital, there was just a change in their care).  So I am trying to merge some rows to have a single row for each hospital admission.

 

This is what the data looks like.

PersonAdmit DateDischarge DateDischarge ToLength of Stay
AB1231/01/20144/01/2014Home3
AB12321/04/201423/04/2014Home2
AB1239/07/201410/07/2014Home1
AB12315/06/201516/06/2015Statistical Separation1
AB12316/06/201518/06/2015Statistical Separation2
AB12318/06/201530/06/2015Home12
AB1238/10/201716/10/2017Statistical Separation8
AB12316/10/20171/12/2017Home46
AB12311/12/201714/12/2017Home3
AB12328/12/201730/12/2017Statistical Separation2
AB12330/12/201710/01/2018Statistical Separation11
AB12310/01/201813/01/2018Home3
AB12319/01/201821/01/2018Statistical Separation2
AB12321/01/201824/01/2018Home3
XY7894/07/20168/07/2016Statistical Separation4
XY7898/07/201612/07/2016Home4
XY7891/09/20164/09/2016Home3
XY7895/04/20176/05/2017Home31
XY7899/05/201710/05/2017Home1
XY7891/10/20174/10/2017Home3
XY78930/11/20171/12/2017Statistical Separation1
XY7891/12/20172/12/2017Statistical Separation1
XY7892/12/20177/12/2017Home5
XY7891/01/20182/01/2018Statistical Separation1
XY7892/01/20186/01/2018Home4


And this is what I want the result to look like:

PersonAdmit DateDischarge DateLength of StayCount_Statistical Separations
AB1231/01/20144/01/201430
AB12321/04/201423/04/201420
AB1239/07/201410/07/201410
AB12315/06/201530/06/2015152
AB1238/10/20171/12/2017541
AB12311/12/201714/12/201730
AB12328/12/201713/01/2018162
AB12319/01/201824/01/201851
XY7894/07/201612/07/201681
XY7891/09/20164/09/201630
XY7895/04/20176/05/2017310
XY7899/05/201710/05/201710
XY7891/10/20174/10/201730
XY78930/11/20177/12/201772
XY7891/01/20186/01/201851

 

 

Rows should only be merged if the preceeding row has the discharge to field populated with statistcal separation, and the preceeding row discharge date is equal to the current row admit date.

 

The data I have contains more than 1 million rows so trying to achieve this in Excel isn't possible.

 

Any advice or suggestions would be fantastic. Thanks!!

2 ACCEPTED SOLUTIONS
MarkS
Resolver IV
Resolver IV

Hi @rupertbrown

 

I think that the basic idea of  what you need to do is add an index, then add a calculated column that enters null where the value is "Statistical Separation" or the index value, then Fill Up on that column.

 

Then group on that column and the Person, with the aggregates of Min of Admit Date, Max of Discharge Date, and Sum of Length of stay.

 

Here is the M code: 

 

let
    Source = Excel.Workbook(File.Contents("C:\testing1.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Person", type text}, {"Admit Date", type any}, {"Discharge Date", type any}, {"Discharge To", type text}, {"Length of Stay", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    #"Added Conditional Column" = Table.AddColumn(#"Added Index", "HomeIndex", each if [Discharge To] = "Statistical Separation" then null else [Index]),
    #"Grouped Rows" = Table.Group(#"Added Conditional Column", {"Person"}, {{"SecondTable", each _, type table}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.FillUp ([SecondTable],{"HomeIndex"})),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Person", "Admit Date", "Discharge Date", "Discharge To", "Length of Stay", "Index", "HomeIndex"}, {"Person.1", "Admit Date", "Discharge Date", "Discharge To", "Length of Stay", "Index", "HomeIndex"}),
    #"Added Conditional Column1" = Table.AddColumn(#"Expanded Custom1", "Custom", each if [HomeIndex] = null then [Index] else [HomeIndex]),
    #"Removed Columns" = Table.RemoveColumns(#"Added Conditional Column1",{"SecondTable", "Person.1", "Index", "HomeIndex"}),
    #"Grouped Rows1" = Table.Group(#"Removed Columns", {"Custom", "Person"}, {{"Admit date", each List.Min([Admit Date]), type anynonnull}, {"Discharge Date", each List.Max([Discharge Date]), type anynonnull}, {"LOS", each List.Sum([Length of Stay]), type number}, {"S", each _, type table}}),
    #"Added Custom2" = Table.AddColumn(#"Grouped Rows1", "Count of Statistical Separation", each Table.RowCount( Table.SelectRows([S], each [Discharge To]="Statistical Separation"))),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom2",{"S", "Custom"})
in
    #"Removed Columns2"

 

View solution in original post

Phil_Seamark
Employee
Employee

Hi @rupertbrown

 

This DAX calculated table might be close

 

Table = 

Var y = ADDCOLUMNS(
            'Table1',
            "Batch",1+
            CALCULATE(
                COUNTROWS('Table1'),
                FILTER(
                    'Table1',
                    'Table1'[Admit Date]<EARLIER('Table1'[Admit Date])
                    && 'Table1'[Person]=EARLIER('Table1'[Person])
                    && 'Table1'[Discharge To] = "Home"
                    )))
RETURN 
    SUMMARIZE(
            y,
            [Person],
            [Batch],
            "Admit Date" , MIN('Table1'[Admit Date]),
            "Discharge Date" , MAX('Table1'[Discharge Date]),
            "Length of Stay" ,INT(MAX('Table1'[Discharge Date]) - MIN('Table1'[Admit Date])),
            "Count Statistical Separations" , COUNT('Table1'[Admit Date])-1
            )

 


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

View solution in original post

4 REPLIES 4
Phil_Seamark
Employee
Employee

Hi @rupertbrown

 

This DAX calculated table might be close

 

Table = 

Var y = ADDCOLUMNS(
            'Table1',
            "Batch",1+
            CALCULATE(
                COUNTROWS('Table1'),
                FILTER(
                    'Table1',
                    'Table1'[Admit Date]<EARLIER('Table1'[Admit Date])
                    && 'Table1'[Person]=EARLIER('Table1'[Person])
                    && 'Table1'[Discharge To] = "Home"
                    )))
RETURN 
    SUMMARIZE(
            y,
            [Person],
            [Batch],
            "Admit Date" , MIN('Table1'[Admit Date]),
            "Discharge Date" , MAX('Table1'[Discharge Date]),
            "Length of Stay" ,INT(MAX('Table1'[Discharge Date]) - MIN('Table1'[Admit Date])),
            "Count Statistical Separations" , COUNT('Table1'[Admit Date])-1
            )

 


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Hi @Phil_Seamark

 

This is amazing. I find DAX easier to understand so this is a brilliant solution.

 

Thanks so much for your help.

 

Cheers

MarkS
Resolver IV
Resolver IV

Hi @rupertbrown

 

I think that the basic idea of  what you need to do is add an index, then add a calculated column that enters null where the value is "Statistical Separation" or the index value, then Fill Up on that column.

 

Then group on that column and the Person, with the aggregates of Min of Admit Date, Max of Discharge Date, and Sum of Length of stay.

 

Here is the M code: 

 

let
    Source = Excel.Workbook(File.Contents("C:\testing1.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Person", type text}, {"Admit Date", type any}, {"Discharge Date", type any}, {"Discharge To", type text}, {"Length of Stay", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    #"Added Conditional Column" = Table.AddColumn(#"Added Index", "HomeIndex", each if [Discharge To] = "Statistical Separation" then null else [Index]),
    #"Grouped Rows" = Table.Group(#"Added Conditional Column", {"Person"}, {{"SecondTable", each _, type table}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.FillUp ([SecondTable],{"HomeIndex"})),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Person", "Admit Date", "Discharge Date", "Discharge To", "Length of Stay", "Index", "HomeIndex"}, {"Person.1", "Admit Date", "Discharge Date", "Discharge To", "Length of Stay", "Index", "HomeIndex"}),
    #"Added Conditional Column1" = Table.AddColumn(#"Expanded Custom1", "Custom", each if [HomeIndex] = null then [Index] else [HomeIndex]),
    #"Removed Columns" = Table.RemoveColumns(#"Added Conditional Column1",{"SecondTable", "Person.1", "Index", "HomeIndex"}),
    #"Grouped Rows1" = Table.Group(#"Removed Columns", {"Custom", "Person"}, {{"Admit date", each List.Min([Admit Date]), type anynonnull}, {"Discharge Date", each List.Max([Discharge Date]), type anynonnull}, {"LOS", each List.Sum([Length of Stay]), type number}, {"S", each _, type table}}),
    #"Added Custom2" = Table.AddColumn(#"Grouped Rows1", "Count of Statistical Separation", each Table.RowCount( Table.SelectRows([S], each [Discharge To]="Statistical Separation"))),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom2",{"S", "Custom"})
in
    #"Removed Columns2"

 

Hi @MarkS

 

This worked perfectly! Thank you so very much for the super quick reply and solution. Really appreciate the help.

 

Cheers

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.