cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
JohnGilmore
New Member

Combining Two Rows Without a Unique Identifier

Hi All,

 

This post is similiar to one that was very recently posted by user eirikraha, but is slightly harder I think. I want to combine rows of data based on certain conditions, and there's no particular unique identifier to distiguish the rows I want to combine.

 

I have an existing table of machine data that tracks machine run times of particular parts through a flow of machines that looks something like this:

 

Row  JobNumber  PartNumber   SerialNumber  MachineID  DateIn  TimeIn DateOut  TimeOut 
1111A01/21/2208:24AM01/21/2201:36PM
2112D01/21/2209:53AM01/21/2210:31AM
3112C01/21/2211:24AM01/21/224:55PM
4121A01/22/2203:02PM01/22/2203:56PM
5122B01/22/2208:01AM01/22/2202:47PM
6211A01/23/2209:51AM01/23/2210:31AM
7211A01/23/2210:32AM01/23/2201:28PM
8211A01/24/2201:31PM01/24/2205:33PM
9211G01/25/2209:39AM01/25/2209:45AM
10212D01/25/2211:49AM01/26/2209:03AM
11212C01/25/2205:43AM01/25/2212:43PM
12221B01/25/2209:00AM01/25/2201:15PM
13221B01/25/2201:19PM01/25/2207:12PM
14221B01/25/2207:13PM01/25/2210:05PM

 

Note that the rows are sorted in ascending order by JobNumber/PartNumber/SerialNumber/DateIn/TimeIn so anything in two succeeding rows that's the same JobNumber/PartNumber/SerialNumber should be a track of which machines the part has gone through.

 

Notice through that rows 6, 7 and rows 12, 13, 14 are the same parts in the same succeeding machines the the date out from the initial rows is within 5 minutes of the DateIn from the second row. Almost as if the data had a slight pause and continuation. I want to combine all of these little combinations of rows where the "data pause" is less than 5 minutes.

 

Notice also that row 8 is the same JobNumber, PartNumber, SerialNumber, and Machine as row 7 but it's on a different date so that one shouldn't be combined. The resulting table should be:

 

 

Row  JobNumber  PartNumber   SerialNumber  MachineID  DateIn  TimeIn  DateOut  TimeOut 
1111A01/21/2208:24AM01/21/2201:36PM
2112D01/21/2209:53AM01/21/2210:31AM
3112C01/21/2211:24AM01/21/224:55PM
4121A01/22/2203:02PM01/22/2203:56PM
5122B01/22/2208:01AM01/22/2202:47PM
6211A01/23/2209:51AM01/23/2201:28AM
8211A01/24/2201:31PM01/24/2205:33PM
9211G01/25/2209:39AM01/25/2209:45AM
10212D01/25/2211:49AM01/26/2209:03AM
11212C01/25/2205:43AM01/25/2212:43PM
12221B01/25/2209:00AM01/25/2210:05PM

 

1 ACCEPTED SOLUTION
AlexisOlson
Super User
Super User

The only way I can think of to do this efficiently involves using List.Generate to iterate through the table comparing rows to previous rows to generate an index of which rows should be grouped together.

 

Here's a sample query you can paste into the Advanced Editor:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("hZLLCoNADEX/xXWhefrITlvoSnAv/v9vdGzn1dhWcGAIOffeTFzXBpvLxxnDAbxS+Gi/9kYyzq6Ixu0yN9tlbahi9/vdtQ6m7HkEYxzfPDv+5lrxi7+YarSXCq3jU7RnA1rmQ1FTfK34/UyutTfA0fNk0kW+rbxrfy7jF56P43f/+b2VPB+en/ro3//gpWwK8/ypqMYc+cHxj9iqOT8P2b8URWN+hEqg3r/m/UkRaLMAcBJAJ3BzZmrCPgGGDaQJMK0uiUyHtACHEdAw/ULIJwKhd1i8QGdISUBOBEIve4GwWXgl2J4=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Row = _t, JobNumber = _t, PartNumber = _t, SerialNumber = _t, MachineID = _t, DateIn = _t, TimeIn = _t, DateOut = _t, TimeOut = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Row", Int64.Type}, {"JobNumber", Int64.Type}, {"PartNumber", Int64.Type}, {"SerialNumber", Int64.Type}, {"MachineID", type text}, {"DateIn", type date}, {"TimeIn", type time}, {"DateOut", type date}, {"TimeOut", type time}}, "en-US"),
    Buffer = List.Buffer(Table.ToRecords(#"Changed Type")),
    #"Added Custom" = Table.FromRecords(List.Skip(
        List.Generate(
            () => [prev_R = Buffer{0}, i = 0, n = 0],
            each [i] <= List.Count(Buffer),
            each
                [
                prev_R = Buffer{[i]},
                i = [i] + 1,
                n = if  Record.SelectFields(
                            prev_R,
                            {"JobNumber", "PartNumber", "SerialNumber", "MachineID", "DateIn", "DateOut"}
                        ) = 
                        Record.SelectFields(
                            [prev_R],
                            {"JobNumber", "PartNumber", "SerialNumber", "MachineID", "DateIn", "DateOut"}
                        )
                        and prev_R[TimeIn] - [prev_R][TimeOut] < #duration(0, 0, 5, 0)
                    then [n]
                    else [n] + 1
                ],
            each [prev_R] & [n = [n]]
        )
    )),
    #"Grouped Rows" = Table.Group(#"Added Custom", {"n", "JobNumber", "PartNumber", "SerialNumber", "MachineID", "DateIn", "DateOut"}, {{"FirstRow", each List.Min([Row]), type number}, {"TimeIn", each List.Min([TimeIn]), type time}, {"TimeOut", each List.Max([TimeOut]), type time}}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Grouped Rows",{{"n", Int64.Type}, {"JobNumber", Int64.Type}, {"PartNumber", Int64.Type}, {"SerialNumber", Int64.Type}, {"MachineID", type text}, {"DateIn", type date}, {"DateOut", type date}, {"FirstRow", Int64.Type}, {"TimeIn", type time}, {"TimeOut", type time}})
in
    #"Changed Type1"

This is fairly complex but you should be able to see the core logic in the middle is comparing the current and previous row and using the 5-minute condition specified.

 

View solution in original post

5 REPLIES 5
AlexisOlson
Super User
Super User

The only way I can think of to do this efficiently involves using List.Generate to iterate through the table comparing rows to previous rows to generate an index of which rows should be grouped together.

 

Here's a sample query you can paste into the Advanced Editor:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("hZLLCoNADEX/xXWhefrITlvoSnAv/v9vdGzn1dhWcGAIOffeTFzXBpvLxxnDAbxS+Gi/9kYyzq6Ixu0yN9tlbahi9/vdtQ6m7HkEYxzfPDv+5lrxi7+YarSXCq3jU7RnA1rmQ1FTfK34/UyutTfA0fNk0kW+rbxrfy7jF56P43f/+b2VPB+en/ro3//gpWwK8/ypqMYc+cHxj9iqOT8P2b8URWN+hEqg3r/m/UkRaLMAcBJAJ3BzZmrCPgGGDaQJMK0uiUyHtACHEdAw/ULIJwKhd1i8QGdISUBOBEIve4GwWXgl2J4=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Row = _t, JobNumber = _t, PartNumber = _t, SerialNumber = _t, MachineID = _t, DateIn = _t, TimeIn = _t, DateOut = _t, TimeOut = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Row", Int64.Type}, {"JobNumber", Int64.Type}, {"PartNumber", Int64.Type}, {"SerialNumber", Int64.Type}, {"MachineID", type text}, {"DateIn", type date}, {"TimeIn", type time}, {"DateOut", type date}, {"TimeOut", type time}}, "en-US"),
    Buffer = List.Buffer(Table.ToRecords(#"Changed Type")),
    #"Added Custom" = Table.FromRecords(List.Skip(
        List.Generate(
            () => [prev_R = Buffer{0}, i = 0, n = 0],
            each [i] <= List.Count(Buffer),
            each
                [
                prev_R = Buffer{[i]},
                i = [i] + 1,
                n = if  Record.SelectFields(
                            prev_R,
                            {"JobNumber", "PartNumber", "SerialNumber", "MachineID", "DateIn", "DateOut"}
                        ) = 
                        Record.SelectFields(
                            [prev_R],
                            {"JobNumber", "PartNumber", "SerialNumber", "MachineID", "DateIn", "DateOut"}
                        )
                        and prev_R[TimeIn] - [prev_R][TimeOut] < #duration(0, 0, 5, 0)
                    then [n]
                    else [n] + 1
                ],
            each [prev_R] & [n = [n]]
        )
    )),
    #"Grouped Rows" = Table.Group(#"Added Custom", {"n", "JobNumber", "PartNumber", "SerialNumber", "MachineID", "DateIn", "DateOut"}, {{"FirstRow", each List.Min([Row]), type number}, {"TimeIn", each List.Min([TimeIn]), type time}, {"TimeOut", each List.Max([TimeOut]), type time}}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Grouped Rows",{{"n", Int64.Type}, {"JobNumber", Int64.Type}, {"PartNumber", Int64.Type}, {"SerialNumber", Int64.Type}, {"MachineID", type text}, {"DateIn", type date}, {"DateOut", type date}, {"FirstRow", Int64.Type}, {"TimeIn", type time}, {"TimeOut", type time}})
in
    #"Changed Type1"

This is fairly complex but you should be able to see the core logic in the middle is comparing the current and previous row and using the 5-minute condition specified.

 

This makes fantastically perfect sense, thank you! I'm ingratiating it into the table structure for the actual data I have so once I can get it to work I'll mark your answer as the solution.

 

In the meantime while I wait for PowerBi (since it's a slow monstrosity) I'm curious if you have any thoughts on how to deal with the edge case that that small 5 minute interval time happens over a midnight transition. So if you had two rows like:

 

Row  JobNumber  PartNumber   SerialNumber  MachineID  DateIn  TimeIn  DateOut  TimeOut 
1111A01/21/2208:24PM01/21/2211:59PM
2111A01/22/2212:03AM01/22/2201:31AM

 

These two rows should collapse as well? I see in your current structure you only check to make sure the next row is equal in DateIn and DateOut to the current row, if so, collapse the rows, but it may be the case that two rows should collapse, but not have equal DateIn and DateOut.

 

And while I give you that to chew on I should also mention that the following could also be true:

 

Row  JobNumber  PartNumber   SerialNumber  MachineID  DateIn  TimeIn  DateOut  TimeOut 
1111A01/21/2208:24AM01/21/2201:36PM
2111A01/22/2201:37PM01/22/2204:37PM

 

And these shouldn't collapse even through each row has equal JobNumber, PartNumber, SerialNumber, and MachineID. So some sort of check of Dates and Times doesn't still need to be done.

 

Would you maybe just take out the DateIn and DateOut equivlency checks and maybe just add another and that's something like:

 

 

and prev_R[DateIn] - [prev_R][DateOut] < #duration(0, 1, 0, 0)

 

 

I understood from your original post that different dates shouldn't be combined but I may have misinterpreted "it's on a different date so that one shouldn't be combined".

 

If that's not the case, then you should compare datetime values. For example:

n = if  Record.SelectFields( prev_R,  {"JobNumber", "PartNumber", "SerialNumber", "MachineID"}) = 
        Record.SelectFields([prev_R], {"JobNumber", "PartNumber", "SerialNumber", "MachineID"})
        and DateTime.From(Text.From(prev_R[DateIn]) & " " & Text.From(prev_R[TimeIn]))
            - DateTime.From(Text.From([prev_R][DateOut]) & " " & Text.From([prev_R][TimeOut]))
            < #duration(0, 0, 5, 0)
    then [n]
    else [n] + 1

Sorry for the confusion. The two examples I've given about when to not combine times and when to combine times are a little different.

 

In my OP the two columns I mention to not combine have the same JobNum, PartNum, SerialNum, and MachineID but are from drastically different dates. So even through the difference between their DateIn/DateOut times were within 5 minutes, since their dates were drastically different (more than 2 days), they shouldn't be combined.

 

The second example I mention to not combine have the same JobNum, PartNum, SerialNum, and MachineID but are from dates only seperated by one day and match the 5 minute threshold interval. Basically it happened over midnite.

 

So I would like the second example to combine even though they are different dates, and the first example to not combine because they are VERY different dates. 

Yeah, I just misread it a bit. Give the datetime version a try.

 

There's almost certainly a more efficient way of converting date + time to datetime than the hacky conversion to and from text I gave but that can be optimized later if needed.

Helpful resources

Announcements
Power Platform Conf 2022 768x460.jpg

Join us for Microsoft Power Platform Conference

The first Microsoft-sponsored Power Platform Conference is coming in September. 100+ speakers, 150+ sessions, and what's new and next for Power Platform.

Power BI Dev Camp Session 23 768x460.jpg

Check it Out!

Mark your calendars and join us on Thursday, June 30 at 11a PDT for a great session with Ted Pattison!

June 20 episode 7 with aka link 768x460.jpg

The Power BI Community Show

Join us on June 20 at 11 am PDT when Kim Manis shares the latest on Azure Synapse analytics, the Microsoft Intelligent Data Platform, and notable Power BI Updates from Microsoft Build 2022.

Top Solution Authors