Hi All,
This post is similar to one that was very recently posted by user eirikraha, but is slightly harder, I think. I want to combine rows of data based on certain conditions, and there's no particular unique identifier to distinguish the rows I want to combine.
I have an existing table of machine data that tracks machine run times of particular parts through a flow of machines that looks something like this:
Row | JobNumber | PartNumber | SerialNumber | MachineID | DateIn | TimeIn | DateOut | TimeOut |
1 | 1 | 1 | 1 | A | 01/21/22 | 08:24AM | 01/21/22 | 01:36PM |
2 | 1 | 1 | 2 | D | 01/21/22 | 09:53AM | 01/21/22 | 10:31AM |
3 | 1 | 1 | 2 | C | 01/21/22 | 11:24AM | 01/21/22 | 04:55PM |
4 | 1 | 2 | 1 | A | 01/22/22 | 03:02PM | 01/22/22 | 03:56PM |
5 | 1 | 2 | 2 | B | 01/22/22 | 08:01AM | 01/22/22 | 02:47PM |
6 | 2 | 1 | 1 | A | 01/23/22 | 09:51AM | 01/23/22 | 10:31AM |
7 | 2 | 1 | 1 | A | 01/23/22 | 10:32AM | 01/23/22 | 01:28PM |
8 | 2 | 1 | 1 | A | 01/24/22 | 01:31PM | 01/24/22 | 05:33PM |
9 | 2 | 1 | 1 | G | 01/25/22 | 09:39AM | 01/25/22 | 09:45AM |
10 | 2 | 1 | 2 | D | 01/25/22 | 11:49AM | 01/26/22 | 09:03AM |
11 | 2 | 1 | 2 | C | 01/25/22 | 05:43AM | 01/25/22 | 12:43PM |
12 | 2 | 2 | 1 | B | 01/25/22 | 09:00AM | 01/25/22 | 01:15PM |
13 | 2 | 2 | 1 | B | 01/25/22 | 01:19PM | 01/25/22 | 07:12PM |
14 | 2 | 2 | 1 | B | 01/25/22 | 07:13PM | 01/25/22 | 10:05PM |
Note that the rows are sorted in ascending order by JobNumber/PartNumber/SerialNumber/DateIn/TimeIn, so any run of succeeding rows with the same JobNumber/PartNumber/SerialNumber is a track of which machines the part has gone through.
Notice, though, that rows 6, 7 and rows 12, 13, 14 are the same parts in the same machines in succeeding rows, and the DateOut/TimeOut from each earlier row is within 5 minutes of the DateIn/TimeIn of the next row, almost as if the data had a slight pause and continuation. I want to combine all of these little runs of rows where the "data pause" is less than 5 minutes.
Notice also that row 8 has the same JobNumber, PartNumber, SerialNumber, and MachineID as row 7, but it's on a different date, so that one shouldn't be combined. The resulting table should be:
Row | JobNumber | PartNumber | SerialNumber | MachineID | DateIn | TimeIn | DateOut | TimeOut |
1 | 1 | 1 | 1 | A | 01/21/22 | 08:24AM | 01/21/22 | 01:36PM |
2 | 1 | 1 | 2 | D | 01/21/22 | 09:53AM | 01/21/22 | 10:31AM |
3 | 1 | 1 | 2 | C | 01/21/22 | 11:24AM | 01/21/22 | 04:55PM |
4 | 1 | 2 | 1 | A | 01/22/22 | 03:02PM | 01/22/22 | 03:56PM |
5 | 1 | 2 | 2 | B | 01/22/22 | 08:01AM | 01/22/22 | 02:47PM |
6 | 2 | 1 | 1 | A | 01/23/22 | 09:51AM | 01/23/22 | 01:28PM |
8 | 2 | 1 | 1 | A | 01/24/22 | 01:31PM | 01/24/22 | 05:33PM |
9 | 2 | 1 | 1 | G | 01/25/22 | 09:39AM | 01/25/22 | 09:45AM |
10 | 2 | 1 | 2 | D | 01/25/22 | 11:49AM | 01/26/22 | 09:03AM |
11 | 2 | 1 | 2 | C | 01/25/22 | 05:43AM | 01/25/22 | 12:43PM |
12 | 2 | 2 | 1 | B | 01/25/22 | 09:00AM | 01/25/22 | 10:05PM |
The only way I can think of to do this efficiently involves using List.Generate to iterate through the table comparing rows to previous rows to generate an index of which rows should be grouped together.
Here's a sample query you can paste into the Advanced Editor:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("hZLLCoNADEX/xXWhefrITlvoSnAv/v9vdGzn1dhWcGAIOffeTFzXBpvLxxnDAbxS+Gi/9kYyzq6Ixu0yN9tlbahi9/vdtQ6m7HkEYxzfPDv+5lrxi7+YarSXCq3jU7RnA1rmQ1FTfK34/UyutTfA0fNk0kW+rbxrfy7jF56P43f/+b2VPB+en/ro3//gpWwK8/ypqMYc+cHxj9iqOT8P2b8URWN+hEqg3r/m/UkRaLMAcBJAJ3BzZmrCPgGGDaQJMK0uiUyHtACHEdAw/ULIJwKhd1i8QGdISUBOBEIve4GwWXgl2J4=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Row = _t, JobNumber = _t, PartNumber = _t, SerialNumber = _t, MachineID = _t, DateIn = _t, TimeIn = _t, DateOut = _t, TimeOut = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Row", Int64.Type}, {"JobNumber", Int64.Type}, {"PartNumber", Int64.Type}, {"SerialNumber", Int64.Type}, {"MachineID", type text}, {"DateIn", type date}, {"TimeIn", type time}, {"DateOut", type date}, {"TimeOut", type time}}, "en-US"),
Buffer = List.Buffer(Table.ToRecords(#"Changed Type")),
#"Added Custom" = Table.FromRecords(List.Skip(
List.Generate(
() => [prev_R = Buffer{0}, i = 0, n = 0],
each [i] <= List.Count(Buffer),
each
[
prev_R = Buffer{[i]},
i = [i] + 1,
n = if Record.SelectFields(
prev_R,
{"JobNumber", "PartNumber", "SerialNumber", "MachineID", "DateIn", "DateOut"}
) =
Record.SelectFields(
[prev_R],
{"JobNumber", "PartNumber", "SerialNumber", "MachineID", "DateIn", "DateOut"}
)
and prev_R[TimeIn] - [prev_R][TimeOut] < #duration(0, 0, 5, 0)
then [n]
else [n] + 1
],
each [prev_R] & [n = [n]]
)
)),
#"Grouped Rows" = Table.Group(#"Added Custom", {"n", "JobNumber", "PartNumber", "SerialNumber", "MachineID", "DateIn", "DateOut"}, {{"FirstRow", each List.Min([Row]), type number}, {"TimeIn", each List.Min([TimeIn]), type time}, {"TimeOut", each List.Max([TimeOut]), type time}}),
#"Changed Type1" = Table.TransformColumnTypes(#"Grouped Rows",{{"n", Int64.Type}, {"JobNumber", Int64.Type}, {"PartNumber", Int64.Type}, {"SerialNumber", Int64.Type}, {"MachineID", type text}, {"DateIn", type date}, {"DateOut", type date}, {"FirstRow", Int64.Type}, {"TimeIn", type time}, {"TimeOut", type time}})
in
#"Changed Type1"
This is fairly complex, but you should be able to see that the core logic in the middle compares the current and previous row and applies the 5-minute condition you specified.
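For comparison, the same previous-row test can also be sketched with an added Index column and a positional lookup instead of List.Generate. This is only a sketch (untested here, and likely slower on large tables since every row does an indexed lookup); it assumes the column names from the sample and the earlier step named #"Changed Type":

let
    WithIndex = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    Buffered = Table.Buffer(WithIndex),
    // Flag rows that start a new group: 1 when the keys or dates differ from
    // the previous row or the time gap is 5 minutes or more, else 0.
    WithFlag = Table.AddColumn(
        Buffered,
        "NewGroup",
        each
            if [Index] = 0 then 1
            else
                let prev = Buffered{[Index] - 1} in
                if [JobNumber] = prev[JobNumber]
                    and [PartNumber] = prev[PartNumber]
                    and [SerialNumber] = prev[SerialNumber]
                    and [MachineID] = prev[MachineID]
                    and [DateIn] = prev[DateIn]
                    and [DateOut] = prev[DateOut]
                    and [TimeIn] - prev[TimeOut] < #duration(0, 0, 5, 0)
                then 0
                else 1,
        Int64.Type
    )
in
    WithFlag

A running total of NewGroup would then reproduce the group index n that the Table.Group step consumes.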
This makes fantastically perfect sense, thank you! I'm integrating it into the table structure for the actual data I have, so once I can get it to work I'll mark your answer as the solution.
In the meantime, while I wait for Power BI (since it's a slow monstrosity), I'm curious whether you have any thoughts on how to deal with the edge case where that small 5-minute interval happens over a midnight transition. So if you had two rows like:
Row | JobNumber | PartNumber | SerialNumber | MachineID | DateIn | TimeIn | DateOut | TimeOut |
1 | 1 | 1 | 1 | A | 01/21/22 | 08:24PM | 01/21/22 | 11:59PM |
2 | 1 | 1 | 1 | A | 01/22/22 | 12:03AM | 01/22/22 | 01:31AM |
These two rows should collapse as well. I see that in your current structure you only check that the next row's DateIn and DateOut equal the current row's before collapsing, but it may be the case that two rows should collapse without having equal DateIn and DateOut.
And while I give you that to chew on I should also mention that the following could also be true:
Row | JobNumber | PartNumber | SerialNumber | MachineID | DateIn | TimeIn | DateOut | TimeOut |
1 | 1 | 1 | 1 | A | 01/21/22 | 08:24AM | 01/21/22 | 01:36PM |
2 | 1 | 1 | 1 | A | 01/22/22 | 01:37PM | 01/22/22 | 04:37PM |
And these shouldn't collapse even though each row has equal JobNumber, PartNumber, SerialNumber, and MachineID. So some sort of check of dates and times still needs to be done.
Would you maybe just take out the DateIn and DateOut equivalency checks and add another condition, something like:
and prev_R[DateIn] - [prev_R][DateOut] < #duration(0, 1, 0, 0)
I understood from your original post that different dates shouldn't be combined, but I may have misinterpreted "it's on a different date so that one shouldn't be combined".
If that's not the case, then you should compare datetime values. For example:
n = if Record.SelectFields( prev_R, {"JobNumber", "PartNumber", "SerialNumber", "MachineID"}) =
Record.SelectFields([prev_R], {"JobNumber", "PartNumber", "SerialNumber", "MachineID"})
and DateTime.From(Text.From(prev_R[DateIn]) & " " & Text.From(prev_R[TimeIn]))
- DateTime.From(Text.From([prev_R][DateOut]) & " " & Text.From([prev_R][TimeOut]))
< #duration(0, 0, 5, 0)
then [n]
else [n] + 1
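As a quick sanity check, plugging your example values into that datetime difference by hand:

// Midnight case: 11:59PM out, 12:03AM in the next day - a 4-minute gap,
// under the threshold, so those rows combine.
#datetime(2022, 1, 22, 0, 3, 0) - #datetime(2022, 1, 21, 23, 59, 0)  // 4 minutes
// Day-apart case: 01:36PM out, 01:37PM in the next day - a gap of
// 1 day and 1 minute, over the threshold, so those rows stay separate.
#datetime(2022, 1, 22, 13, 37, 0) - #datetime(2022, 1, 21, 13, 36, 0)  // 1 day, 1 minute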
Sorry for the confusion. The two examples I've given, about when not to combine times and when to combine them, are a little different.
In my OP, the two rows I mention not to combine have the same JobNum, PartNum, SerialNum, and MachineID but are from drastically different dates. So even though the difference between their DateOut/DateIn times was within 5 minutes, since their dates were drastically different (more than 2 days), they shouldn't be combined.
The second example I mention to combine has the same JobNum, PartNum, SerialNum, and MachineID, comes from dates separated by only one day, and matches the 5-minute threshold interval. Basically it happened over midnight.
So I would like the second example to combine even though they are different dates, and the first example to not combine because they are VERY different dates.
Yeah, I just misread it a bit. Give the datetime version a try.
There's almost certainly a more efficient way of converting date + time to datetime than the hacky to-and-from-text conversion I gave, but that can be optimized later if needed.
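For what it's worth, M can combine a date and a time into a datetime directly with the & operator, so the text round-trip could likely be replaced with something like this (an untested sketch against the same record fields):

and (prev_R[DateIn] & prev_R[TimeIn]) - ([prev_R][DateOut] & [prev_R][TimeOut])
    < #duration(0, 0, 5, 0)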