Hi All,
I have an Excel file with monthly data running from month start to month end. For the same Category A and Category B on the same day, I need to check whether there are multiple records with overlapping times. If there are, I would like to group them into one record using the minimum start time and the maximum end time; otherwise the records stay as they are.
Start Date | Start Time | End Date | End Time | Category A | Category B |
04/07/2020 | 6:00:00 PM | 05/07/2020 | 2:00:00 AM | Single main | Grind |
05/07/2020 | 10:45:00 PM | 06/07/2020 | 12:15:00 PM | Single Main | Grind |
05/07/2020 | 11:45:00 PM | 06/07/2020 | 1:45:00 PM | Single Main | Grind |
29/07/2020 | 4:00:00 AM | 29/07/2020 | 6:00:00 AM | Main | Grind |
29/07/2020 | 10:00:00 PM | 29/07/2020 | 11:30:00 PM | Main | Grind |
From the above data:
The first row stays as is, since it is the only record with a start date of 04/07/2020.
The second and third rows can be grouped into one: the start dates are both 05/07/2020, Category A and B match, and the third row's start time (11:45:00 PM) falls before the second row's end time (12:15:00 PM on 06/07/2020).
The last two rows should stay as is: even though the start date, Category A and Category B match, the last row's start time (10:00:00 PM) does not overlap with or start before the end time of the row above it (6:00:00 AM).
I would like some help with Power Query to achieve this. The output should be:
Start Date | Start Time | End Date | End Time | Category A | Category B |
04/07/2020 | 6:00:00 PM | 05/07/2020 | 2:00:00 AM | Single main | Grind |
05/07/2020 | 10:45:00 PM | 06/07/2020 | 1:45:00 PM | Single Main | Grind |
29/07/2020 | 4:00:00 AM | 29/07/2020 | 6:00:00 AM | Main | Grind |
29/07/2020 | 10:00:00 PM | 29/07/2020 | 11:30:00 PM | Main | Grind |
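For readers outside Power Query: the requested transformation is the classic "merge overlapping intervals" pattern. Below is a minimal Python sketch of the merge rule, added for illustration and not part of the original thread, shown on the two overlapping 05/07 rows:

```python
from datetime import datetime

def merge_overlaps(intervals):
    """Merge overlapping (start, end) datetime pairs.

    Sort by start; fold an interval into the previous one when its
    start falls before the previous end, keeping the minimum start
    and the maximum end, which is exactly the grouping rule above.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            # Overlap: extend the previous interval's end if needed
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# The two overlapping 05/07 rows from the sample data
overlapping = [
    (datetime(2020, 7, 5, 22, 45), datetime(2020, 7, 6, 12, 15)),
    (datetime(2020, 7, 5, 23, 45), datetime(2020, 7, 6, 13, 45)),
]
```

Calling `merge_overlaps(overlapping)` yields a single interval from 10:45 PM on 05/07 to 1:45 PM on 06/07, matching the expected output row above; disjoint intervals like the two 29/07 rows pass through unchanged.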
Any help would be appreciated. Thanks
Hello @vvadlapatla
I first grouped by both categories and the date. Then I analysed the generated tables and deleted the second row if an overlap was identified. This approach only works if you have at most 2 rows per category/start day. Check whether this suits your needs:
let
Source = #table
(
{"Start Date","Start Time","End Date","End Time","Category A","Category B"},
{
{"44016","0,75","44017","8,33333333333333E-02","Single main","Grind"},
{"44017","0,947916666666667","44018","0,510416666666667","Single Main","Grind"},
{"44017","0,989583333333333","44018","0,572916666666667","Single Main","Grind"},
{"44041","0,166666666666667","44041","0,25","Main","Grind"},
{"44041","0,916666666666667","44041","0,979166666666667","Main","Grind"}
}
),
// "de-DE" culture assumed so the comma-decimal strings like "0,75" also parse on non-German locales
Transform = Table.TransformColumns(Source,{{"Start Date", each Date.From(Number.From(_, "de-DE")), type date},{"End Date", each Date.From(Number.From(_, "de-DE")), type date},{"Start Time", each Time.From(Number.From(_, "de-DE")), type time},{"End Time", each Time.From(Number.From(_, "de-DE")), type time}}),
GroupBy = Table.Group(Transform, {"Start Date", "Category A", "Category B"}, {{"AllRows", each _, type table [Start Date=date, Start Time=time, End Date=date, End Time=time, Category A=text, Category B=text]}}),
TransformTable = Table.TransformColumns
(
GroupBy,
{
{
"AllRows",
(tableInt)=>
if Table.RowCount(tableInt)=2
then if ((tableInt[End Date]{0} & tableInt[End Time]{0}) > (tableInt[Start Date]{1} & tableInt[Start Time]{1})) and ((tableInt[Start Date]{0} & tableInt[Start Time]{0}) < (tableInt[End Date]{1} & tableInt[End Time]{1})) then Table.FromRecords({tableInt{0}}) else tableInt
else tableInt
}
}
),
RemoveOtherColumns = Table.SelectColumns(TransformTable,{"AllRows"}),
ExpandedAllRows = Table.ExpandTableColumn(RemoveOtherColumns, "AllRows", {"Start Date", "Start Time", "End Date", "End Time", "Category A", "Category B"}, {"Start Date", "Start Time", "End Date", "End Time", "Category A", "Category B"})
in
ExpandedAllRows
Copy and paste this code into the Advanced Editor of a new blank query to see how the solution works.
If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudos are nice too
Have fun
Jimmy
Hi @Jimmy801 ,
you mentioned that "This approach only works if you have 2 rows per category/start day". What if I have more than two rows per category? Thanks
Hello @vvadlapatla
then this simply won't work with my code. It was also never defined what should happen if more than 2 rows show up in a group. Maybe the solution from @ImkeF has a concept for that in the code.
All the best
Jimmy
Hi @vvadlapatla ,
I'm having trouble understanding your data.
Please post a link to a proper file with sample data (ideally with before and after status), or upload proper tables according to this description:
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
HI @ImkeF ,
Sorry for the delay in getting back to you. Thank you for the links to the "How to" docs for a newbie like me.
I will create a new thread and mention this link in it, but before that I came up with a workaround question and thought I'd ask for your help with it. If this doesn't work, I will follow the procedure you mentioned below.
I came across a DAX measure that calculates the time difference between the end date/time of one row and the start date/time of the next row for the same group of categories. This is what I ultimately want to achieve in my table, but in Power Query instead of DAX, which is why I'm asking for your help. Here is the link to the DAX post I mentioned:
https://www.c-sharpcorner.com/blogs/calculate-difference-between-consecutive-rows-group-by-column
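The linked idea, the gap between one row's end and the next row's start within the same category group, can be sketched in Python as follows. This is an editor-added illustration, not code from the linked post, and the "catA"/"catB"/"start"/"end" field names are placeholders:

```python
from datetime import datetime, timedelta

def break_times(rows):
    """For rows sorted by start time, return the gap between each row's
    end and the next row's start when both rows share the same category
    pair, else None (mirroring the consecutive-row difference idea)."""
    gaps = []
    for cur, nxt in zip(rows, rows[1:]):
        if (cur["catA"], cur["catB"]) == (nxt["catA"], nxt["catB"]):
            gaps.append(nxt["start"] - cur["end"])
        else:
            gaps.append(None)
    gaps.append(None)  # the last row has no successor
    return gaps
```

For the two 29/07 "Main"/"Grind" rows in the sample data (ending 6:00 AM, next starting 10:00 PM), the gap comes out as 16 hours.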
Please help; if my question is not clear, I'm happy to explain again.
Thanks
Hi @vvadlapatla ,
that sounds fairly different from the original requirement.
There is a "naive" solution that simply adds a conditional column, similar to the DAX solution, but it will be very slow:
let
Source = Table.FromRows(
Json.Document(
Binary.Decompress(
Binary.FromText(
"i45WCs7MS89JVchNzMxT0lFyL8rMSwHSBib6Bub6RgZGBgqGFlYGBkAEEjWFixoYQUVjdeBm+KKZgVBtZGRlYgo1wwxhspGVoSnxZhhbmRhgmmFsZUqCO4yxugMmCjIDpPnQAiTtRpYIT5vAgwJZ1AwpKNDsRlJmZIRNM9BFxhDNsQA=",
BinaryEncoding.Base64
),
Compression.Deflate
)
),
let
_t = ((type text) meta [Serialized.Text = true])
in
type table[#"Category A" = _t, #"Category B" = _t, Start = _t, End = _t]
),
#"Changed Type" = Table.TransformColumnTypes(
Source,
{{"Start", type datetime}, {"End", type datetime}}
),
#"Sorted Rows" = Table.Buffer(Table.Sort(#"Changed Type", {{"Start", Order.Ascending}})),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1, Int64.Type),
#"Added Custom2" = Table.AddColumn(
#"Added Index",
"Next Start Time",
each try
if [Category A]
= #"Added Index"{[Index] + 1}[Category A] and [Category B]
= #"Added Index"{[Index] + 1}[Category B] then
#"Added Index"{[Index] + 1}[Start]
else
null
otherwise null
),
#"Added Custom1" = Table.AddColumn(#"Added Custom2", "Break Time", each [Next Start Time] - [End]),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom1", {{"Break Time", type duration}})
in
#"Changed Type1"
So I'd recommend a faster approach that uses a custom function ( https://www.thebiccountant.com/2018/07/12/fast-and-easy-way-to-reference-previous-or-next-rows-in-po...) and applies it on grouped data ( https://www.thebiccountant.com/2017/05/29/performance-tip-partition-tables-crossjoins-possible-power... ) to speed up things:
let
fnNextRow = (MyTable as table, MyColumnName as text) =>
let
Source = MyTable,
ShiftedList = List.RemoveFirstN(Table.Column(Source, MyColumnName), 1) & {null},
Custom1 = Table.ToColumns(Source) & {ShiftedList},
Custom2 = Table.FromColumns(Custom1, Table.ColumnNames(Source) & {"Next Row"})
in
Custom2,
Source = Table.FromRows(
Json.Document(
Binary.Decompress(
Binary.FromText(
"i45WCs7MS89JVchNzMxT0lFyL8rMSwHSBib6Bub6RgZGBgqGFlYGBkAEEjWFixoYQUVjdeBm+KKZgVBtZGRlYgo1wwxhspGVoSnxZhhbmRhgmmFsZUqCO4yxugMmCjIDpPnQAiTtRpYIT5vAgwJZ1AwpKNDsRlJmZIRNM9BFxhDNsQA=",
BinaryEncoding.Base64
),
Compression.Deflate
)
),
let
_t = ((type text) meta [Serialized.Text = true])
in
type table[#"Category A" = _t, #"Category B" = _t, Start = _t, End = _t]
),
#"Changed Type" = Table.TransformColumnTypes(
Source,
{{"Start", type datetime}, {"End", type datetime}}
),
#"Grouped Rows" = Table.Group(
#"Changed Type",
{"Category A", "Category B"},
{{"Group", each Table.Sort(_, {"Start"})}}
),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "GetNextRow", each fnNextRow([Group], "Start")),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom", {"Group"}),
#"Expanded GetNextRow" = Table.ExpandTableColumn(
#"Removed Columns",
"GetNextRow",
{"Start", "End", "Next Row"},
{"Start", "End", "Next Row"}
),
#"Added Custom1" = Table.AddColumn(#"Expanded GetNextRow", "Break Time", each [Next Row] - [End])
in
#"Added Custom1"
Imke Feldmann (The BIccountant)
Hi @vvadlapatla ,
my code would not work if your data is not sorted by date within the groups, sorry.
I've included a sort step. The sample data now also includes 3 rows for one group, showing that the approach can be applied to any number of rows:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjDVNzDXNzIwMlDSUTI0tDIxsDIwUAjwBfIMzJClrEwRMsGZeek5qQq+iZl5QJ57UWZeilKsDtAwEyQdZkDlCLOQrTGCyjgimZWLYRaKwwysTExxOMzIytCUsMvQvYnLNGQZXIYZWSLpMEH2DIqMGbIMyJBDC3AbY2iAHFyoUoZWxggpVNfEAgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Start Date" = _t, #"Start Time" = _t, #"End Date" = _t, #"End Time" = _t, #"Category A" = _t, #"Category B" = _t]),
#"Changed Type" = Table.TransformColumnTypes(
Source,
{
{"Start Time", type time},
{"End Time", type time},
{"End Date", type date},
{"Start Date", type date}
},
"de-DE"
),
#"Inserted Merged Date and Time" = Table.AddColumn(
#"Changed Type",
"Start",
each [Start Date] & [Start Time],
type datetime
),
#"Inserted Merged Date and Time1" = Table.AddColumn(
#"Inserted Merged Date and Time",
"End",
each [End Date] & [End Time],
type datetime
),
#"Sorted Rows" = Table.Buffer( Table.Sort(#"Inserted Merged Date and Time1",{{"Start", Order.Ascending}}) ),
#"Grouped Rows1" = Table.Group(
#"Sorted Rows",
{"Category A", "Category B", "Start", "End"},
{
{"GroupStart", each List.Min([Start]), type datetime},
{"GroupEnd", each List.Max([End]), type datetime},
{"Group", each _}
},
GroupKind.Local,
(x, y) => Number.From(x[End] <= y[Start])
)
in
#"Grouped Rows1"
Imke Feldmann (The BIccountant)
HI @ImkeF ,
Sorry for so many questions. In your reply above you said that I need to sort by date and the groups (what does "groups" mean? Do you mean Categories A and B?). I'm asking because after sorting by the Start column it worked to an extent, but still not completely. Please see below. Thanks
2 | Single Main | Grinding | 05/07/2020 6:00:00 PM | 05/07/2020 11:15:00 PM | 05/07/2020 6:00:00 PM | 06/07/2020 12:15:00AM ? | [Table] |
2 | Single Main | Grinding | 05/07/2020 11:45:00 PM | 06/07/2020 1:45:00 AM | 05/07/2020 11:45:00 PM | 06/07/2020 2:00:00 AM ? | [Table] |
1 | Down Road | Grinding | 06/07/2020 8:00:00 PM | 06/07/2020 8:20:00 PM | 06/07/2020 8:00:00 PM | 07/07/2020 12:30:00 AM ? | [Table] |
1 | Up & Down Road, Up & Down North Fork | Grinding | 06/07/2020 11:30:00 PM | 07/07/2020 2:00:00 AM | 06/07/2020 11:30:00 PM | 07/07/2020 3:00:00 AM ? | [Table] |
1 | Down North Fork | Grinding | 07/07/2020 2:30:00 AM | 07/07/2020 3:00:00 AM | 07/07/2020 2:30:00 AM | 07/07/2020 3:00:00 AM | [Table] |
Above is my actual data; please see the time columns marked with "?" in the table above. I sorted by the "Start" column and then performed the group-by step as in your sample query. I expect the last 3 records to remain as is since the categories differ, and I also expect the first two records to stay as is: even though the categories are all the same, the first record ended at 11:15 PM and the second record started at 11:45 PM, so there is no overlap in the times.
Please let me know if I am doing anything wrong. Thanks
Hi @Jimmy801 ,
Firstly, thanks for your help.
It kind of worked, but I don't want to delete the overlapping row; instead I want the end time to be replaced with the max end time. In this scenario there are two entries, row 2 and row 3, which run from 10:45 PM to 12:15 PM and from 11:45 PM to 1:45 PM, so my final row should run from 10:45 PM to 1:45 PM.
P.S.: What I am trying to achieve in the end, after this transform, is: for each Category A and Category B, find the total hours (from start date/time to end date/time) as well as the adjusted hours, which don't double-count overlapping time.
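The "total vs. adjusted hours" goal can be sketched as follows (a Python illustration added by the editor, not part of the thread): total hours sum every record's duration, while adjusted hours sum the durations of the merged, overlap-free intervals.

```python
from datetime import datetime, timedelta

def total_and_adjusted_hours(intervals):
    """Return (total, adjusted) hours for (start, end) datetime pairs.

    'total' simply sums every record's duration, double-counting any
    overlap; 'adjusted' sums the merged intervals so overlapping time
    is counted only once.
    """
    total = sum((e - s for s, e in intervals), timedelta())
    merged = []
    for s, e in sorted(intervals):
        if merged and s < merged[-1][1]:
            # Overlap with the previous interval: extend its end
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    adjusted = sum((e - s for s, e in merged), timedelta())
    return total.total_seconds() / 3600, adjusted.total_seconds() / 3600
```

For the two overlapping rows (10:45 PM to 12:15 PM next day, and 11:45 PM to 1:45 PM next day) this gives 27.5 total hours but only 15 adjusted hours.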
Hi all,
sorry for stepping in here, but I've been working on this at the same time, and for performance reasons I believe you should use my approach:
let
Source = Table.FromRows(
Json.Document(
Binary.Decompress(
Binary.FromText(
"i45WMjDRNzDXNzIwMlDSUTKzMjAAIoUAXyDHwBRJxggq4wiSCc7MS89JVchNzMwD8tyLMvNSlGJ1olF1GBpYmZgiDDNDljKyMkRIQU3zxW+aIW7TkGVwGWZkiaTDBNkzKDJmyDIgQw4twG2MoQFycKFKGVoZI6RQXRMLAA==",
BinaryEncoding.Base64
),
Compression.Deflate
)
),
let
_t = ((type text) meta [Serialized.Text = true])
in
type table[
#"Start Date" = _t,
#"Start Time" = _t,
#"End Date" = _t,
#"End Time" = _t,
#"Category A" = _t,
#"Category B" = _t
]
),
#"Changed Type" = Table.TransformColumnTypes(
Source,
{
{"Start Time", type time},
{"End Time", type time},
{"End Date", type date},
{"Start Date", type date}
},
"de-DE"
),
#"Inserted Merged Date and Time" = Table.AddColumn(
#"Changed Type",
"Start",
each [Start Date] & [Start Time],
type datetime
),
#"Inserted Merged Date and Time1" = Table.AddColumn(
#"Inserted Merged Date and Time",
"End",
each [End Date] & [End Time],
type datetime
),
#"Grouped Rows1" = Table.Group(
#"Inserted Merged Date and Time1",
{"Category A", "Category B", "Start", "End"},
{
{"GroupStart", each List.Min([Start]), type datetime},
{"GroupEnd", each List.Max([End]), type datetime},
{"Group", each _}
},
GroupKind.Local,
(x, y) => Number.From(x[End] <= y[Start])
)
in
#"Grouped Rows1"
It uses the 5th element of the Table.Group function that I've described here: https://www.thebiccountant.com/2018/01/21/table-group-exploring-the-5th-element-in-power-bi-and-powe...
Please note: my solution will return different results than Jimmy's when events span multiple days. If the first event starts on 5th July at 8:00 and ends on 6th July at 22:00, and the second event starts on 6th July at 10:00 and ends on 6th July at 23:00, my solution will group them as one event, whereas Jimmy's solution would show them as 2.
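As a rough illustration of what the GroupKind.Local call with a comparer does, here is an editor-added Python sketch. It approximates the behavior by comparing each row against the previous one, which is a simplifying assumption rather than a statement of the exact M semantics:

```python
from datetime import datetime

def local_group(rows):
    """Approximate the GroupKind.Local grouping from the M code above.

    rows: dicts with 'catA', 'catB', 'start', 'end' keys, already
    sorted by 'start'. A new group starts whenever the category pair
    changes or the previous row's end <= the current row's start
    (the comparer returning 1 in the M code).
    """
    groups = []
    for row in rows:
        if groups:
            prev = groups[-1][-1]
            same_cats = (prev["catA"], prev["catB"]) == (row["catA"], row["catB"])
            if same_cats and not prev["end"] <= row["start"]:
                groups[-1].append(row)  # overlap: stay in the same group
                continue
        groups.append([row])  # start a new local group
    # One output row per group: min start and max end, like GroupStart/GroupEnd
    return [
        {"catA": g[0]["catA"], "catB": g[0]["catB"],
         "GroupStart": min(r["start"] for r in g),
         "GroupEnd": max(r["end"] for r in g)}
        for g in groups
    ]
```

On the sample data this collapses the two overlapping 05/07 rows into one group (10:45 PM to 1:45 PM next day) while leaving the two disjoint 29/07 rows as separate groups.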
Imke Feldmann (The BIccountant)
Hello @vvadlapatla
then try this
let
Source = #table
(
{"Start Date","Start Time","End Date","End Time","Category A","Category B"},
{
{"44016","0,75","44017","8,33333333333333E-02","Single main","Grind"},
{"44017","0,947916666666667","44018","0,510416666666667","Single Main","Grind"},
{"44017","0,989583333333333","44018","0,572916666666667","Single Main","Grind"},
{"44041","0,166666666666667","44041","0,25","Main","Grind"},
{"44041","0,916666666666667","44041","0,979166666666667","Main","Grind"}
}
),
// "de-DE" culture assumed so the comma-decimal strings like "0,75" also parse on non-German locales
Transform = Table.TransformColumns(Source,{{"Start Date", each Date.From(Number.From(_, "de-DE")), type date},{"End Date", each Date.From(Number.From(_, "de-DE")), type date},{"Start Time", each Time.From(Number.From(_, "de-DE")), type time},{"End Time", each Time.From(Number.From(_, "de-DE")), type time}}),
GroupBy = Table.Group(Transform, {"Start Date", "Category A", "Category B"}, {{"AllRows", each _, type table [Start Date=date, Start Time=time, End Date=date, End Time=time, Category A=text, Category B=text]}}),
TransformTable = Table.TransformColumns
(
GroupBy,
{
{
"AllRows",
(tableInt)=>
if Table.RowCount(tableInt)=2
then if ((tableInt[End Date]{0} & tableInt[End Time]{0}) > (tableInt[Start Date]{1} & tableInt[Start Time]{1})) and ((tableInt[Start Date]{0} & tableInt[Start Time]{0}) < (tableInt[End Date]{1} & tableInt[End Time]{1})) then
if (tableInt[End Date]{1} & tableInt[End Time]{1})> (tableInt[End Date]{0} & tableInt[End Time]{0})
then
Table.FromRecords( {tableInt{0} & Record.SelectFields(tableInt{1} , {"End Date", "End Time"})})
else
Table.FromRecords({tableInt{0}})
else tableInt
else tableInt
}
}
),
RemoveOtherColumns = Table.SelectColumns(TransformTable,{"AllRows"}),
ExpandedAllRows = Table.ExpandTableColumn(RemoveOtherColumns, "AllRows", {"Start Date", "Start Time", "End Date", "End Time", "Category A", "Category B"}, {"Start Date", "Start Time", "End Date", "End Time", "Category A", "Category B"})
in
ExpandedAllRows
Have fun
Jimmy
@vvadlapatla - See if the attached PBIX helps. I just helped solve something similar for someone else but haven't had a chance to put it in the Quick Measures Gallery yet. Attached below my signature.
HI @Greg_Deckler ,
I cannot open the file as I don't have the latest version. Would you mind copying the Power Query code for me? Upgrading is not that easy, as the whole team has to be on the same version. Hope you understand. Thanks
@vvadlapatla Ow crap, didn't check the forum again! It's a DAX solution 😞 Here it is anyway:
Overlap =
VAR __Start = MIN('Table'[Start])
VAR __End = MAX('Table'[End])
RETURN
IF(ISBLANK(MAX([Start])),BLANK(),
VAR __Table = GENERATESERIES(__Start,__End,1/24/60)
VAR __Table1 = ALL('Table')
VAR __Table2 = GENERATE(__Table,__Table1)
VAR __Table3 = ADDCOLUMNS(__Table2,"Include",IF([Value]>=[Start] && [Value] <= [End],1,0))
VAR __Table4 = GROUPBY(__Table3,[Value],"Minute",MAXX(CURRENTGROUP(),[Include]))
RETURN
SUMX(__Table4,[Minute])/60
)
Thanks @Greg_Deckler, I found that post in the forum and got the code from it, but it's a DAX solution, isn't it? I would like a Power Query one, as I have to run other calculations on that table later. I initially tried Min(Start Date) and Max(End Date) with a Group By to find the total minutes, but when there is no overlapping time, as in my last two records, I get more minutes than expected.
Please let me know if this is not clear; happy to explain again.
@vvadlapatla See if the Power Query folks like @ImkeF and @edhans can assist.
Thanks Greg, I will have a quick look into it and will let you know if that works!