Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi there.
The dataset consists of columns for ID, start time, and end time, with tens of thousands of rows.
I would like to aggregate other rows that overlap in time based on the start time of each row.
I created the query using the answer , but the performance is poor.
How do I create a better query?
#DataSet
#Example
ID | StartDateTime | EndDateTime | Overlap |
1 | 2021/11/25 00:00:00 | 2021/11/25 02:00:00 | ??? -> 1 |
2 | 2021/11/25 01:00:00 | 2021/11/25 02:00:00 | ??? -> 2 |
3 | 2021/11/25 03:00:00 | 2021/11/25 08:00:00 | ??? -> 1 |
4 | 2021/11/25 04:00:00 | 2021/11/25 07:00:00 | ??? -> 2 |
5 | 2021/11/25 05:00:00 | 2021/11/25 08:00:00 | ??? -> 3 |
… | |||
100000 |
#Query1
let
source = Csv.Document(File.Contents("C:\SampleTime.csv"),[Delimiter=",", Columns=3, Encoding=932, QuoteStyle=QuoteStyle.None]),
header = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
convertType = Table.TransformColumnTypes(header,{{"Id", Int64.Type}, {"StartDate", type datetime}, {"EndDate", type datetime}}),
addColumn = Table.AddColumn(convertType, "Temp", each convertType),
expandColumn = Table.ExpandTableColumn(addColumn, "Temp", {"StartDate", "EndDate"}, {"Temp.StartDate", "Temp.EndDate"}),
convertType1 = Table.TransformColumnTypes(expandColumn,{{"Temp.StartDate", type datetime}, {"Temp.EndDate", type datetime}}),
addColumn1 = Table.AddColumn(convertType1, "Compare", each ([Temp.StartDate] <= [StartDate]) and ([StartDate] <= [Temp.EndDate])),
filterRow = Table.SelectRows(addColumn1, each ([Compare] = true)),
groupBy = Table.Group(filterRow, {"Id"}, {{"Concurrent", each Table.RowCount(_), Int64.Type}})
in
groupBy
Query#2
let
source = Csv.Document(File.Contents("C:\SampleTime.csv"),[Delimiter=",", Columns=3, Encoding=932, QuoteStyle=QuoteStyle.None]),
header = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
convertType = Table.TransformColumnTypes(header,{{"Id", Int64.Type}, {"StartDate", type datetime}, {"EndDate", type datetime}}),
addColumn = Table.AddColumn(convertType, "Overlap", each Table.RowCount(Table.SelectRows(convertType, (x) => x[StartDate] <= [StartDate] and [StartDate] <= x[EndDate])))
in
addColumn
Best regards,
Solved! Go to Solution.
Hi @hirokichi
Groupby is slow, so did those 2 queries above achieve what you wanted? How about buffer the table in Query 2?
let
source = Csv.Document(File.Contents("C:\SampleTime.csv"),[Delimiter=",", Columns=3, Encoding=932, QuoteStyle=QuoteStyle.None]),
header = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
convertType =Table.Buffer( Table.TransformColumnTypes(header,{{"Id", Int64.Type}, {"StartDate", type datetime}, {"EndDate", type datetime}})),
addColumn = Table.AddColumn(convertType, "Overlap", each Table.RowCount(Table.SelectRows(convertType, (x) => x[StartDate] <= [StartDate] and [StartDate] <= x[EndDate])))
in
addColumn
try query 3
let
source = Csv.Document(File.Contents("C:\SampleTime.csv"),[Delimiter=",", Columns=3, Encoding=932, QuoteStyle=QuoteStyle.None]),
header = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
convertType = Table.Buffer( Table.TransformColumnTypes(header,{{"Id", Int64.Type}, {"StartDate", type datetime}, {"EndDate", type datetime}})),
Custom = List.Buffer( Table.AddColumn(convertType, "Custom", each {[StartDate], [EndDate]})[Custom]),
#"Added Custom1" = Table.AddColumn(convertType, "Overlap", (x)=> List.Count(List.Select( List.Transform(Custom, each _{0} <=x[StartDate] and _{1}>=x[StartDate]), each _=true)))
in
#"Added Custom1"
Hi @hirokichi
Groupby is slow, so did those 2 queries above achieve what you wanted? How about buffer the table in Query 2?
let
source = Csv.Document(File.Contents("C:\SampleTime.csv"),[Delimiter=",", Columns=3, Encoding=932, QuoteStyle=QuoteStyle.None]),
header = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
convertType =Table.Buffer( Table.TransformColumnTypes(header,{{"Id", Int64.Type}, {"StartDate", type datetime}, {"EndDate", type datetime}})),
addColumn = Table.AddColumn(convertType, "Overlap", each Table.RowCount(Table.SelectRows(convertType, (x) => x[StartDate] <= [StartDate] and [StartDate] <= x[EndDate])))
in
addColumn
try query 3
let
source = Csv.Document(File.Contents("C:\SampleTime.csv"),[Delimiter=",", Columns=3, Encoding=932, QuoteStyle=QuoteStyle.None]),
header = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
convertType = Table.Buffer( Table.TransformColumnTypes(header,{{"Id", Int64.Type}, {"StartDate", type datetime}, {"EndDate", type datetime}})),
Custom = List.Buffer( Table.AddColumn(convertType, "Custom", each {[StartDate], [EndDate]})[Custom]),
#"Added Custom1" = Table.AddColumn(convertType, "Overlap", (x)=> List.Count(List.Select( List.Transform(Custom, each _{0} <=x[StartDate] and _{1}>=x[StartDate]), each _=true)))
in
#"Added Custom1"
Thank you for your advice.
Queries are much faster!
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.